mathspp.com feed Stay up-to-date with the articles on mathematics and programming that get published to mathspp.com. 2023-10-03T11:25:04+02:00 Rodrigo Girão Serrão https://mathspp.com/blog/til TIL #084 – type statement and type aliases https://mathspp.com/blog/til/type-statement-and-type-aliases 2023-10-03T11:25:04+02:00 2023-10-03T10:00:00+02:00

Today I learned about the Python 3.12 type statement you can use to create type aliases.

# type statement and type aliases

Python 3.12 introduced the soft keyword type, which is used in the type statement to create type aliases. The type statement offers a more convenient and powerful way of creating type aliases and it supersedes typing.TypeAlias.

In its simplest form, the type statement is composed of the type keyword, the name of the type alias you are creating, and the type you are aliasing. The example below shows how to create a type alias called Point that is the same as a pair with two floats:

type Point = tuple[float, float]

Before the introduction of the type statement, you could create a type alias via a regular assignment:

Point = tuple[float, float]

You could also annotate Point with typing.TypeAlias to make it clear that you were creating a type alias:

from typing import TypeAlias

Point: TypeAlias = tuple[float, float]

So, why do we care about the type statement?

## Forward references

One of the advantages of the type statement is that it supports forward referencing without having to quote the names of the types you are refering to. This is possible because the type value is lazily evaluated.

For example, suppose we want to create a recursive type for a linked list, where a linked list is a tuple with two elements: an integer and a linked list (the remainder of the linked list). In 3.12, you could write it as such:

type LinkedList = tuple[int, LinkedList]

The self-reference works just fine, and so does the forward reference of the example below:

type A = tuple[B, C, D]
type B = int
type C = str
type D = list[str]

In Python 3.11 and earlier, you'd have to quote the forward references or the self-reference of the first example, like so:

from typing import TypeAlias

A: TypeAlias = tuple["A", "B", "C"]
B: TypeAlias = int
C: TypeAlias = str
D: TypeAlias = list[str]

## Generic type aliases

Type aliases can also be made generic. For example, the linked list could be a linked list of any kind of value we want, not just integers.

We could type the linked list like so:

type LinkedList[T] = T | tuple[T, LinkedList[T]]

This means that a linked list is either a value of its type T or it is a pair with a value and a linked list.

For example, the variable ll below defines a linked list of integers:

ll: LinkedList[int] = (42, (73, (10, (16, 0))))

This is just the tip of the iceberg. Generics were also improved in Python 3.12, so there's even more you can do. You can take a look at the references below to learn more about this.

]]>
TIL #083 – sentinel value for default arguments https://mathspp.com/blog/til/sentinel-value-for-default-arguments 2023-09-16T00:30:21+02:00 2023-09-15T21:00:00+02:00

Today I learned how to create a sentinel value to use as a default argument in a way that respects Python typing.

# Sentinel value for default arguments

## Context

Today at work I had a problem. I was working on a progress bar that Textual has: The ProgressBar class has a method update that you can use to update the status of the progress bar. The method update looked roughly like this:

class ProgressBar(...):
...

def update(
self,
total: float | None = None,
progress: float | None = None,
advance: float | None = None,
) -> None:
"""Update the status of the progress bar."""
if total is not None:
self.total = total
if progress is not None:
self.progress = progress
self.progress += advance

By calling update, you can change the total “size” of the progress bar (total), you can change how far along you are with the progress (progress), and you can increment the current progress by a given amount (advance).

For example, if you create a progress bar pb and then call pb.update(total=100, progress=30, advance=7), your progress bar ends up at 37% completion and it would look something like this: What's the problem, then?

## Using another default value instead of None

The default value for total is None, but the truth is that I also want to be able to set total to None, because that turns the progress bar into its indeterminate/pulsing state. So, how can I do this?

The answer is obvious: just use another default value!

That's right, but I can't use any random default value, because then the typing will look odd! For example, I could use the string "default", and modify the method to look like this:

class ProgressBar(...):
...

def update(
self,
total: float | str | None = "default",
progress: float | None = None,
advance: float | None = None,
) -> None:
"""Update the status of the progress bar."""
if not isinstance(total, str):
self.total = total
if progress is not None:
self.progress = progress
self.progress += advance

The problem with this, I argue, is that it looks really odd to have total accept values of the type str, when it really doesn't, it's just so that the typing matches the default value given...

In other words, if I just had

...
total: float | None = "default",
...

then the typing would be wrong. And I want to have the code typed correctly, if possible.

So, how do we solve this?

## Using object as the default value

My next thought went to using object as the default value. So, something like this:

_sentinel = object()

class ProgressBar(...):
...

def update(
self,
total: float | None | object = _sentinel,
progress: float | object = _sentinel,
advance: float | object = _sentinel,
) -> None:
"""Update the status of the progress bar."""
if total is not _sentinel:
self.total...
]]>
TIL #082 – 20 million lines of Python code https://mathspp.com/blog/til/20-million-lines-of-python-code 2023-09-07T13:27:41+02:00 2023-09-07T11:00:00+02:00

Today I learned that the largest file ever published to PyPI has 20 MILLION lines of code.

# 20 million lines of Python code

Today I'm attending PyCon Portugal 2023 and Tom Forbes, the Thursday keynoter, presented the largest Python file ever published to PyPI. It is a Python file with over 20 million lines of Python code. MILLION.

What does this file do, you might ask?

I'll give you a hint. This file comes from a project called EvenOrOdd... Can you see where we are going?

The file implements a single function isEven that starts like this:

def isEven(num):
if num == 0:
return True
elif num == 1:
return False
elif num == 2:
return True
elif num == 3:
return False
# ...

If you scroll down for long enough, you'll eventually reach the end of the function, which looks like this:

def isEven(num):
# ...
elif num == 1048571:
return False
elif num == 1048572:
return True
elif num == 1048573:
return False
elif num == 1048574:
return True
elif num == 1048575:
return False
else:
raise Exception("Number is not within bounds")

That's a pretty silly function! Why on Earth would it stop at 1,048,575? (Starting at 0 and going up to 1_048_575 means that the function isEven covers $$2^{20}$$ integers.)

I installed the package:

python -m pip install evenorodd

And then decided to test it, just to make sure none of the branches were wrong:

>>> from EvenOrOdd
>>> for i in range(2 ** 20):
...     assert EvenOrOdd.isEven(i) == (not (i % 2))

Given that this loop is taking a long time to finish, I wonder if the difference between isEven(0) and isEven(1048575) is noticeable.

Using the module timeit, I tried checking how fast I could compute isEven(0) versus isEven(1048575) and I got these numbers:

>>> from timeit import timeit

>>> setup = "from EvenOrOdd.EvenOrOdd import isEven"
>>> timeit("isEven(0)", setup=setup, number=1000)
2.8417000066838227e-05
>>> timeit("isEven(1048575)", setup=setup, number=1000)
6.486064583999905
>>> _ / 1000
0.0064860645839999054

The timings above might be different on your machine, but the relative comparisons should be somewhat similar, and the numbers above show that it is faster to call isEven(0) 1000 times than it is to compute isEven(1048575) once.

We can take this further:

>>> timeit("isEven(0)", setup=setup, number=100000)  # Changed number=...
0.002839708999999857

This new timing shows that we can call isEven(0) about 100,000 times and be done before isEven(1048575) finishes computing...

What is more, I was able to write this blog post and the testing of the function isEven is still running (the for loop I shared above), so I'll interrupt that loop:

>>> for i in range(2 ** 20):
...     assert EvenOrOdd.isEven(i) == (not (i % 2))
...
^CTraceback (most recent call last):
File "<stdin>", line 2, in <module>
KeyboardInterrupt
>>> i
475308

So, as you can see, I've gone through 46% of the faster test cases. At least we know that the numbers up to 475,307 are correctly classified as even or odd by the function isEven.

]]>
TIL #081 – find commits that affected a file https://mathspp.com/blog/til/find-commits-that-affected-a-file 2023-09-06T14:37:25+02:00 2023-09-06T00:00:00+02:00

Today I learned how to find the commits that affected a specific file with git log.

# Find commits that affected a file

When using git, you can use the command git log to show a log of all commits (I like git log --oneline to show a terser log).

What I learned today is that using the command git log -- path/to/file will filter the logs to only show commits that affected that file.

## Practical example

I was looking at commit 1b61a95 of the Textual repository, which created a file tests/test_table.py with a test that I cared about.

What I wanted was to figure out how that file evolved over time because the file tests/test_table.py no longer exists and because I couldn't find the tests that existed in that file.

So, what I did was run the command git log -- tests/test_table.py and I got this output:

commit 12d429dbd0c9e86fb45251c17dbb9e15ad6624cf
Author: Darren Burns <darrenb900@gmail.com>
Date:   Tue Jan 24 11:33:35 2023 +0000

Replace DataTable row_count with property, test improvements

commit d1413ee352a79b8b5e0a8f23cee9276893896903
Author: Darren Burns <darrenb900@gmail.com>
Date:   Tue Jan 24 10:38:47 2023 +0000

Updating test to ensure row_key mapped data is correct

commit a45a3dca6a1513b9e64cb4b680d337ae3b75c1f8
Author: Josh Karpel <josh.karpel@gmail.com>
Date:   Wed Dec 21 16:57:43 2022 -0600

add option to clear columns in DataTable.clear

commit 1b61a95c7025160cbbcb74cf9562a3b057afd3e6
Author: Will McGugan <willmcgugan@gmail.com>
Date:   Thu Nov 10 16:22:52 2022 +0000

table tests

Then, using git show <commit> you can take a look at each commit and figure out what happened with the file. If the file was deleted or renamed, then the commit at the top of the log should be the commit where said file was deleted or renamed.

For example, in this case, running git log --oneline -- tests/test_table.py outputs this:

12d429dbd Replace DataTable row_count with property, test improvements
d1413ee35 Updating test to ensure row_key mapped data is correct
a45a3dca6 add option to clear columns in DataTable.clear
1b61a95c7 table tests

Then, if I do git show 12d429dbd I can scroll through the commit changes and eventually hit the part of the diff that shows that the file was deleted:

diff --git a/tests/test_table.py b/tests/test_table.py
deleted file mode 100644
index d57be5a60..000000000
--- a/tests/test_table.py
+++ /dev/null
]]>
TIL #080 – how to draw a Bézier curve https://mathspp.com/blog/til/how-to-draw-a-bezier-curve 2023-09-04T19:39:27+02:00 2023-09-04T00:00:00+02:00

Today I learned how to draw a Bézier curve with the De Casteljau's algorithm.

# How to draw a Bézier curve

Today I learned that there is a straightforward way to draw Bézier curves and I wrote a small Python program to test this out.

The code uses pygame and it creates a window with the two endpoints for the curve and a control point (so, we're drawing a quadratic Bézier curve).

The animation above shows the program running and the code has been included below.

The main part of that program is the little bit of maths that actually computes the points of the curve itself. For a given $$t \in [0, 1]$$, if $$s$$, $$e$$, and $$c$$ and the start, end, and control points, respectively, then the point $$p(t)$$ of the curve is

$\begin{cases} p_1(t) = s + (c - s) \times t \\ p_2(t) = c + (e - c) \times t \\ p(t) = p_1(t) + (p_2(t) - p_1(t)) \times t \end{cases}$

This formula is the application of the De Casteljau's algorithm to quadratic Bézier curves.

What the code does, inside the function draw_curve, is use a for loop to sample multiple values of t to approximate the curve by drawing a bunch of points.

The code below ran on Python 3.11 and pygame 2.5.1. It shouldn't have major compatibility issues with other version of Python/pygame, though, as I'm only using fairly basic functionality.

import dataclasses
from dataclasses import dataclass
import sys

import pygame
import pygame.locals

WIDTH = 800
HEIGHT = 600
ENDPOINT_COLOUR = (248, 248, 242)
BACKGROUND = (40, 42, 54)
CONTROL_POINT_COLOUR = (255, 85, 85)
CURVE_COLOUR = (189, 147, 249)
GUIDELINE_COLOUR = (139, 233, 253)

screen = pygame.display.set_mode((WIDTH, HEIGHT))
screen.fill(BACKGROUND)

@dataclass
class Curve:
start: tuple[int, int]
end: tuple[int, int]
control_point: tuple[int, int]

def dist_squared(p1, p2):
x1, y1 = p1
x2, y2 = p2
return (x2 - x1) ** 2 + (y2 - y1) ** 2

def draw_endpoint(screen, point):
pygame.draw.circle(
screen,
ENDPOINT_COLOUR,
point,
)

def draw_control_point(screen, point):
pygame.draw.circle(
screen,
CONTROL_POINT_COLOUR,
point,
)

def draw_curve(screen, curve):
pygame.draw.line(screen, GUIDELINE_COLOUR, curve.start, curve.control_point)
pygame.draw.line(screen, GUIDELINE_COLOUR, curve.end, curve.control_point)

sx, sy = curve.start
ex, ey = curve.end
cx, cy = curve.control_point

iters = 1000
for iter in range(0, iters + 1):
delta = iter / iters
m1x = sx + (cx - sx) * delta
m2x = cx + (ex - cx) * delta
m1y = sy + (cy - sy) * delta
m2y = cy + (ey - cy) * delta
px = m1x + (m2x - m1x) * delta
py = m1y + (m2y - m1y) * delta

draw_endpoint(screen, curve.start)
draw_endpoint(screen, curve.end)
draw_control_point(screen, curve.control_point)

previous_curve = None
curve = Curve((100, 100), (WIDTH - 100, 100), (WIDTH // 2, HEIGHT // 2))
dragging = False
point_being_dragged = None

print("Click a point to “pick it up” and click again to drop it.")

while True:
for event in pygame.event.get():
if (
event.type == pygame.locals.QUIT...
]]>

Today I learned how to create app notifications in Textual.

Textual has added support for application notifications for a couple of versions and I haven't had the chance to play around with them, so this tiny blog article shows how to use them.

The video above shows Textual app notifications in action. To create a notification, all you need to do is use the method .notify on a Textual app.

I created a small application that posts three different notifications when you press the three buttons. Notifications in Textual can have one of three severity levels:

1. "information" – the default;
2. "warning"; and
3. "error".

I paired each severity level with a different button and I also changed the timeout on the "information" notification so it lingers around for 10 seconds instead of the default (which I think is 2s, but I'm not sure).

The code for the app above is shown below:

from textual import on
from textual.app import App, ComposeResult
from textual.containers import Center, Middle
from textual.widgets import Button

CSS = "Button { margin: 1; }"

def compose(self) -> ComposeResult:
with Middle():
with Center():
yield Button("Click me.", id="standard")
yield Button.warning("Warning...", id="warning")
yield Button.error("Error!", id="error")

@on(Button.Pressed, "#standard")
self.notify("Standard button clicked.", timeout=10)

@on(Button.Pressed, "#warning")
self.notify("Attention, this is a warning.", severity="warning")

@on(Button.Pressed, "#error")
self.notify("An error just occurred!", severity="error")

if __name__ == "__main__":
NotificationsApp().run()

See the documentation for the method .notify to learn more!

]]>
TIL #078 – context variables https://mathspp.com/blog/til/context-variables 2023-08-31T12:30:32+02:00 2023-08-24T00:00:00+02:00

Today I learned about context variables from the module contextvars and how to use them.

# Context variables

I was doing some work on Textual, the project I work on for my job, and I came across the module contextvars. Textual uses contextvars.ContextVar in a couple of places and so I decided to write this article to explore how they work.

## What's the point of context variables?

Context variables can store data that is available across different function calls without having to pass them as arguments. They are particularly useful in asynchronous code.

Passing all the data as arguments to every single function call would bloat function signatures because not all functions need everything, but you'd need to pass everything around because you never know when you might end up calling a function that needs some specific data.

Another alternative would be creating global variables, but I'm guessing that would be just asking for trouble in concurrent code. I personally don't know, but if global variables worked just fine, I'm sure we wouldn't have a module just for this.

## Creating a context variable

To create a context variable, you use ContextVar from the module contextvars. To set and get the value of the variable, you use the methods .set and .get, respectively:

from contextvars import ContextVar

name = ContextVar("name")
name.set("Rodrigo")
print(name.get())  # Rodrigo

As per the documentation, context variables should be declared at the top level of your module, and never inside closures. In practice, this means you should be careful and never create context variables inside methods or functions.

### Default value

A context variable can also be created with a default value:

from contextvars import ContextVar

name = ContextVar("name", default="Rodrigo")
print(name.get())  # Rodrigo

This value is used if you try to get a variable before setting it first.

### Default value for .get

Finally, the method .get also accepts a default value, similar to the method dict.get:

from contextvars import ContextVar

name = ContextVar("name")
print(name.get("no name set"))  # no name set

When you call .get, if the context variable hasn't been set explicitly, three things can happen, in this order:

1. you get the default value from the method .get; or
2. you get the default value from the context variable; or
3. an exception LookupError is raised.

The example above shows 1. It is also useful to note that the value you get doesn't change if you also provide a default value for the context variable:

from contextvars import ContextVar

name = ContextVar("name", default="Rodrigo")
print(name.get("no name set"))  # no name set

We've already seen an example of 2. in the previous subsection, so all there is left to see is the LookupError:

from contextvars import ContextVar

name = ContextVar("name")
print(name.get())  # LookupError

## Context variables are available across function calls

As I mentioned previously, a context variable can be accessed from functions and its value will be available:

from contextvars import ContextVar

name = ContextVar("name")
name.set("Rodrigo")

def print_name():
print(name.get("no name set"))

print_name()  # Rodrigo

## Different contexts

... ]]>
TIL #077 – piece table data structure https://mathspp.com/blog/til/piece-table-data-structure 2023-08-20T00:30:52+02:00 2023-08-19T00:00:00+02:00

Today I learned about the piece table data structure.

# Piece table

A piece table is a data structure commonly used in text editors. It is similar to a linked list, but instead of each node containing one single item (a character), each node contains a span of the data (a substring).

So, instead of a structure that would look like

H > e > l > l > o > _ > w > o > r > l > d

you could have something like

Hel > lo worl > d

In the context of a text editor, the idea is that these spans of contiguous data represent spans of text that the user hasn't edited yet. Then, whenever the user adds or removes text, we add or remove corresponding spans.

According to the sources I found, text editors may typically use other bits and bobs together with this data structure to handle the text and user input. Here, I provide a minimal toy implementation that modifies the piece table through insertions, deletions, and replacements, of arbitrary text.

My class PieceTable implements three methods:

1. insert(idx, text) – inserts the string text at index idx, so that the first character of text ends up at the index idx;
2. delete(begin, end) – deletes the substring from indices begin up to end, including the beginning and excluding the end (like a slice obj[begin:end]); and
3. replace(begin, end, replacement) – replaces the text between indices begin and end with replacemente, regardless of the length of replacement.

Here are some examples:

pt = PieceTable("hey world")  # Initial data.
pt.replace(5, 7, "!!!")  # "hey w!!!ld"
pt.delete(5, 8)  # "hey wld"
pt.insert(5, "or")  # "hey world"
pt.delete(2, 6)  # "herld"
pt.insert(0, "A new word: ")  # "A new word: herld"
pt.insert(15, "a")  # "A new word: herald"
pt.delete(0, 0) # "A new word: herald"
pt.replace(1, 5, "n existing")  # "An existing word: herald"

## Python implementation of a piece table

I provide a toy implementation below. To reduce the amount of work I had to do, I decided to implement insert and delete at the expense of replace. All I needed to do was realise that

• insert(idx, text) == replace(idx, idx, text); and
• delete(begin, end) == replace(begin, end, "").

Additionally, to make the code a bit simpler, I added a “sentinel node” that is always present at the beginning of the piece table and that holds no data whatsoever. This made it simpler to insert text at the beginning.

So, here is the code:

from __future__ import annotations

from dataclasses import dataclass
from typing import Generator

@dataclass
class _Node:
next: "_Node" | None
data: str
start_idx: int

@dataclass
class _SentinelNode(_Node):
next: _Node | None
data: str = ""
start_idx: int = 0

class PieceTable:
_pieces: = _SentinelNode(None)

def __init__(self, initial_data):
if initial_data is not None:
self._pieces = _SentinelNode(_Node(None, initial_data, 0))

def insert(self, start_idx, data):
return self.replace(start_idx, start_idx, data)

def delete(self, start_idx, end_idx):
return self.replace(start_idx, end_idx, "")

def replace(self, start_idx, end_idx,...
]]>
TIL #076 – Hypothesis for code refactoring https://mathspp.com/blog/til/hypothesis-for-code-refactoring 2023-08-08T19:15:15+02:00 2023-08-08T18:10:00+02:00

Today I learned how to use Hypothesis to do confident code refactoring.

# Hypothesis for code refactoring

## What is Hypothesis?

Hypothesis is a Python library that you can use when testing your code. In a nutshell, Hypothesis will generate test cases automatically for you.

In case you already know what that means, Hypothesis enables property-based testing, which is a “testing paradigm”. In case you are interested, I've written about getting started with Hypothesis before.

One of the cool things about Hypothesis is that you can use it when you are refactoring code and you want to be as certain as possible that you are not breaking anything. Let me give you a concrete example.

## Code refactoring

Recently I played a bit with the Damerau-Levenshtein distance and I implemented it in Python. The code (taken from this TIL article) looked like this:

from functools import lru_cache

@lru_cache
def dl(a, b):
edit_distances = []

if len(a) == len(b) == 0:
edit_distances.append(0)

if len(a) > 0:
edit_distances.append(dl(a[:-1], b) + 1)

if len(b) > 0:
edit_distances.append(dl(a, b[:-1]) + 1)

if len(a) > 0 and len(b) > 0:
edit_distances.append(dl(a[:-1], b[:-1]) + (a[-1] != b[-1]))

if len(a) > 1 and len(b) > 1 and a[-1] == b[-2] and a[-2] == b[-1]:
edit_distances.append(dl(a[:-2], b[:-2]) + (a[-1] != b[-1]))

return min(edit_distances)

However, this was just a basic implementation that translated the mathematical formula that the Wikipedia page showed. I wanted to have a go at rewriting this in a better way. I wanted to refactor the code.

So, I did. I came up with this alternative implementation:

from functools import lru_cache

@lru_cache
def dl(a, b):
if not a or not b:
return len(a) + len(b)

levenshstein = min(
dl(a[:-1], b) + 1,
dl(a, b[:-1]) + 1,
dl(a[:-1], b[:-1]) + (a[-1] != b[-1]),
)

if a[:-1] and b[:-1] and a[-1] == b[-2] and b[-1] == a[-2]:
return min(
levenshstein,
dl(a[:-2], b[:-2]) + (a[-1] != b[-1]),
)

return levenshstein

Now, the question is: are these two functions the same? Do the two functions compute the same thing?

Enter: Hypothesis!

## Verifying a code refactor with Hypothesis

Because Hypothesis generates random test cases for you, what you can do is create a test where Hypothesis generates two random strings, we feed the two strings to the two alternative implementations, and then we check if they return the same result!

In essence, if dl and dl2 are the two functions, you just need to write this:

# dl.py
from functools import lru_cache

from hypothesis import given
from hypothesis.strategies import text

@lru_cache
def dl(a, b):
...

@lru_cache
def dl2(a, b):
...

@given(text(max_size=15), text(max_size=15))
def test_dl_match(a, b):
assert dl(a, b) == dl2(a, b)

Then, you can run your tests – for example, with pytest dl.py – and wait for Hypothesis's verdict. If the test passes, that's because the two functions probably compute the same thing.

In this case, the test passes, so the two functions likely are the same and my refactor is OK!

Notice that this doesn't guarantee that the two functions are correct!...

]]>
TIL #075 – Damerau-Levenshtein distance https://mathspp.com/blog/til/damerau-levenshtein-distance 2023-08-08T19:17:34+02:00 2023-08-08T17:50:00+02:00

Today I learned about the Damerau-Levenshtein distance used on strings in the field of genetics.

# Damerau-Levenshtein distance

## Intuitive definition

The Damerau-Levenshtein distance is an algorithm that you can use to determine how similar two strings are to each other. This algorithm works in quite an intuitive way, and I like to think of it like this:

The distance between two strings is the number of typos that it takes to go from one to the other.

For example, the DL (Damerau-Levenshtein) distance between the words "bald" and "ball" is 1 because with a single typo (replace the last l with a d) you can get from one word to the other.

So, the point of the DL distance is to determine what kind of typos are considered, and then we just use some recursion to count how many are needed! In the DL distance, we consider the following kinds of typos:

• if you added a letter by mistake, that's one typo (e.g., "ball" and "balll");
• if you forgot a letter, that's one typo (e.g., "ball" and "bal"); and
• if you swapped two letters, that's one typo (e.g., "ball" and "blal").

In case you know it, this is similar to the Levenshtein distance, except here we also consider transpositions of characters (swapping two consecutive characters).

Here are some concrete examples, assuming dl computes the DL distance:

>>> dl("ball", "balll")
1
>>> dl("ball", "bal")
1
>>> dl("ball", "blal")
1
>>> dl("hello", "halo")
2
>>> dl("bananas", "cnnaanas")
3

## Recursive definition

The DL distance can be defined recursively in a “straightforward way” once you realise how to translate the math into English (or whatever your native language is!).

This is the math definition that the Wikipedia article shows:

$d_{a, b}(i, j) = \min \begin{cases} 0, ~ i = j = 0\\ d_{a, b}(i - 1, j) + 1, ~ i > 0 \\ d_{a, b}(i, j - 1) + 1, ~ j > 0 \\ d_{a, b}(i - 1, j - 1) + 1_{a_i \neq b_j}, ~ i, j > 0 \\ d_{a, b}(i - 2, j - 2) + 1_{a_i \neq b_j}, ~ i, j > 1 ~ \wedge ~ a_i = b_{j-1} ~ \wedge ~ a_{i-1} = b_j \end{cases}$

The $$d_{a, b}$$ is the function that computes the DL distance between two strings $$a$$ and $$b$$ and the $$i$$ and the $$j$$ control the length of the prefix we are looking at: the function works recursively and you drop characters from the end of the strings $$a$$ and $$b$$ as you go down.

For example, when $$i = 0$$, that means you have 0 characters left of the string $$a$$ and when $$j = 0$$ that means you have 0 characters left of the string $$b$$.

The $$1_{a_i \neq b_j}$$ translates into int(a[i] != b[j]), which is 1 if the two characters are different and 0 if they are the same.

We can turn the math into code as seen below.

... ]]>