Today I learned how to use Hypothesis to refactor code with confidence.
Hypothesis is a Python library that you can use when testing your code. In a nutshell, Hypothesis will generate test cases automatically for you.
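In case the term already means something to you: Hypothesis enables property-based testing, a “testing paradigm” in which you state properties your code should satisfy and let the library look for inputs that break them. In case you are interested, I've written about getting started with Hypothesis before.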
One of the cool things about Hypothesis is that you can use it when you are refactoring code and you want to be as certain as possible that you are not breaking anything. Let me give you a concrete example.
Recently I played a bit with the Damerau-Levenshtein distance and I implemented it in Python. The code (taken from this TIL article) looked like this:
from functools import lru_cache


@lru_cache
def dl(a, b):
    edit_distances = []
    if len(a) == len(b) == 0:
        edit_distances.append(0)
    if len(a) > 0:
        edit_distances.append(dl(a[:-1], b) + 1)
    if len(b) > 0:
        edit_distances.append(dl(a, b[:-1]) + 1)
    if len(a) > 0 and len(b) > 0:
        edit_distances.append(dl(a[:-1], b[:-1]) + (a[-1] != b[-1]))
    if len(a) > 1 and len(b) > 1 and a[-1] == b[-2] and a[-2] == b[-1]:
        edit_distances.append(dl(a[:-2], b[:-2]) + (a[-1] != b[-1]))
    return min(edit_distances)
However, this was just a basic implementation that translated the mathematical formula that the Wikipedia page showed. I wanted to have a go at rewriting this in a better way. I wanted to refactor the code.
So, I did. I came up with this alternative implementation:
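from functools import lru_cache


@lru_cache
def dl(a, b):
    # If either string is empty, the distance is the length of the other one.
    if not a or not b:
        return len(a) + len(b)

    # Plain Levenshtein options: deletion, insertion, substitution.
    levenshtein = min(
        dl(a[:-1], b) + 1,
        dl(a, b[:-1]) + 1,
        dl(a[:-1], b[:-1]) + (a[-1] != b[-1]),
    )
    # Transposition of the last two characters, when both strings have at least two.
    if a[:-1] and b[:-1] and a[-1] == b[-2] and b[-1] == a[-2]:
        return min(
            levenshtein,
            dl(a[:-2], b[:-2]) + (a[-1] != b[-1]),
        )
    return levenshtein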
Now, the question is: are these two functions the same? Do the two functions compute the same thing?
Enter: Hypothesis!
Because Hypothesis generates random test cases for you, you can write a test in which Hypothesis generates two random strings, feed those strings to both implementations, and check that they return the same result!
In essence, if dl is the original function and dl2 is the refactored one (renamed so that both can live in the same file), you just need to write this:
# dl.py
from functools import lru_cache

from hypothesis import given
from hypothesis.strategies import text


@lru_cache
def dl(a, b):
    ...


@lru_cache
def dl2(a, b):
    ...


@given(text(max_size=15), text(max_size=15))
def test_dl_match(a, b):
    assert dl(a, b) == dl2(a, b)
Then, you can run your tests – for example, with pytest dl.py – and wait for Hypothesis's verdict.
If the test passes, that's good evidence that the two functions compute the same thing.
In this case, the test passes, so the two functions are likely equivalent and my refactor is OK!
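For reference, here is a sketch of what a complete dl.py could look like with the two implementations pasted in. Two things in it are my own assumptions and not part of the test above: the strings are drawn from a three-letter alphabet, so that the generated strings share characters more often and the transposition branch has a better chance of running, and an explicit example with one adjacent swap is pinned. Also note that the recursive calls inside dl2 have to be renamed to dl2, otherwise the refactored function would quietly delegate to the original.

```python
# dl.py: a sketch, run with `pytest dl.py`
from functools import lru_cache

from hypothesis import example, given
from hypothesis.strategies import text


@lru_cache
def dl(a, b):
    """Original implementation, kept as the reference."""
    edit_distances = []
    if len(a) == len(b) == 0:
        edit_distances.append(0)
    if len(a) > 0:
        edit_distances.append(dl(a[:-1], b) + 1)
    if len(b) > 0:
        edit_distances.append(dl(a, b[:-1]) + 1)
    if len(a) > 0 and len(b) > 0:
        edit_distances.append(dl(a[:-1], b[:-1]) + (a[-1] != b[-1]))
    if len(a) > 1 and len(b) > 1 and a[-1] == b[-2] and a[-2] == b[-1]:
        edit_distances.append(dl(a[:-2], b[:-2]) + (a[-1] != b[-1]))
    return min(edit_distances)


@lru_cache
def dl2(a, b):
    """Refactored implementation; it recurses into dl2, not dl."""
    if not a or not b:
        return len(a) + len(b)
    levenshtein = min(
        dl2(a[:-1], b) + 1,
        dl2(a, b[:-1]) + 1,
        dl2(a[:-1], b[:-1]) + (a[-1] != b[-1]),
    )
    if a[:-1] and b[:-1] and a[-1] == b[-2] and b[-1] == a[-2]:
        return min(levenshtein, dl2(a[:-2], b[:-2]) + (a[-1] != b[-1]))
    return levenshtein


# Assumption: a tiny alphabet, so the generated strings share characters.
short_text = text(alphabet="abc", max_size=15)


@given(short_text, short_text)
@example("ab", "ba")  # always try one adjacent swap explicitly
def test_dl_match(a, b):
    assert dl(a, b) == dl2(a, b)
```

If the two functions ever disagree, Hypothesis reports the pair of strings that made the assertion fail, which gives you a concrete input to debug with.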
Notice that this doesn't guarantee that the two functions are correct!
I just tested whether they compute the same results.
So, if dl is correct, it's likely that dl2 is also correct; if dl is incorrect, it's likely that dl2 is also incorrect.
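Since agreement alone doesn't say anything about correctness, it may be worth pinning one of the two functions to a handful of distances you can work out by hand. The snippet below is a minimal sketch of that idea: it uses the refactored version under the name dl2, and the table of inputs is my own choice, not something from the original test.

```python
from functools import lru_cache


@lru_cache
def dl2(a, b):
    if not a or not b:
        return len(a) + len(b)
    levenshtein = min(
        dl2(a[:-1], b) + 1,
        dl2(a, b[:-1]) + 1,
        dl2(a[:-1], b[:-1]) + (a[-1] != b[-1]),
    )
    if a[:-1] and b[:-1] and a[-1] == b[-2] and b[-1] == a[-2]:
        return min(levenshtein, dl2(a[:-2], b[:-2]) + (a[-1] != b[-1]))
    return levenshtein


# Hand-computed distances (illustrative inputs, chosen for this sketch).
known_distances = {
    ("", ""): 0,                # nothing to edit
    ("abc", ""): 3,             # three deletions
    ("abc", "abc"): 0,          # identical strings
    ("ab", "ba"): 1,            # one transposition
    ("kitten", "sitting"): 3,   # two substitutions and one insertion
}

for (a, b), expected in known_distances.items():
    assert dl2(a, b) == expected, (a, b, dl2(a, b))
```

With a few anchors like these in place, the comparison test carries more weight: the two functions agree with each other, and at least one of them agrees with known answers.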
So, now you know. If you want to do a large refactor of a function, have the two versions side by side, use Hypothesis to generate the arguments for them, and compare the results of the two.
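As a complement to Hypothesis, and not a replacement for it, the same side-by-side comparison can also be done exhaustively for very small inputs using only the standard library. This is a different technique (brute-force enumeration instead of random generation), and everything in the sketch below is hypothetical: the helper names, the two-letter alphabet, the length limit, and the toy functions standing in for the "before" and "after" of a refactor.

```python
from itertools import product


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def find_mismatch(old, new, alphabet="ab", max_len=4):
    """Return the first (a, b) on which old and new disagree, else None."""
    words = list(all_strings(alphabet, max_len))
    for a, b in product(words, repeat=2):
        if old(a, b) != new(a, b):
            return a, b
    return None


# Toy "before" version: length of the common prefix, with an index loop.
def common_prefix_old(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


# Toy "after" version: same result, written with zip.
def common_prefix_new(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Deliberately wrong refactor: counts matching positions anywhere,
# not just at the start of the strings.
def common_prefix_buggy(a, b):
    return sum(x == y for x, y in zip(a, b))


print(find_mismatch(common_prefix_old, common_prefix_new))    # None
print(find_mismatch(common_prefix_old, common_prefix_buggy))  # ('aa', 'ba')
```

The enumeration only covers short strings over a tiny alphabet, so it can't stand in for the wider variety of inputs Hypothesis generates, but it is deterministic and finds the smallest disagreements quickly.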
That's it for now! Stay tuned and I'll see you around!