Today I learned how to use Hypothesis to do confident code refactoring.

Hypothesis is a Python library that you can use when testing your code.
In a nutshell, Hypothesis will generate test cases automatically for you.

If you want the fancier term, Hypothesis enables property-based testing, which is a "testing paradigm".
In case you are interested, I've written about getting started with Hypothesis before.
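If you have never seen a property-based test, here is a minimal sketch (a toy example of mine, not from this article): instead of hand-picking inputs, you state a property that should always hold and let Hypothesis generate the inputs.

```python
from hypothesis import given
from hypothesis.strategies import integers, lists

# Hypothesis generates many random lists of integers and checks
# that the property below holds for every single one of them.
@given(lists(integers()))
def test_reverse_twice_is_identity(xs):
    assert list(reversed(list(reversed(xs)))) == xs
```

A `@given`-decorated function can be collected by pytest like any other test, and it can also be called directly, which runs it against a batch of generated inputs.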

Recently I played a bit with the Damerau-Levenshtein distance and implemented it in Python.
The code (taken from this TIL article) looked like this:

```python
from functools import lru_cache

@lru_cache
def dl(a, b):
    edit_distances = []

    if len(a) == len(b) == 0:
        edit_distances.append(0)

    if len(a) > 0:
        edit_distances.append(dl(a[:-1], b) + 1)

    if len(b) > 0:
        edit_distances.append(dl(a, b[:-1]) + 1)

    if len(a) > 0 and len(b) > 0:
        edit_distances.append(dl(a[:-1], b[:-1]) + (a[-1] != b[-1]))

    if len(a) > 1 and len(b) > 1 and a[-1] == b[-2] and a[-2] == b[-1]:
        edit_distances.append(dl(a[:-2], b[:-2]) + (a[-1] != b[-1]))

    return min(edit_distances)
```
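As a quick sanity check (these example calls are mine, not from the original TIL), the function produces the expected distances on a few small inputs. The definition is repeated here so the snippet runs on its own:

```python
from functools import lru_cache

@lru_cache
def dl(a, b):
    edit_distances = []

    if len(a) == len(b) == 0:
        edit_distances.append(0)

    if len(a) > 0:
        edit_distances.append(dl(a[:-1], b) + 1)

    if len(b) > 0:
        edit_distances.append(dl(a, b[:-1]) + 1)

    if len(a) > 0 and len(b) > 0:
        edit_distances.append(dl(a[:-1], b[:-1]) + (a[-1] != b[-1]))

    if len(a) > 1 and len(b) > 1 and a[-1] == b[-2] and a[-2] == b[-1]:
        edit_distances.append(dl(a[:-2], b[:-2]) + (a[-1] != b[-1]))

    return min(edit_distances)

# "cat" -> "act" is a single transposition of adjacent characters:
assert dl("cat", "act") == 1
# Going from "abc" to "" takes three deletions:
assert dl("abc", "") == 3
# The classic Levenshtein example; no transposition helps here:
assert dl("kitten", "sitting") == 3
```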

However, this was just a basic implementation that translated the mathematical formula shown on the Wikipedia page.
I wanted to have a go at rewriting it in a better way.
I wanted to *refactor* the code.

So, I did.
I came up with this alternative implementation:

```python
from functools import lru_cache

@lru_cache
def dl(a, b):
    if not a or not b:
        return len(a) + len(b)

    levenshtein = min(
        dl(a[:-1], b) + 1,
        dl(a, b[:-1]) + 1,
        dl(a[:-1], b[:-1]) + (a[-1] != b[-1]),
    )

    if a[:-1] and b[:-1] and a[-1] == b[-2] and b[-1] == a[-2]:
        return min(
            levenshtein,
            dl(a[:-2], b[:-2]) + (a[-1] != b[-1]),
        )

    return levenshtein
```

Now, the question is:
are these two functions the same?
Do the two functions compute the same thing?

Enter: Hypothesis!

## Verifying a code refactor with Hypothesis

Because Hypothesis generates random test cases for you, you can write a test in which Hypothesis generates two random strings, feeds them to the two alternative implementations, and checks whether they return the same result!

In essence, if `dl` and `dl2` are the two functions, you just need to write this:

```python
# dl.py
from functools import lru_cache

from hypothesis import given
from hypothesis.strategies import text

@lru_cache
def dl(a, b):
    ...

@lru_cache
def dl2(a, b):
    ...

@given(text(max_size=15), text(max_size=15))
def test_dl_match(a, b):
    assert dl(a, b) == dl2(a, b)
```

Then, you can run your tests (for example, with `pytest dl.py`) and wait for Hypothesis's verdict.
If the test passes, that's because the two functions *probably* compute the same thing.

In this case, the test passes, so the two functions are likely the same and my refactor is OK!

Notice that this doesn't guarantee that the two functions are correct!...
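To make that concrete, here is a contrived sketch (entirely my own, hypothetical example): two implementations that agree on every input and are both wrong, so a comparison test between them passes despite the shared bug.

```python
# Two hypothetical broken implementations that always agree...
def bad_dl(a, b):
    return 0  # wrong: claims every pair of strings is at distance 0

def bad_dl2(a, b):
    return 0  # wrong in exactly the same way

# ...so a differential test between them would pass, even though
# neither function computes the Damerau-Levenshtein distance:
assert all(
    bad_dl(a, b) == bad_dl2(a, b)
    for a, b in [("", ""), ("cat", "act"), ("kitten", "sitting")]
)
```

A comparison test only tells you the two functions match each other; you still need at least a few known-answer tests to pin down that either one is actually correct.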