Structural pattern matching is coming in Python 3.10 and
the previous Pydon't explored some interesting use cases
for the new match
statement.
This article explores situations for which match
isn't the answer.
(If you are new here and have no idea what a Pydon't is, you may want to read the Pydon't Manifesto.)
Structural pattern matching is coming to Python, and while it may look
like a plain switch statement like many other languages have,
Python's match
statement was not introduced to serve as a simple
switch statement.
In this article I explored plenty of use cases
for the new match
statement, and in this blog post I will try
to explore some use cases for which a match
is not the answer.
This article will assume you know how structural pattern matching
works in Python, so if you are unsure how that works feel free
to read my “Pattern matching tutorial for Pythonic code”.
At the time of writing, Python 3.10 is still a pre-release, so you have to look in the right place if you want to download Python 3.10 and play with it.
As per the Zen of Python,
“There should be one-- and preferably only one --obvious way to do it.”
The introduction of the match
statement may seem to violate
this principle...
However, you have to remember that the point of the match
statement
is not to serve as a basic switch
statement, as we already have
plenty of alternatives for that – if match
was supposed to be a simple
switch, then it would probably be called “switch” and not “match”.
No, the point of the match
statement is to do structural pattern
matching, so there's plenty of basic types of matching and casing
that can be done with the traditional tools that we have been using up
until Python 3.10.
Below I will share some of these tools with you, and I'll try to describe the situations in which they are helpful. Of course, this is much easier to understand with code so there will be plenty of code examples along the way.
if
statementThe Collatz conjecture is a mathematical conjecture that says that the following function terminates for any positive integer given as input:
def collatz_path(n):
path = [n]
while n != 1:
match n % 2:
case 0:
n //= 2
case 1:
n = 3*n + 1
path.append(n)
return path
Which gives the following two example outputs:
>>> collatz_path(8)
[8, 4, 2, 1]
>>> collatz_path(15)
[15, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1]
If we look at the usage of match
above, we see it basically served
as a simple switch to match either 0
or 1
, the only two values
that the operation n % 2
could result in for a positive integer n
.
Notice that if we use a plain if
we can write exactly the same code
and save one line of code:
def collatz_path(n):
path = [n]
while n != 1:
if n % 2:
n = 3*n + 1
else:
n //= 2
path.append(n)
return path
We saved one line of code and reduced the maximum depth of our indentation:
with the match
we had code that was indented four times, whereas the
implementation with the if
only has three levels of depth.
When you only have a couple of options and you are checking for explicit equality,
a short and sweet if
statement is most likely the way to go.
Sometimes you will feel like you have to list a series of cases and corresponding values, so that you can map one to the other. However, it might be the case that you could make your life much simpler by looking for an alternative algorithm or formula and implementing that instead. I'll show you an example.
In case you never heard of it, Rule 30 is an “elementary cellular automaton”. You can think of it as a rule that receives three bits (three zeroes/ones) and produces a new bit, depending on the three bits it received. Automatons are really, really, interesting, but discussing them is past the point of this article. Let us just look at a possible implementation of the “Rule 30” automaton:
def rule30(bits):
match bits:
case 0, 0, 0:
return 0
case 0, 0, 1:
return 1
case 0, 1, 0:
return 1
case 0, 1, 1:
return 1
case 1, 0, 0:
return 1
case 1, 0, 1:
return 0
case 1, 1, 0:
return 0
case 1, 1, 1:
return 0
This seems like a sensible use of the match
statement, except that we just wrote
16 lines of code...
Ok, you are right, let us put together the rules that return the same values,
that should make the code shorter:
def rule30(bits):
match bits:
case 0, 0, 0 | 1, 0, 1 | 1, 1, 0 | 1, 1, 1:
return 0
case 0, 0, 1 | 0, 1, 0 | 0, 1, 1 | 1, 0, 0:
return 1
Yup, much better.
But now we have four options on each case
, and I have to squint to figure out
where each option starts and ends, and the long strings of zeroes and ones aren't
really that pleasant to the eye...
Can we make it better..?
With just a little bit of research you can find out that the “Rule 30” can be written
as a closed formula that depends on the three input bits, which means we don't
have to match
the input bits with all the possible inputs, we can just compute the output:
def rule30(bits):
p, q, r = bits
return (p + q + r + q*r) % 2
You might argue that this formula obscures the relationship
between the several inputs and their outputs.
You are right in principle, but having the explicit “Rule 30” written
out as a match
doesn't tell you much about why each input maps to
each output either way, so why not make it short and sweet?
There are many cases in which you just want to take a value in and map it to something else. As an example, take this piece of code that takes an expression and writes it in prefix notation:
import ast
def prefix(tree):
match tree:
case ast.Expression(expr):
return prefix(expr)
case ast.Constant(value=v):
return str(v)
case ast.BinOp(lhs, op, rhs):
match op:
case ast.Add():
sop = "+"
case ast.Sub():
sop = "-"
case ast.Mult():
sop = "*"
case ast.Div():
sop = "/"
case _:
raise NotImplementedError()
return f"{sop} {prefix(lhs)} {prefix(rhs)}"
case _:
raise NotImplementedError()
print(prefix(ast.parse("1 + 2 + 3", mode="eval"))) # + + 1 2 3
print(prefix(ast.parse("2**3 + 6", mode="eval")) # + * 2 3 6
# Final one prints '- + 1 * 2 3 / 5 7', take a moment to grok it.
print(prefix(ast.parse("1 + 2*3 - 5/7", mode="eval")))
Notice the inner match
to convert the op
inside a BinOp
to a string?
For starters, that nested match
takes up too much vertical space
and distracts us from what really matters, which is the traversal of
the recursive structure of the tree.
This means we could actually refactor that bit as a utility function:
import ast
def op_to_str(op):
match op:
case ast.Add():
sop = "+"
case ast.Sub():
sop = "-"
case ast.Mult():
sop = "*"
case ast.Div():
sop = "/"
case _:
raise NotImplementedError()
return sop
def prefix(tree):
match tree:
case ast.Expression(expr):
return prefix(expr)
case ast.Constant(value=v):
return str(v)
case ast.BinOp(lhs, op, rhs):
return f"{op_to_str(op)} {prefix(lhs)} {prefix(rhs)}"
case _:
raise NotImplementedError()
print(prefix(ast.parse("1 + 2 + 3", mode="eval"))) # + + 1 2 3
print(prefix(ast.parse("2**3 + 6", mode="eval")) # + * 2 3 6
# Final one prints '- + 1 * 2 3 / 5 7', take a moment to grok it.
print(prefix(ast.parse("1 + 2*3 - 5/7", mode="eval")))
This makes it easier to read and interpret the prefix
function,
but now we have another problem that really annoys me:
a simple but long function, the op_to_str
function.
For every type of operator you support, your function grows by two lines...
If you replace the match
with a chain of if
and elif
statements
you only save one line at the top...
The fix I suggested in the original article
was using a dictionary to map the type of op
to its string representation:
def op_to_str(op):
ops = {
ast.Add: "+",
ast.Sub: "-",
ast.Mult: "*",
ast.Div: "/",
}
return ops.get(op.__class__, None)
This usage pattern of a dictionary is quite common in Python,
using the get
method to compute the mapping of a value
to another value.
In case you are wondering, you can use the second argument of the
get
function to provide for a default value, which might be useful
if the dictionary hasn't listed every single possible value
or in case you want to have a fallback value.
getattr
Another useful mechanism that we have available is the
getattr
function,
which is part of a trio of Python built-in functions:
hasattr
, getattr
and setattr
.
I will be writing about this trio in a future Pydon't;
be sure to subscribe to the Pydon't newsletter so you don't miss it!
For now, I'll just show you briefly what getattr
can do for you.
I am writing an APL interpreter called RGSPL, and there is a function
named visit_F
where I need to map APL symbols like +
, -
, and ⍴
to the corresponding Python
function that implements it.
These Python functions, implementing the behaviour of the symbols, live in the functions.py
file.
If I were using a match
statement, here is what this visit_F
could look like:
import functions
def visit_F(self, func):
"""Fetch the callable function."""
name = func.token.type.lower() # Get the name of the symbol.
match name:
case "plus":
function = functions.plus
case "minus":
function = functions.minus
case "reshape":
function = functions.reshape
case _:
function = None
if function is None:
raise Exception(f"Could not find function {name}.")
return function
This is a similar problem to the one I showed above, where we wanted to get a string for each type of operator we got, so this could actually be written with the dictionary mapping. I invite you to do it, as a little exercise.
However, here's the catch: I have still a long way to go
in my RGSPL project, and I already have a couple dozen
of those primitives, so my match
statement would be around
40 lines long, if I were using that solution,
or 20 lines long if I were using the dictionary solution,
with a key, value pair per line.
Thankfully, Python's getattr
can be used to get
an attribute from an object, if I have the name of that
attribute.
It is no coincidence that the value of the name
variable
above is supposed to be exactly the same as the name
of the function defined inside functions.py
:
import functions
getattr(functions, "plus", None) # returns functions.plus
getattr(functions, "reshape", None) # returns functions.reduce
getattr(functions, "fasfadf", None) # returns None
With the getattr
function that Python provides, my visit_F
stays
with a constant size, regardless of how many functions I add to the functions.py
file:
def visit_F(self, func):
"""Fetch the callable function."""
name = func.token.type.lower() # Get the name of the symbol.
function = getattr(functions, name, None)
if function is None:
raise Exception(f"Could not find function {name}.")
return function
The getattr
function can also be used to get attributes from
an instance of a class, e.g.,
class Foo:
def __init__(self, a, b):
self.a = a
self.b = b
foo = Foo(3, 4)
print(getattr(foo, "a")) # prints 3
bar = Foo(10, ";")
print(getattr(bar, "b")) # prints ';'
This goes to show that it is always nice to know the tools you have at your disposal. Not everything has very broad use cases, but that also means that the more specialised tools are the ones that make the most difference when they are brought in.
Speaking of knowing your tools, the last use case in this article for which
match
is a bad alternative is related to calling different functions
when your data has different types.
If you have programming experience in a programming language like Java, you will be familiar with the concept of overloading a function: you implement the same function several times, but you get to specify the behaviour of the function for different types of arguments and/or number of arguments.
For example, you might want to implement a function to pretty-print a series of different types of objects:
def pretty_print(arg):
if isinstance(arg, complex):
print(f"{arg.real} + {arg.imag}i")
elif isinstance(arg, (list, tuple)):
for i, elem in enumerate(arg):
print(i, elem)
elif isinstance(arg, dict):
for key, value in arg.items():
print(f"{key}: {value}")
else:
print(arg)
Which then works like so:
>>> pretty_print(3)
3
>>> pretty_print([2, 5])
0 2
1 5
>>> pretty_print(3+4j)
3.0 + 4.0i
You can see that the branching introduced by the if
statement
is merely to separate the different types that the arg
could have,
and while the handling logic might be different, the final purpose
is always the same: to pretty-print an object.
But what if the code to handle each type of argument took 10 or 20
lines?
You would be getting a really long function with what would essentially
be embedded subfunctions.
You can separate all these subfunctions by making use of the functools.singledispatch
decorator:
import functools
@functools.singledispatch
def pretty_print(arg):
print(arg)
@pretty_print.register(complex)
def _(arg):
print(f"{arg.real} + {arg.imag}i")
@pretty_print.register(list)
@pretty_print.register(tuple)
def _(arg):
for i, elem in enumerate(arg):
print(i, elem)
@pretty_print.register(dict)
def _(arg):
for key, value in arg.items():
print(f"{key}: {value}")
And this can then be used exactly like the original function:
>>> pretty_print(3)
3
>>> pretty_print([2, 5])
0 2
1 5
>>> pretty_print(3+4j)
3.0 + 4.0i
The pretty_print
example isn't the best example because you
spend as many lines decorating as in defining the actual subfunctions,
but this shows you the pattern that you can now be on the lookout for.
You can read more about singledispatch
in the docs.
Here's the main takeaway of this article, for you, on a silver platter:
“The new
match
statement is great, but that does not mean thematch
statement will be the best alternative always and, in particular, thematch
statement is generally being misused if you use it as a simple switch.”
This Pydon't showed you that:
match
isn't necessarily always the best way to implement control flow;match
statements could be vanilla if
statements;dict.get
and getattr
can also be used to fetch
different values depending on the matching key; andfunctools.singledispatch
when you need to execute different
subfunctions when the input has different types.If you liked this Pydon't be sure to leave a reaction below and share this with your friends and fellow Pythonistas. Also, don't forget to subscribe to the newsletter so you don't miss a single Pydon't!
+35 chapters. +400 pages. Hundreds of examples. Over 30,000 readers!
My book “Pydon'ts” teaches you how to write elegant, expressive, and Pythonic code, to help you become a better developer. >>> Download it here 🐍🚀.
getattr
, https://docs.python.org/3/library/functions.html#getattr;functools.singledispatch
, https://docs.python.org/3/library/functools.html#functools.singledispatch;