In this Pydon't we explore what Boolean short-circuiting for the `and` and `or` operators is, and how to use this functionality to write more expressive code.
(If you are new here and have no idea what a Pydon't is, you may want to read the Pydon't Manifesto.)
In this Pydon't we will take a closer look at how `and` and `or` really work and at a couple of really neat things you can do because of the way they are defined.
In particular, we will look at
- the fact that `and` and `or` return values from their operands, and not necessarily `True` or `False`; and
- how the short-circuiting of `and` and `or` extends to `all` and `any`.
For this Pydon't, I will assume you are familiar with what “Truthy” and “Falsy” values are in Python.
If you are not familiar with this concept, or if you would like just a quick reminder of how this works, go ahead and read the “Truthy, Falsy, and `bool`” Pydon't.
The `and` and `or` operators
If we take a look at the docs, here is how `or` is defined:
“`x or y` returns `y` if `x` is false, otherwise it returns `x`.”
Equivalently, but written with an `if` expression,
(x or y) == (y if not x else x)
This may not seem like it is worth spending a thought on,
but already right at this point we can see something very interesting:
even though we look at the truthy or falsy value of `x`, what we return are the values associated with `x`/`y`, and not a Boolean value.
For example, look at the program below and think about what it outputs:
if 3 or 5:
    print("Yeah.")
else:
    print("Nope.")
If you thought it should print “Yeah.”, you are right!
Notice how `3 or 5` was the condition of the `if` statement and it evaluated to `True`, which is why the statement under `if` got executed.
Now, look at the program below and think about what it outputs:
print(3 or 5)
What do you think it outputs?
If you think the output should be `True`, you are wrong!
The program above outputs `3`:
>>> 3 or 5
3
Let's go back to something I just said:
“Notice how `3 or 5` was the condition of the `if` statement and it evaluated to `True`, which is why the statement under `if` got executed.”
The wording of this statement is wrong, but the error in it is fairly subtle. If you spotted it before I pointed it out, give yourself a pat on the back, you deserve it. So, what did I say wrong?
`3 or 5` does not evaluate to `True`!
It evaluates to `3`, which is truthy and therefore tells the `if` to execute its statements.
Returning `True` and returning a truthy value are significantly different things.
A similar thing happens with `and`.
As per the docs, `and` can be defined as follows:
“`x and y` returns `x` if `x` is false, otherwise it returns `y`.”
We can also rewrite this as
(x and y) == (x if not x else y)
Take your time to explore this for a bit, just like we explored `x or y` above.
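If you want a starting point for that exploration, here is a small sketch that checks the `if` expression against `and` on a handful of values. The sample values and the helper name `and_via_if` are arbitrary choices made for this illustration.

```python
# Sample operands, chosen arbitrarily to mix truthy and falsy values.
samples = [0, 3, "", "hi", [], [1], None, True, False]


def and_via_if(x, y):
    """Spell out `x and y` with a conditional expression."""
    return x if not x else y


for x in samples:
    for y in samples:
        # `is` checks that the very same operand object comes back.
        assert (x and y) is and_via_if(x, y)

print(3 and 5)     # 5
print(0 and 5)     # 0
print([] and "a")  # []
```

In every case, the result is one of the two operands, never a freshly made Boolean.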
You might be asking why this distinction is relevant.
It is mostly relevant because of the following property:
`and` and `or` only evaluate the right operand if the left operand is not enough to determine the result of the operation.
This is what short-circuiting is: not evaluating the whole expression (stopping short of evaluating it) if we already have enough information to determine the final outcome.
This short-circuiting feature, together with the fact that the Boolean operators `and` and `or` return the values of the operands and not necessarily a Boolean, means we can do some really neat things with them.
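As a first taste of why skipping the right operand matters, consider the minimal sketch below. The helper `safe_ratio` is a hypothetical name invented for this illustration: its right operand would raise an error if it were ever evaluated with `b == 0`.

```python
def safe_ratio(a, b):
    # `b != 0` is evaluated first; when it is False, `a / b` is never run.
    return b != 0 and a / b


print(safe_ratio(6, 3))  # 2.0
print(safe_ratio(6, 0))  # False, and no ZeroDivisionError is raised
```

Notice that the function returns either `False` (the value of the left operand) or the quotient (the value of the right operand), which is exactly the "returns one of the operands" behaviour described above.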
`or`
`False or y`
From the Boolean perspective, `or` gives a truthy result if any of its operands is truthy.
If the left operand to `or` is `False` (or falsy, for that matter) then the `or` operator has to look to its right operand in order to determine the final result.
Therefore, we know that an expression like
val = False or y
will leave `val` holding the value of `y`, and in an `if` statement or in a `while` loop, it will evaluate the body of the construct only if `y` is truthy:
>>> y = 5  # truthy value.
>>> if False or y:
...     print("Got in!")
... else:
...     print("Didn't get in...")
...
Got in!
>>> y = []  # falsy value.
>>> if False or y:
...     print("Got in 2!")
... else:
...     print("Didn't get in 2...")
...
Didn't get in 2...
Let this sit with you: if the left operand to `or` is `False` or falsy, then we need to look at the right operand to determine the value of the `or`.
`True or y`
On the other hand, if the left operand to `or` is `True`, we do not need to take a look at `y` because we already know the final result is going to be `True`.
Let us create a simple function that returns its argument unchanged but that produces a side-effect of printing something to the screen:
def p(arg):
    print(f"Inside `p` with arg={arg}")
    return arg
Now we can use `p` to take a look at the things that Python evaluates when trying to determine the value of `x or y`:
>>> p(False) or p(3)
Inside `p` with arg=False
Inside `p` with arg=3
3
>>> p(True) or p(3)
Inside `p` with arg=True
True
Notice that, in the second example, `p` only did one print because it never reached the `p(3)`.
`or` expressions
Now we tie everything together.
If the left operand to `or` is `False` or falsy, we know that `or` has to look at its right operand and will, therefore, return the value of its right operand after evaluating it.
On the other hand, if the left operand is `True` or truthy, `or` will return the value of the left operand without even evaluating the right operand.
`and`
We now do a similar survey, but for `and`.
`False and y`
`and` gives `True` if both its operands are `True`.
Therefore, if we have an expression like
val = False and y
do we need to know what `y` is in order to figure out what `val` is?
No, we do not, because regardless of whether `y` is `True` or `False`, `val` is always `False`:
>>> False and True
False
>>> False and False
False
If we take the `False and y` expressions from this example and compare them with the `if` expression we wrote earlier, which was
(x and y) == (x if not x else y)
we see that, in this case, `x` was substituted by `False`, and, therefore, we have
(False and y) == (False if not False else y)
Now, the condition inside that `if` expression reads
not False
which we know evaluates to `True`, meaning that the `if` expression never returns `y`.
If we consider any left operand that can be `False` or falsy, we see that `and` will never look at the right operand:
>>> p([]) and True # [] is falsy
Inside `p` with arg=[]
[]
>>> p(0) and 3242 # 0 is falsy
Inside `p` with arg=0
0
>>> p({}) and 242 # {} is falsy
Inside `p` with arg={}
{}
>>> p(0) and p(0) # both are falsy, but only the left matters
Inside `p` with arg=0
0
`True and y`
Now, I invite you to take a moment to work through the same reasoning, but with expressions of the form `True and y`.
In doing so, you should figure out that the result of such an expression is always the value of `y`, because the left operand being `True`, or any other truthy value, doesn't give `and` enough information.
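If you want to check your reasoning, here is a small sketch. It uses a variant of `p` that records its arguments in a list (the list `calls` is an addition made for this illustration) instead of printing them, so that we can inspect exactly which operands were evaluated.

```python
calls = []  # records which operands actually get evaluated


def p(arg):
    calls.append(arg)
    return arg


result = p(True) and p(3)
print(calls)   # [True, 3]
print(result)  # 3

calls.clear()
result = p("non-empty") and p([])
print(calls)   # ['non-empty', []]
print(result)  # []
```

In both cases the right operand is evaluated and its value is what comes out, regardless of whether that value is truthy or falsy.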
`and` expressions
Now we tie everything together.
If the left operand to `and` is `False` or falsy, we know the expression returns the value of the left operand regardless of the right operand, and therefore we do not even evaluate the right operand.
On the other hand, if the left operand to `and` is `True` or truthy, then `and` will evaluate the right operand and return its value.
Instead of memorising rules about what sides get evaluated when, just remember that both `and` and `or` will evaluate as many operands as needed to determine the overall Boolean result, and will then return the value of the last side that they evaluated.
As an immediate conclusion, the left operand is always evaluated, as you might imagine.
If you understand that, then it is just a matter of you knowing how `and` and `or` work from the Boolean perspective.
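The rule "evaluate as many operands as needed, then return the last one evaluated" can be sketched in plain Python. The functions `lazy_or` and `lazy_and` below are hypothetical helpers written for this illustration; they are not how Python implements the operators. Each operand is wrapped in a `lambda` so that we control when it gets evaluated.

```python
def lazy_or(*thunks):
    """Emulate `a or b or ...`; each operand is a zero-argument callable."""
    value = False  # assumption: result when there are no operands at all
    for thunk in thunks:
        value = thunk()
        if value:  # truthy: the overall result is decided, stop here
            return value
    return value  # every operand was falsy: return the last one


def lazy_and(*thunks):
    """Emulate `a and b and ...`; each operand is a zero-argument callable."""
    value = True  # assumption: result when there are no operands at all
    for thunk in thunks:
        value = thunk()
        if not value:  # falsy: the overall result is decided, stop here
            return value
    return value  # every operand was truthy: return the last one


# The `1 / 0` operands are never evaluated, so no error is raised.
print(lazy_or(lambda: 0, lambda: "", lambda: 7, lambda: 1 / 0))  # 7
print(lazy_and(lambda: 1, lambda: [], lambda: 1 / 0))            # []
```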
`all` and `any`
The built-in functions `all` and `any` also short-circuit, as they are simple extensions of the behaviours provided by `and` and `or`, respectively.
`all` wants to make sure that all the values of its argument are truthy, so as soon as it finds a falsy value, it knows it's game over.
That's why the docs say `all` is equivalent to the following code:
def all(it):
    for elem in it:
        if not elem:
            return False
    return True
Similarly, `any` is going to do its best to look for some value that is truthy.
Therefore, as soon as it finds one, `any` knows it has achieved its purpose and does not need to evaluate the other values.
Can you write an implementation of `any` that is similar to the above implementation of `all` and that also short-circuits?
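In case you want to compare your answer, here is one possible sketch. I called it `my_any`, a name made up for this example, so that it does not shadow the built-in. The iterator at the end shows the short-circuiting: the elements after the first truthy one are never consumed.

```python
def my_any(it):
    for elem in it:
        if elem:
            return True  # found a truthy value, no need to look further
    return False


numbers = iter([0, 0, 3, 0, 5])
print(my_any(numbers))  # True
print(list(numbers))    # [0, 5], the part that `my_any` never looked at
```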
A previous Pydon't has shown you that comparison operators can be chained arbitrarily, and those are almost equivalent to a series of comparisons separated with `and`, except that the subexpressions are only evaluated once, to prevent wasting resources.
Therefore, because we are also using an `and` in the background, chained comparisons can also short-circuit:
# 1 > 2 is False, so there's no need to look at p(2) < p(3)
>>> p(1) > p(2) < p(3)
Inside `p` with arg=1
Inside `p` with arg=2
False
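To make the "almost equivalent" part concrete, the sketch below compares a chained comparison with its expanded form. It uses a variant of `p` that records its arguments in a list (`calls`, an addition made for this illustration) instead of printing them.

```python
calls = []


def p(arg):
    calls.append(arg)
    return arg


# Chained form: the middle operand is evaluated only once.
chained = p(1) < p(2) < p(3)
print(chained, calls)  # True [1, 2, 3]

calls.clear()
# Expanded form with an explicit `and`: p(2) is evaluated twice.
expanded = p(1) < p(2) and p(2) < p(3)
print(expanded, calls)  # True [1, 2, 2, 3]

calls.clear()
# Short-circuiting: 1 > 2 is False, so p(3) is never called.
print(p(1) > p(2) < p(3), calls)  # False [1, 2]
```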
Now that we have taken a look at how all of these things work, we will see how to put them to good use in actual code.
One of the most basic usages of short-circuiting is to save time.
When you have a `while` loop or an `if` statement with multiple conditions, you may want to include the faster expressions before the slower ones, as that might save you some time if the result of the first expression ends up short-circuiting.
Consider this example that should help me get my point across: imagine you are writing a function that creates a helper file, but only if its name ends in `.txt` and if the file does not exist yet.
With this preamble, your function needs to do two things:
- check that the file name ends in `.txt`;
- check that the file does not exist yet.
What do you feel is faster?
Checking if the file ends in `.txt` or looking for it in the whole filesystem?
I would guess checking for the `.txt` ending is simpler, so that's the expression I would put first in the code:
import pathlib
def create_txt_file(filename):
    path = pathlib.Path(filename)
    # Cheap suffix check first; the filesystem is only queried if it passes.
    if path.suffix == ".txt" and not path.exists():
        # Create the file but leave it empty.
        with path.open("w"):
            pass
This means that, whenever `filename` does not respect the `.txt` format, the function can exit right away and doesn't even need to bother the operating system with asking if the file exists or not.
Now let me show you a real example of an `if` statement that uses short-circuiting in this way, saving some time.
For this, let us take a look at a function from the `base64` module, that we take from the Python Standard Library:
# From Lib/base64.py in Python 3.9.2
def b64decode(s, altchars=None, validate=False):
    """Decode the Base64 encoded bytes-like object or ASCII string s.
    [docstring cut for brevity]
    """
    s = _bytes_from_decode_data(s)
    if altchars is not None:
        altchars = _bytes_from_decode_data(altchars)
        assert len(altchars) == 2, repr(altchars)
        s = s.translate(bytes.maketrans(altchars, b'+/'))
    if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):  # <--
        raise binascii.Error('Non-base64 digit found')
    return binascii.a2b_base64(s)
This `b64decode` function takes a string (or a bytes-like object) that is assumed to be in base 64 and decodes it.
Here is a quick demo of that:
>>> import base64
>>> s = b"Base 64 encoding and decoding."
>>> enc = base64.b64encode(s)
>>> enc
b'QmFzZSA2NCBlbmNvZGluZyBhbmQgZGVjb2Rpbmcu'
>>> base64.b64decode(enc)
b'Base 64 encoding and decoding.'
Now, look at the `if` statement that I marked with a comment:
if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
    pass
`validate` is an argument to `b64decode` that tells the function if we should validate the string that we want to decode or not, and then the `re.fullmatch()` function call does that validation, ensuring that the string to decode only contains valid base 64 characters.
In case we want to validate the string and the validation fails, we enter the `if` statement and raise an error.
Notice how we first check if the user wants to validate the string and only then we run the regular expression match.
We would obtain the exact same result if we changed the order of the operands to `and`, but we would be spending much more time than needed.
To show that, let us try both cases!
Let's build a string with 1001 characters, where only the last one is invalid.
Let us compare how much time it takes to run the Boolean expression with the regex validation before and after the Boolean `validate`.
import timeit
# Code that sets up the variables we need to evaluate the expression that we
# DO NOT want to be taken into account for the timing.
setup = """
import re
s = b"a"*1000 + b"*"
validate = False
"""
# with short-circuiting: 0.01561140s on my machine.
print(timeit.timeit("validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s)", setup))
# without short-circuiting: 27.4744187s on my machine.
print(timeit.timeit("not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s) and validate", setup))
Notice that short-circuiting speeds up these comparisons by a factor of ~1750.
The `timeit` module is great and I recommend you take a peek at its docs.
Here, we use it to run that Boolean expression repeatedly (one million times, to be more specific).
Of course we could try longer or shorter strings, we could try strings that pass the validation and we could also try strings that fail the validation at an earlier stage, but this is just a small example that shows how short-circuiting can be helpful.
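If you would rather not wait for a million repetitions, here is a shorter sketch of the same experiment. The `number=2_000` argument is an arbitrary choice made for this illustration, and the absolute timings will of course differ from machine to machine.

```python
import timeit

setup = """
import re
s = b"a"*1000 + b"*"
validate = False
"""
pattern = "b'[A-Za-z0-9+/]*={0,2}'"
cheap_first = f"validate and not re.fullmatch({pattern}, s)"
regex_first = f"not re.fullmatch({pattern}, s) and validate"

# number=2_000 keeps the run short; the default is one million repetitions.
t_cheap = timeit.timeit(cheap_first, setup, number=2_000)
t_regex = timeit.timeit(regex_first, setup, number=2_000)
print(f"cheap check first: {t_cheap:.5f}s")
print(f"regex first:       {t_regex:.5f}s")
```

The exact ratio between the two numbers depends on your machine and Python version, but the version with the cheap check first should be the faster one by a wide margin.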
`if` statements
Short-circuiting can, and should, be used to keep `if` statements as flat as possible.
A typical usage pattern is when we want to do some validation if certain conditions are met.
Keeping the previous `b64decode` example in mind, that previous `if` statement could've been written like so:
# Modified from Lib/base64.py in Python 3.9.2
def b64decode(s, altchars=None, validate=False):
    """Decode the Base64 encoded bytes-like object or ASCII string s.
    [docstring cut for brevity]
    """
    s = _bytes_from_decode_data(s)
    if altchars is not None:
        altchars = _bytes_from_decode_data(altchars)
        assert len(altchars) == 2, repr(altchars)
        s = s.translate(bytes.maketrans(altchars, b'+/'))
    # Do we want to validate the string?
    if validate:  # <--
        # Is the string valid?
        if not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):  # <--
            raise binascii.Error('Non-base64 digit found')
    return binascii.a2b_base64(s)
Now we took the actual validation and nested it, so that we have two separate checks: one tests if we need to do validation and the other one does the actual validation. What is the problem with this? From a fundamentalist's point of view, you are clearly going against the Zen of Python, that says
“Flat is better than nested.”
But from a practical point of view, you are also increasing the vertical space that your function takes up by having a ridiculous `if` statement hang there.
What if you have multiple conditions that you need to check for?
Will you have a nested `if` statement for each one of those?
This is exactly what short-circuiting is useful for! Only running the second part of a Boolean expression if it is relevant!
Another typical usage pattern shows up when you have something you need to check, for example you need to check if a variable `names` is a list containing strings or you need to check if a given argument `term` is smaller than zero.
It may happen that, in that context, it is not a good idea to do those checks immediately:
- `names` might not be a list or might be empty; or
- `term` might be of a different type and, therefore, might be incomparable to zero.
Here is a concrete example of what I mean:
# From Lib/asynchat in Python 3.9.2
def set_terminator(self, term):
    """Set the input delimiter.
    Can be a fixed string of any length, an integer, or None.
    """
    if isinstance(term, str) and self.use_encoding:
        term = bytes(term, self.encoding)
    elif isinstance(term, int) and term < 0:
        raise ValueError('the number of received bytes must be positive')
    self.terminator = term
This is a helper function from within the `asynchat` module.
We don't need to know what is happening outside of this function to understand the role that short-circuiting has in the `elif` statement.
If the `term` variable is smaller than `0`, then we want to raise a `ValueError` to complain, but the previous `if` statement shows that `term` might also be a string.
If `term` is a string, then comparing it with `0` raises a `TypeError`, so what we do is start by checking a necessary precondition to `term < 0`: `term < 0` only makes sense if `term` is an integer, so we start by evaluating `isinstance(term, int)` and only then run the comparison.
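To see that guard in isolation, here is a minimal sketch. The function `check_terminator` is a hypothetical stand-alone version of the `elif` check above, written just for this illustration; it is not part of `asynchat`.

```python
def check_terminator(term):
    """Hypothetical stand-alone version of the `elif` check above."""
    # The comparison only runs when `term` really is an integer.
    if isinstance(term, int) and term < 0:
        raise ValueError("the number of received bytes must be positive")
    return term


print(check_terminator("\r\n") == "\r\n")  # True, the comparison was skipped
print(check_terminator(10))                # 10

# Without the guard, comparing a string with an integer fails:
try:
    "abc" < 0
except TypeError as exc:
    print(f"TypeError: {exc}")
```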
Let me show you another example from the `enum` module:
# From Lib/enum.py in Python 3.9.2
def _create_(cls, class_name, names, *, module=None, qualname=None, type=None, start=1):
    """
    Convenience method to create a new Enum class.
    """
    # [cut for brevity]
    # special processing needed for names?
    if isinstance(names, str):
        names = names.replace(',', ' ').split()
    if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
        original_names, names = names, []
        last_values = []
        for count, name in enumerate(original_names):
            value = first_enum._generate_next_value_(name, start, count, last_values[:])
            last_values.append(value)
            names.append((name, value))
    # [cut for brevity]
The longer `if` statement contains three expressions separated by `and`s, and the first two expressions are there to make sure that the final one, `isinstance(names[0], str)`, makes sense. You can read along the statement and think about what it means if execution reaches that point:
if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#  ^ let's start checking this `if` statement.
if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#                                   ^
# we only need to take a look at the right-hand side of this `and` if `names`
# is either a tuple or a list.
if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#                                             ^
# at this point, I've checked if `names` is a list or a tuple and I have
# checked if it is truthy or falsy (i.e., checked if it is empty or not).
# I only need to look at the right-hand side of this `and` if `names`
# is NOT empty.
if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#                                                 ^
# If I'm evaluating this expression, it is because `names` is either a
# list or a tuple AND it is not empty, therefore I can index safely into it
# with `names[0]`.
This flat `if` statement is much better than the completely nested version:
if isinstance(names, (tuple, list)):
    if names:
        if isinstance(names[0], str):
            pass
Of course, you might need the nested version if, at different points, you might need to do different things depending on what happens. For example, suppose you want to raise an error if the list/tuple is empty. In that case, you would need the nested version:
if isinstance(names, (tuple, list)):
    if names:
        if isinstance(names[0], str):
            pass
    else:
        raise ValueError("Empty names :(")
Can you understand why this if statement I just wrote is different from the two following alternatives?
# Can I put `and names` together with the first check?
if isinstance(names, (tuple, list)) and names:
    if isinstance(names[0], str):
        pass
else:
    raise ValueError("Empty names..? :(")
# What if I put it together with the second `isinstance` check?
if isinstance(names, (tuple, list)):
    if names and isinstance(names[0], str):
        pass
    else:
        raise ValueError("Empty names..? :(")
If this is a silly exercise for you, sorry about that!
I just want you to be aware of the fact that when you have many Boolean conditions, you need to be careful when checking specific configurations of what is `True` and what is `False`.
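If you want to check your answer to that exercise, the sketch below wraps the three variants in functions and runs them on a few inputs. The function names are made up for this illustration and, as a simplifying assumption, the functions return labels (`"strings"`, `"error"`, `"nothing"`) instead of executing `pass` or raising `ValueError`, so that the results are easy to compare.

```python
def nested(names):
    if isinstance(names, (tuple, list)):
        if names:
            if isinstance(names[0], str):
                return "strings"
        else:
            return "error"
    return "nothing"


def merged_with_first(names):
    if isinstance(names, (tuple, list)) and names:
        if isinstance(names[0], str):
            return "strings"
    else:
        return "error"
    return "nothing"


def merged_with_second(names):
    if isinstance(names, (tuple, list)):
        if names and isinstance(names[0], str):
            return "strings"
        else:
            return "error"
    return "nothing"


for value in (["a"], [], [1], 42):
    print(repr(value), nested(value), merged_with_first(value), merged_with_second(value))
# ['a'] strings strings strings
# [] error error error
# [1] nothing nothing error
# 42 nothing error nothing
```

The three versions agree on `["a"]` and on `[]`, but the first alternative also complains about things that are not lists or tuples, and the second alternative also complains about non-empty lists whose first element is not a string.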
If you've been skimming this article, just pay attention to this section right here.
This, right here, is my favourite use of short-circuiting.
Short-circuiting with the Boolean operator `or` can be used to assign default values to variables.
How does this work?
This uses `or` and its short-circuiting functionality to assign a default value to a variable if the current value is falsy.
Here is an example:
greet = input("Type your name >> ") or "there"
print(f"Hello, {greet}!")
Try running this example and press Enter without typing anything.
If you do that, `input` returns an empty string `""`, which is falsy.
Therefore, the operator `or` sees the falsy value on its left and needs to evaluate the right operand to determine the final value of the expression.
Because it evaluates the right operand, it is the right value that is returned, and `"there"` is assigned to `greet`.
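Because `input` is awkward to play with in a script, here is the same idea as a small function. The name `greeting` is made up for this sketch. The last lines show a caveat worth keeping in mind: every falsy value triggers the default, not just the empty string.

```python
def greeting(name):
    # Any falsy `name` ("" or None, for instance) falls back to "there".
    return f"Hello, {name or 'there'}!"


print(greeting("Ana"))  # Hello, Ana!
print(greeting(""))     # Hello, there!

# Caveat: *every* falsy value triggers the default, including 0.
retries = 0
print(retries or 3)     # 3, even though 0 may have been intentional
```

When a falsy value such as `0` is a legitimate input, an explicit check like `x if x is not None else default` is the safer choice.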
Now that we've seen how this mechanism to assign default values works, let us take a look at a couple of usage examples from the Python Standard Library.
We start with a simple example from the `collections` module, specifically from the implementation of the `ChainMap` object:
# From Lib/collections/__init__.py in Python 3.9.2
class ChainMap(_collections_abc.MutableMapping):
    ''' A ChainMap groups multiple dicts (or other mappings) together
    [docstring cut for brevity]
    '''
    def __init__(self, *maps):
        '''Initialize a ChainMap by setting *maps* to the given mappings.
        If no mappings are provided, a single empty dictionary is used.
        '''
        self.maps = list(maps) or [{}]          # always at least one map
This `ChainMap` object allows you to combine multiple mappings (for example, dictionaries) into a single mapping that combines all the keys and values.
>>> import collections
>>> a = {"A": 1}
>>> b = {"B": 2, "A": 3}
>>> cm = collections.ChainMap(a, b)
>>> cm["A"]
1
>>> cm["B"]
2
The assignment that we see in the source code ensures that `self.maps` is a list containing at least one mapping.
If we give no mapping at all to `ChainMap`, then `list(maps)` evaluates to `[]`, which is falsy, and forces the `or` to look at its right operand, returning `[{}]`: this produces a list with a single dictionary that has nothing inside.
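We can check this directly through the `maps` attribute of a `ChainMap`. The variable names below are arbitrary choices made for this quick sketch.

```python
import collections

# No mappings given: `list(maps)` is [] and the `or` falls back to [{}].
empty = collections.ChainMap()
print(empty.maps)  # [{}]

# One mapping given: `list(maps)` is truthy, so it is used as-is.
a = {"A": 1}
cm = collections.ChainMap(a)
print(cm.maps[0] is a)  # True
```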
I'll share another example with you, now. This example might look the same as the one above, but there is a nice subtlety here.
First, the code:
# From Lib/cgitb.py in Python 3.9.2
class Hook:
    """A hook to replace sys.excepthook that shows tracebacks in HTML."""
    def __init__(self, display=1, logdir=None, context=5, file=None,
                 format="html"):
        self.display = display          # send tracebacks to browser if true
        self.logdir = logdir            # log tracebacks to files if not None
        self.context = context          # number of source code lines per frame
        self.file = file or sys.stdout  # place to send the output
        self.format = format
This code comes from the `cgitb` module and defines `sys.stdout` to be the default value for the `self.file` variable.
The definition of the `__init__` function has `file=None` as a keyword argument with a default value of `None`, so why don't we just write `file=sys.stdout` in the first place?
The problem is that default values are evaluated only once, when the function is defined, and `sys.stdout` can be a mutable object (the name can even be rebound to a different object later on). Therefore, using `file=sys.stdout` as a keyword argument with a default value is not going to work as you expect.
This is easier to demonstrate with a list as the default argument, although the principle is the same:
>>> def append(val, l=[]):
...     l.append(val)
...     print(l)
...
>>> append(3, [1, 2])
[1, 2, 3]
>>> append(5)
[5]
>>> append(5)
[5, 5]
>>> append(5)
[5, 5, 5]
Notice the three consecutive calls `append(5)`.
We would expect the three calls to behave the same way, but because a list is a mutable object, the three consecutive calls to `append` add the values to the default value itself, that started out as an empty list but keeps growing.
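Going back to `sys.stdout`, here is a sketch of what can go wrong. The functions `report_early` and `report_late` are hypothetical names invented for this illustration: the first one captures `sys.stdout` when the function is defined, whereas the second one uses the `file or sys.stdout` pattern and only looks `sys.stdout` up when the function is called.

```python
import contextlib
import io
import sys


def report_early(msg, file=sys.stdout):
    # The default is whatever `sys.stdout` was when this `def` ran.
    print(msg, file=file)


def report_late(msg, file=None):
    # `sys.stdout` is looked up at call time, and only if `file` is falsy.
    print(msg, file=file or sys.stdout)


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    report_early("early")  # bypasses the redirection
    report_late("late")    # honours the redirection

print(repr(buffer.getvalue()))  # 'late\n'
```

Only the message from `report_late` ends up in the buffer, because `report_early` is still holding on to the object that `sys.stdout` referred to when the function was defined.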
I'll write about mutability in more detail in future Pydon'ts, so be sure to subscribe to not miss that future Pydon't.
As the final usage example of short-circuiting, I'll share something really neat with you.
If we use assignment expressions and the walrus operator `:=` together with generator expressions, we can use the fact that `all` and `any` also short-circuit in order to look for “witnesses” in a sequence of elements.
If we have a predicate function `predicate` (a function that returns a Boolean value) and if we have a sequence of values, `items`, we could use
any(predicate(item) for item in items)
to check if any element(s) in `items` satisfy the `predicate` function.
If we modify that to be
any(predicate(witness := item) for item in items)
then, in case any `item` satisfies the predicate function, `witness` will hold its value!
For example, if `items` contains many integers, how do we figure out if there are any odd numbers in there and how do we print the first one?
items = [14, 16, 18, 20, 35, 41, 100]
any_found = False
for item in items:
    any_found = item % 2
    if any_found:
        print(f"Found odd number {item}.")
        break
# Prints 'Found odd number 35.'
This is one alternative. What other alternatives can you come up with?
Now, compare all those with the following:
items = [14, 16, 18, 20, 35, 41, 100]
is_odd = lambda x: x % 2
if any(is_odd(witness := item) for item in items):
    print(f"Found odd number {witness}.")
# Prints 'Found odd number 35.'
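The same pattern can be wrapped in a reusable function. The sketch below uses a made-up name, `first_witness`, and assumes Python 3.8 or later for the walrus operator. It also takes care of one caveat: when `any` returns `False`, `witness` does not hold a witness (it holds the last item that was checked, or nothing at all if `items` is empty), so we only read it after `any` returned `True`.

```python
def first_witness(predicate, items):
    """Return the first item satisfying `predicate`, or None if there is none."""
    if any(predicate(witness := item) for item in items):
        return witness
    return None


items = [14, 16, 18, 20, 35, 41, 100]
print(first_witness(lambda x: x % 2, items))     # 35
print(first_witness(lambda x: x > 1000, items))  # None
```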
Isn't this neat?
Here's the main takeaway of this Pydon't, for you, on a silver platter:
“Be mindful when you order the left and right operands to the `and` and `or` expressions, so that you can make the most out of short-circuiting.”
This Pydon't showed you that:
- `and` and `or` return the value of one of their operands, and not necessarily a Boolean value;
- `and` only evaluates the right operand if the left operand is truthy;
- `or` only evaluates the right operand if the left operand is falsy;
- the built-in functions `all` and `any` also short-circuit;
- chained comparisons also short-circuit, because they rely on an implicit `and` operator;
- nested `if` statements can, sometimes, be flattened and simplified if we use short-circuiting with the correct ordering of the conditions;
- short-circuiting, together with the walrus operator `:=`, can be used to find a witness value with respect to a predicate function.
- `all`, https://docs.python.org/3/library/functions.html#all [last accessed 26-05-2021];
- `any`, https://docs.python.org/3/library/functions.html#any [last accessed 26-05-2021];
- `base64`, https://docs.python.org/3/library/base64.html [last accessed 01-06-2021];
- `asynchat`, https://docs.python.org/3/library/asynchat.html [last accessed 01-06-2021];
- `enum`, https://docs.python.org/3/library/enum.html [last accessed 01-06-2021];
- `collections.ChainMap`, https://docs.python.org/3/library/collections.html#collections.ChainMap [last accessed 01-06-2021];
- `cgitb`, https://docs.python.org/3/library/cgitb.html [last accessed 01-06-2021].