(If you are new here and have no idea what a Pydon't is, you may want to read the Pydon't Manifesto.)
In this Pydon't we will take a closer look at how and and or really
work and at a couple of really neat things you can do because of the
way they are defined.
In particular, we will look at
- the fact that and and or return values from their operands, and not necessarily True or False; and
- how the short-circuiting of and and or extends to all and any.
For this Pydon't, I will assume you are familiar with what “Truthy”
and “Falsy” values are in Python.
If you are not familiar with this concept, or if you would like just a
quick reminder of how this works, go ahead and read the
“Truthy, Falsy, and bool” Pydon't.
and and or operators
If we take a look at the docs,
here is how or is defined:
“x or y returns y if x is false, otherwise it returns x.”
Equivalently, but written with an if expression,
(x or y) == (y if not x else x)
This may not seem like it is worth spending a thought on,
but already right at this point we can see something very interesting:
even though we look at the truthy or falsy value of x,
what we return are the values associated with x/y,
and not a Boolean value.
For example, look at the program below and think about what it outputs:
if 3 or 5:
    print("Yeah.")
else:
    print("Nope.")
If you thought it should print “Yeah.”, you are right!
Notice how 3 or 5 was the condition of the if statement
and it evaluated to True, which is why the statement under if got executed.
Now, look at the program below and think about what it outputs:
print(3 or 5)
What do you think it outputs?
If you think the output should be True, you are wrong!
The program above outputs 3:
>>> 3 or 5
3
Let's go back to something I just said:
“Notice how 3 or 5 was the condition of the if statement and it evaluated to True, which is why the statement under if got executed.”
The wording of this statement is wrong, but the error in it is fairly subtle. If you spotted it before I pointed it out, give yourself a pat on the back, you deserve it. So, what did I say wrong?
3 or 5 does not evaluate to True!
It evaluates to 3, which is truthy and therefore
tells the if to execute its statements.
Returning True or a truthy value is something significantly
different.
A similar thing happens with and.
As per the docs, and can be defined as follows:
“x and y returns x if x is false, otherwise it returns y.”
We can also rewrite this as
(x and y) == (x if not x else y)
Take your time to explore this for a bit,
just like we explored x or y above.
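If you want a starting point for that exploration, here is a minimal sketch. The operand pairs are arbitrary values I picked for illustration; the loop checks that `x and y` hands back the same object as the equivalent if expression.

```python
# Pairs of (x, y) chosen only for illustration.
pairs = [(0, 5), (3, 5), ("", "hey"), ([1], []), (None, 1)]

results = []
for x, y in pairs:
    result = x and y
    # `and` returns one of its operands, not necessarily a Boolean.
    assert result is (x if not x else y)
    results.append(result)
    print(f"{x!r} and {y!r} -> {result!r}")
```

Notice that none of the results is True or False, because none of the operands was.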
You might be asking why this distinction is relevant.
It is mostly relevant because of the following property:
and and or only evaluate the right operand if the left operand is
not enough to determine the result of the operation.
This is what short-circuiting is:
not evaluating the whole expression (stopping short of evaluating it)
if we already have enough information to determine the final outcome.
This short-circuiting feature, together with the fact that
the boolean operators and and or return the values of the operands
and not necessarily a Boolean, means we can do some really neat things
with them.
or
False or y
or evaluates to True if any of its operands is truthy.
If the left operand to or is False (or falsy, for that matter)
then the or operator has to look to its right operand in order
to determine the final result.
Therefore, we know that an expression like
val = False or y
will have the value of y in it,
and in an if statement or in a while loop,
it will evaluate the body of the construct only if y is truthy:
>>> y = 5     # truthy value.
>>> if False or y:
...     print("Got in!")
... else:
...     print("Didn't get in...")
...
Got in!
>>> y = []    # falsy value.
>>> if False or y:
...     print("Got in 2!")
... else:
...     print("Didn't get in 2...")
...
Didn't get in 2...
Let this sit with you:
if the left operand to or is False or falsy, then we need
to look at the right operand to determine the value of the or.
True or y
On the other hand, if the left operand to or is True,
we do not need to take a look at y because we already know
the final result is going to be True.
Let us create a simple function that returns its argument unchanged but that produces a side-effect of printing something to the screen:
def p(arg):
    print(f"Inside `p` with arg={arg}")
    return arg
Now we can use p to take a look at the things that Python
evaluates when trying to determine the value of x or y:
>>> p(False) or p(3)
Inside `p` with arg=False
Inside `p` with arg=3
3
>>> p(True) or p(3)
Inside `p` with arg=True
True
Notice that, in the second example, p only did one print
because it never reached the p(3).
or expressions
Now we tie everything together.
If the left operand to or is False or falsy, we know that
or has to look at its right operand and will, therefore,
return the value of its right operand after evaluating it.
On the other hand, if the left operand is True or truthy,
or will return the value of the left operand without
even evaluating the right operand.
and
We now do a similar survey, but for and.
False and y
and gives True if both its operands are True.
Therefore, if we have an expression like
val = False and y
do we need to know what y is in order to figure out what val is?
No, we do not, because regardless of whether y is True or
False, val is always False:
>>> False and True
False
>>> False and False
False
If we take the False and y expressions from this example and compare
them with the if expression we wrote earlier, which was
(x and y) == (x if not x else y)
we see that, in this case, x was substituted by False,
and, therefore, we have
(False and y) == (False if not False else y)
Now, the condition inside that if expression reads
not False
which we know evaluates to True, meaning that the if expression
never returns y.
If we consider any left operand that can be False or falsy,
we see that and will never look at the right operand:
>>> p([]) and True # [] is falsy
Inside `p` with arg=[]
[]
>>> p(0) and 3242 # 0 is falsy
Inside `p` with arg=0
0
>>> p({}) and 242 # {} is falsy
Inside `p` with arg={}
{}
>>> p(0) and p(0) # both are falsy, but only the left matters
Inside `p` with arg=0
0
True and y
Now, I invite you to take a moment to work through the same reasoning,
but with expressions of the form True and y.
In doing so, you should figure out that the result of
such an expression is always the value of y,
because the left operand being True, or any other truthy value,
doesn't give and enough information.
and expressions
Now we tie everything together.
If the left operand to and is False or falsy,
we know the expression returns the value of the left operand
regardless of the right operand, and therefore we do not even evaluate
the right operand.
On the other hand, if the left operand to and is True,
then and will evaluate the right operand and return its value.
Instead of memorising rules about what sides get evaluated when,
just remember that both and and or will evaluate as many
operands as needed to determine the overall Boolean result,
and will then return the value of the last side that they evaluated.
As an immediate conclusion, the left operand is always evaluated, as you might imagine.
If you understand that, then it is just a matter of you knowing
how and and or work from the Boolean perspective.
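To make that rule concrete, here is a rough emulation of both operators. The names `lazy_and` and `lazy_or` are hypothetical helpers of mine, not anything Python provides, and each operand is wrapped in a zero-argument function so that we control when it gets evaluated. Returning True or False when no operands are given is just a convention I chose for the sketch.

```python
def lazy_and(*thunks):
    """Evaluate operands left to right, stopping at the first falsy one."""
    value = True  # Convention for "no operands" (an assumption of this sketch).
    for thunk in thunks:
        value = thunk()
        if not value:
            break  # A falsy operand settles the result of `and`.
    return value  # The value of the last operand that was evaluated.


def lazy_or(*thunks):
    """Evaluate operands left to right, stopping at the first truthy one."""
    value = False  # Convention for "no operands" (an assumption of this sketch).
    for thunk in thunks:
        value = thunk()
        if value:
            break  # A truthy operand settles the result of `or`.
    return value


calls = []

def p(arg):
    # Record the call instead of printing, so that we can inspect it later.
    calls.append(arg)
    return arg


result = lazy_and(lambda: p(3), lambda: p([]), lambda: p(5))
print(result, calls)  # p(5) is never called.
```

Both functions follow the same recipe: keep evaluating until the outcome is settled, then return whatever was evaluated last.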
all and any
The built-in functions all and any also short-circuit,
as they are simple extensions of the behaviours provided by
and and or, respectively.
all wants to make sure that all the values of its argument
are truthy, so as soon as it finds a falsy value, it knows
it's game over.
That's why the docs say all is equivalent to
the following code:
def all(it):
    for elem in it:
        if not elem:
            return False
    return True
Similarly, any is going to do its best to look for some
value that is truthy.
Therefore, as soon as it finds one, any knows it has achieved
its purpose and does not need to evaluate the other values.
Can you write an implementation of any that is similar to the
above implementation of all and that also short-circuits?
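If you want to check your answer, here is one possible version. I called it `my_any` so that it doesn't shadow the built-in, and the `noisy` generator is a hypothetical helper that only exists to record which values were actually consumed.

```python
def my_any(it):
    for elem in it:
        if elem:
            return True  # Found a truthy value, no need to keep looking.
    return False


seen = []

def noisy(values):
    # Record each value right before handing it out.
    for value in values:
        seen.append(value)
        yield value


found = my_any(noisy([0, "", 7, 8, 9]))
print(found, seen)  # 8 and 9 are never consumed.
```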
A previous Pydon't has shown you
that comparison operators can be chained arbitrarily,
and those are almost equivalent to a series of comparisons
separated with and, except that the subexpressions are only
evaluated once, to prevent wasting resources.
Therefore, because we are also using an and in the background,
chained comparisons can also short-circuit:
# 1 > 2 is False, so there's no need to look at p(2) < p(3)
>>> p(1) > p(2) < p(3)
Inside `p` with arg=1
Inside `p` with arg=2
False
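The “almost equivalent” part can also be checked with a small sketch. Here I use a variant of `p` that records its calls in a list instead of printing (my own tweak for this example), and I compare a chained comparison with the version spelled out with and.

```python
calls = []

def p(arg):
    # Same idea as the `p` from before, but recording instead of printing.
    calls.append(arg)
    return arg


chained = p(1) < p(2) < p(3)
chained_calls = calls.copy()

calls.clear()
spelled_out = p(1) < p(2) and p(2) < p(3)
spelled_out_calls = calls.copy()

print(chained, chained_calls)
print(spelled_out, spelled_out_calls)
```

Both expressions give the same result, but the chained version calls `p(2)` only once.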
Now that we have taken a look at how all of these things work, we will see how to put them to good use in actual code.
One of the most basic usages of short-circuiting is to save time.
When you have a while loop or an if statement with multiple
conditions, you may want to include the faster expressions before
the slower ones, as that might save you some time if the result
of the first expression ends up short-circuiting.
Consider this example that should help me get my point across:
imagine you are writing a function that
creates a helper .txt file but only if it is a .txt file
and if it does not exist yet.
With this preamble, your function needs to do two things:
- check that the filename ends in .txt;
- check that the file does not exist yet.
What do you feel is faster?
Checking if the file ends in .txt or looking for it in the whole
filesystem?
I would guess checking for the .txt ending is simpler,
so that's the expression I would put first in the code:
import pathlib

def create_txt_file(filename):
    path = pathlib.Path(filename)
    # Check the cheap condition (the suffix) before asking the filesystem.
    if path.suffix == ".txt" and not path.exists():
        # Create the file but leave it empty.
        with path.open("w"):
            pass
This means that, whenever filename does not respect the .txt
format, the function can exit right away and doesn't even
need to bother the operating system with asking if the file
exists or not.
Now let me show you a real example of an if statement that uses
short-circuiting in this way, saving some time.
For this, let us take a look at a function from the base64 module,
that we take from the Python Standard Library:
# From Lib/base64.py in Python 3.9.2
def b64decode(s, altchars=None, validate=False):
    """Decode the Base64 encoded bytes-like object or ASCII string s.
    [docstring cut for brevity]
    """
    s = _bytes_from_decode_data(s)
    if altchars is not None:
        altchars = _bytes_from_decode_data(altchars)
        assert len(altchars) == 2, repr(altchars)
        s = s.translate(bytes.maketrans(altchars, b'+/'))
    if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):  # <--
        raise binascii.Error('Non-base64 digit found')
    return binascii.a2b_base64(s)
This b64decode function takes a string (or a bytes-like object)
that is assumed to be in base 64 and decodes it.
Here is a quick demo of that:
>>> import base64
>>> s = b"Base 64 encoding and decoding."
>>> enc = base64.b64encode(s)
>>> enc
b'QmFzZSA2NCBlbmNvZGluZyBhbmQgZGVjb2Rpbmcu'
>>> base64.b64decode(enc)
b'Base 64 encoding and decoding.'
Now, look at the if statement that I marked with a comment:
if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
    pass
validate is an argument to b64decode that tells the function if we should
validate the string that we want to decode or not,
and then the re.fullmatch() function call does that validation,
ensuring that the string to decode only contains valid base 64 characters.
In case we want to validate the string and the validation fails,
we enter the if statement and raise an error.
Notice how we first check if the user wants to validate the string
and only then we run the regular expression match.
We would obtain the exact same result if we changed the order of the operands
to and, but we would be spending much more time than needed.
To show that, let us try both cases!
Let's build a string with 1001 characters, where only the last one is invalid.
Let us compare how much time it takes to run the boolean expression
with the regex validation before and after the Boolean validate.
import timeit
# Code that sets up the variables we need to evaluate the expression that we
# DO NOT want to be taken into account for the timing.
setup = """
import re
s = b"a"*1000 + b"*"
validate = False
"""
# with short-circuiting: 0.01561140s on my machine.
print(timeit.timeit("validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s)", setup))
# without short-circuiting: 27.4744187s on my machine.
print(timeit.timeit("not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s) and validate", setup))
Notice that short-circuiting speeds up these comparisons by a factor of ~1760.
The timeit module is great and I recommend you take a peek
at its docs.
Here, we use it to run that Boolean expression repeatedly
(one million times, to be more specific).
Of course we could try longer or shorter strings, we could try strings that pass the validation and we could also try strings that fail the validation at an earlier stage, but this is just a small example that shows how short-circuiting can be helpful.
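Timings change from machine to machine, so another way to look at the same effect is to count how many times the costly operand actually runs. In the sketch below, `expensive_check` is a hypothetical stand-in of mine for the regex validation, and the counter dictionary is only there for bookkeeping.

```python
import re

pattern = re.compile(rb"[A-Za-z0-9+/]*={0,2}")
s = b"a" * 1000 + b"*"
validate = False
counter = {"regex_runs": 0}


def expensive_check(data):
    # Stand-in for the validation; counts how often it is called.
    counter["regex_runs"] += 1
    return pattern.fullmatch(data)


# Cheap operand first: `validate` is False, so the regex never runs.
for _ in range(1000):
    _ = validate and not expensive_check(s)
runs_cheap_first = counter["regex_runs"]

# Costly operand first: the regex runs every single time.
counter["regex_runs"] = 0
for _ in range(1000):
    _ = not expensive_check(s) and validate
runs_costly_first = counter["regex_runs"]

print(runs_cheap_first, runs_costly_first)
```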
if statements
Short-circuiting can, and should, be used to keep if statements
as flat as possible.
A typical usage pattern is when we want to do some validation if certain conditions are met.
Keeping the previous b64decode example in mind, that previous if statement
could've been written like so:
# Modified from Lib/base64.py in Python 3.9.2
def b64decode(s, altchars=None, validate=False):
    """Decode the Base64 encoded bytes-like object or ASCII string s.
    [docstring cut for brevity]
    """
    s = _bytes_from_decode_data(s)
    if altchars is not None:
        altchars = _bytes_from_decode_data(altchars)
        assert len(altchars) == 2, repr(altchars)
        s = s.translate(bytes.maketrans(altchars, b'+/'))
    # Do we want to validate the string?
    if validate:  # <--
        # Is the string valid?
        if not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):  # <--
            raise binascii.Error('Non-base64 digit found')
    return binascii.a2b_base64(s)
Now we took the actual validation and nested it, so that we have two separate checks: one tests if we need to do validation and the other one does the actual validation. What is the problem with this? From a fundamentalist's point of view, you are clearly going against the Zen of Python, that says
“Flat is better than nested.”
But from a practical point of view, you are also increasing the vertical
space that your function takes up by having a ridiculous if statement
hang there.
What if you have multiple conditions that you need to check for?
Will you have a nested if statement for each one of those?
This is exactly what short-circuiting is useful for! Only running the second part of a Boolean expression if it is relevant!
Another typical usage pattern shows up when you have something you need
to check, for example you need to check if a variable names is a list
containing strings or you need to check if a given argument term
is smaller than zero.
It may happen that, in that context, it is not a good idea to do those checks
immediately:
- names might not be a list or might be empty; or
- term might be of a different type and, therefore, might be incomparable to zero.
Here is a concrete example of what I mean:
# From Lib/asynchat.py in Python 3.9.2
def set_terminator(self, term):
    """Set the input delimiter.
    Can be a fixed string of any length, an integer, or None.
    """
    if isinstance(term, str) and self.use_encoding:
        term = bytes(term, self.encoding)
    elif isinstance(term, int) and term < 0:
        raise ValueError('the number of received bytes must be positive')
    self.terminator = term
This is a helper function from within the asynchat module.
We don't need to know what is happening outside of this function to
understand the role that short-circuiting has in the elif statement.
If the term variable is smaller than 0, then we want to raise
a ValueError to complain, but the previous if statement shows
that term might also be a string.
If term is a string, then comparing it with 0 raises a
TypeError, so what we do is start by checking a necessary
precondition to term < 0:
term < 0 only makes sense if term is an integer, so we start
by evaluating isinstance(term, int) and only then running the comparison.
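Here is a tiny sketch of the same guard outside of asynchat. The function `is_negative_int` is a hypothetical helper I made up for this example; the try block just records what happens when the comparison runs without the guard.

```python
def is_negative_int(term):
    # The comparison only runs if `term` really is an integer.
    return isinstance(term, int) and term < 0


try:
    "abc" < 0  # No guard: a string cannot be compared with an integer.
except TypeError as error:
    unguarded = type(error).__name__

print(is_negative_int(-3), is_negative_int("abc"), is_negative_int(None))
print(unguarded)
```

With the guard in place, strings and None simply give False instead of blowing up.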
Let me show you another example from the enum module:
# From Lib/enum.py in Python 3.9.2
def _create_(cls, class_name, names, *, module=None, qualname=None, type=None, start=1):
    """
    Convenience method to create a new Enum class.
    """
    # [cut for brevity]
    # special processing needed for names?
    if isinstance(names, str):
        names = names.replace(',', ' ').split()
    if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
        original_names, names = names, []
        last_values = []
        for count, name in enumerate(original_names):
            value = first_enum._generate_next_value_(name, start, count, last_values[:])
            last_values.append(value)
            names.append((name, value))
    # [cut for brevity]
The longer if statement contains three expressions separated by ands,
and the first two expressions are there to make sure that the final one,
isinstance(names[0], str)
makes sense. You can read along the statement and think about what it means if execution reaches that point:
if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#  ^ let's start checking this `if` statement.

if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#                                   ^
# we only need to take a look at the right-hand side of this `and` if `names`
# is either a tuple or a list.

if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#                                             ^
# at this point, I've checked if `names` is a list or a tuple and I have
# checked if it is truthy or falsy (i.e., checked if it is empty or not).
# I only need to look at the right-hand side of this `and` if `names`
# is NOT empty.

if isinstance(names, (tuple, list)) and names and isinstance(names[0], str):
#                                                 ^
# If I'm evaluating this expression, it is because `names` is either a
# list or a tuple AND it is not empty, therefore I can index safely into it
# with `names[0]`.
This flat if statement is much better than the completely nested version:
if isinstance(names, (tuple, list)):
    if names:
        if isinstance(names[0], str):
            pass
Of course, you might need the nested version if, at different points, you might need to do different things depending on what happens. For example, suppose you want to raise an error if the list/tuple is empty. In that case, you would need the nested version:
if isinstance(names, (tuple, list)):
    if names:
        if isinstance(names[0], str):
            pass
    else:
        raise ValueError("Empty names :(")
Can you understand why this if statement I just wrote is different from the two following alternatives?
# Can I put `and names` together with the first check?
if isinstance(names, (tuple, list)) and names:
    if isinstance(names[0], str):
        pass
else:
    raise ValueError("Empty names..? :(")

# What if I put it together with the second `isinstance` check?
if isinstance(names, (tuple, list)):
    if names and isinstance(names[0], str):
        pass
    else:
        raise ValueError("Empty names..? :(")
If this is a silly exercise for you, sorry about that!
I just want you to be aware of the fact that when you have
many Boolean conditions, you need to be careful when checking
specific configurations of what is True and what is False.
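If you want to check your reasoning about the three versions, one option is to wrap each of them in a function and see which inputs make them raise. The function names and the sample inputs below are my own choices for this sketch.

```python
def nested(names):
    if isinstance(names, (tuple, list)):
        if names:
            if isinstance(names[0], str):
                pass
        else:
            raise ValueError("Empty names :(")


def merged_with_first(names):
    if isinstance(names, (tuple, list)) and names:
        if isinstance(names[0], str):
            pass
    else:
        raise ValueError("Empty names..? :(")


def merged_with_second(names):
    if isinstance(names, (tuple, list)):
        if names and isinstance(names[0], str):
            pass
        else:
            raise ValueError("Empty names..? :(")


def raises(func, names):
    # True if calling func(names) ends in a ValueError.
    try:
        func(names)
    except ValueError:
        return True
    return False


samples = [[], "abc", [1], ["a"]]
outcomes = {
    func.__name__: [raises(func, sample) for sample in samples]
    for func in (nested, merged_with_first, merged_with_second)
}
for name, flags in outcomes.items():
    print(name, flags)
```

Only the fully nested version complains exclusively about empty lists and tuples. The second one also complains when names is not a list or tuple at all, and the third one also complains when the first element is not a string.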
If you've been skimming this article, just pay attention to this section right here.
This, right here, is my favourite use of short-circuiting.
Short-circuiting with the Boolean operator or can be used to assign default
values to variables.
How does this work?
This uses or and its short-circuiting functionality to assign a default
value to a variable if the current value is falsy.
Here is an example:
greet = input("Type your name >> ") or "there"
print(f"Hello, {greet}!")
Try running this example and press Enter without
typing anything.
If you do that, input returns an empty string "",
which is falsy.
Therefore, the operator or sees the falsy value on its left
and needs to evaluate the right operand to determine the final
value of the expression.
Because it evaluates the right operand, it is the right
value that is returned, and "there" is assigned to greet.
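The same pattern works inside a function, which makes it easier to play with than a call to input. The function `greeting` below is a hypothetical example of mine, not something from the standard library.

```python
def greeting(name=None):
    # Fall back to "there" when `name` is None, "" or any other falsy value.
    name = name or "there"
    return f"Hello, {name}!"


print(greeting("Ana"))
print(greeting(""))
print(greeting())
```

Keep in mind that or swaps out every falsy value, so if 0 or an empty string is a legitimate value for your variable, it will be replaced by the default as well.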
Now that we've seen how this mechanism to assign default values works, let us take a look at a couple of usage examples from the Python Standard Library.
We start with a simple example from the collections module,
specifically from the implementation of the ChainMap object:
# From Lib/collections/__init__.py in Python 3.9.2
class ChainMap(_collections_abc.MutableMapping):
    ''' A ChainMap groups multiple dicts (or other mappings) together
    [docstring cut for brevity]
    '''
    def __init__(self, *maps):
        '''Initialize a ChainMap by setting *maps* to the given mappings.
        If no mappings are provided, a single empty dictionary is used.
        '''
        self.maps = list(maps) or [{}]  # always at least one map
This ChainMap object allows you to combine multiple mappings
(for example, dictionaries) into a single mapping that combines
all the keys and values.
>>> import collections
>>> a = {"A": 1}
>>> b = {"B": 2, "A": 3}
>>> cm = collections.ChainMap(a, b)
>>> cm["A"]
1
>>> cm["B"]
2
The assignment that we see in the source code ensures that
self.maps is a list of, at least, one empty mapping.
If we give no mapping at all to ChainMap, then list(maps)
evaluates to [], which is falsy, and forces the or to look
at its right operand, returning [{}]:
this produces a list with a single dictionary that has nothing inside.
I'll share another example with you, now. This example might look the same as the one above, but there is a nice subtlety here.
First, the code:
# From Lib/cgitb.py in Python 3.9.2
class Hook:
    """A hook to replace sys.excepthook that shows tracebacks in HTML."""
    def __init__(self, display=1, logdir=None, context=5, file=None,
                 format="html"):
        self.display = display          # send tracebacks to browser if true
        self.logdir = logdir            # log tracebacks to files if not None
        self.context = context          # number of source code lines per frame
        self.file = file or sys.stdout  # place to send the output
        self.format = format
This code comes from the cgitb module and defines sys.stdout
to be the default value for the self.file variable.
The definition of the __init__ function has file as a keyword
argument with a default value of None,
so why don't we just write file=sys.stdout in the first place?
The problem is that default values are evaluated only once, when the
function is defined, and sys.stdout can be a mutable object.
Therefore, using file=sys.stdout as a keyword argument with a default
value is not going to work as you expect.
This is easier to demonstrate with a list as the default argument,
although the principle is the same:
>>> def append(val, l=[]):
...     l.append(val)
...     print(l)
...
>>> append(3, [1, 2])
[1, 2, 3]
>>> append(5)
[5]
>>> append(5)
[5, 5]
>>> append(5)
[5, 5, 5]
Notice the three consecutive calls append(5).
We would expect the three calls to behave the same way,
but because a list is a mutable object, the three consecutive
calls to append add the values to the default value itself,
that started out as an empty list but keeps growing.
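To bring this back to sys.stdout, here is a sketch of how the two ways of setting the default can differ. The functions `log_eager` and `log_lazy` are hypothetical names of mine, and I use contextlib.redirect_stdout to temporarily swap sys.stdout for an in-memory buffer.

```python
import contextlib
import io
import sys


def log_eager(message, file=sys.stdout):
    # The default was looked up only once, when this `def` ran.
    print(message, file=file)


def log_lazy(message, file=None):
    # The default is looked up on every call, like in cgitb's Hook.
    print(message, file=file or sys.stdout)


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    log_eager("eager")  # Goes to whatever sys.stdout was at definition time.
    log_lazy("lazy")    # Follows the redirection and lands in the buffer.

captured = buffer.getvalue()
print(repr(captured))
```

Only the message from `log_lazy` ends up in the buffer, because `file or sys.stdout` picks up the current sys.stdout each time the function is called.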
I'll write about mutability in more detail in future Pydon'ts, so be sure to subscribe to not miss that future Pydon't.
As the final usage example of short-circuiting, I'll share something really neat with you.
If we use assignment expressions and the walrus operator :=
together with generator expressions, we can use the fact that
all and any also short-circuit in order to look for “witnesses”
in a sequence of elements.
If we have a predicate function predicate (a function that returns
a Boolean value) and if we have a sequence of values, items,
we could use
any(predicate(item) for item in items)
to check if any element(s) in items satisfy the predicate function.
If we modify that to be
any(predicate(witness := item) for item in items)
then, in case any item satisfies the predicate function, witness
will hold its value!
For example, if items contains many integers, how do we figure out
if there are any odd numbers in there and how do we print the first one?
items = [14, 16, 18, 20, 35, 41, 100]
any_found = False
for item in items:
    any_found = item % 2
    if any_found:
        print(f"Found odd number {item}.")
        break
# Prints 'Found odd number 35.'
This is one alternative. What other alternatives can you come up with?
Now, compare all those with the following:
items = [14, 16, 18, 20, 35, 41, 100]
is_odd = lambda x: x % 2
if any(is_odd(witness := item) for item in items):
    print(f"Found odd number {witness}.")
# Prints 'Found odd number 35.'
Isn't this neat?
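The same trick should work with all, in which case the witness is a counterexample: the first item that fails the predicate. Here is a sketch wrapped in a function; `find_counterexample` and `is_even` are hypothetical names I picked, and the walrus operator requires Python 3.8 or later.

```python
def find_counterexample(predicate, items):
    # all() stops at the first falsy result, so `witness` holds that item.
    if not all(predicate(witness := item) for item in items):
        return witness
    return None  # Every item satisfied the predicate (or there were no items).


def is_even(x):
    return x % 2 == 0


items = [14, 16, 18, 20, 35, 41, 100]
print(find_counterexample(is_even, items))
print(find_counterexample(is_even, [2, 4, 6]))
```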
Here's the main takeaway of this Pydon't, for you, on a silver platter:
“Be mindful when you order the left and right operands to the and and or expressions, so that you can make the most out of short-circuiting.”
This Pydon't showed you that:
- and and or return the value of one of their operands, and not necessarily a Boolean value;
- and only evaluates the right operand if the left operand is truthy;
- or only evaluates the right operand if the left operand is falsy;
- all and any also short-circuit;
- chained comparisons also short-circuit, because they use the and operator in the background;
- nested if statements can, sometimes, be flattened and simplified if we use short-circuiting with the correct ordering of the conditions;
- short-circuiting, together with the walrus operator :=, can be used to find a witness value with respect to a predicate function.
- all, https://docs.python.org/3/library/functions.html#all [last accessed 26-05-2021];
- any, https://docs.python.org/3/library/functions.html#any [last accessed 26-05-2021];
- base64, https://docs.python.org/3/library/base64.html [last accessed 01-06-2021];
- asynchat, https://docs.python.org/3/library/asynchat.html [last accessed 01-06-2021];
- enum, https://docs.python.org/3/library/enum.html [last accessed 01-06-2021];
- collections.ChainMap, https://docs.python.org/3/library/collections.html#collections.ChainMap [last accessed 01-06-2021];
- cgitb, https://docs.python.org/3/library/cgitb.html [last accessed 01-06-2021].