This article is an in-depth analysis of Python solutions to the “Sonar Sweep” problem, which is day 1 of Advent of Code 2021.
The analysis that follows pertains to the first day of the Python Problem-Solving Bootcamp. Follow this link if you'd like to participate in the bootcamp:
In the analysis that follows you may be confronted with code that you do not understand, especially as you reach the end of the explanation of each part.
If you find functions that you didn't know before, remember to check the docs for those functions and play around with them in the REPL. This is written to be increasing in difficulty (within each part of the problem), so it is understandable if it gets harder as you keep reading. That's perfectly fine, you don't have to understand everything right now, especially because I can't know for sure what your level is.
(Adapted from here.)
You are given a report of depth measurements, like
199
200
208
210
200
207
240
269
260
263
The first order of business is to figure out how quickly the depth increases.
To do this, count the number of times a depth measurement increases from the previous measurement. (There is no measurement before the first measurement.)
In the example above, the changes are as follows:
199 (N/A - no previous measurement)
200 (increased)
208 (increased)
210 (increased)
200 (decreased)
207 (increased)
240 (increased)
269 (increased)
260 (decreased)
263 (increased)
In this example, there are 7 measurements that are larger than the previous measurement.
How many measurements are larger than the previous measurement in the input file inputs/01_sonar_sweep.txt
?
My code was supposed to arrive at the answer 1292. You can get your own input from here.
# IMPORTANT: Set this to the correct path for you!
INPUT_FILE = "inputs/01_sonar_sweep.txt"
The problem statement asks us to traverse the depth reports and to compare the current measurement with the previous one.
The underlying idea is that, when talking about a sequence (for example, a list of measurements), a relationship of “previous” translates into subtracting 1 to the index at hands. Similarly, a relationship of “next” translates into adding 1 to the index.
The only thing we need to be careful about is ensuring we stay within the boundaries of the sequence, so that doing +1
or -1
in an index still gives a valid index.
This translates directly into this solution:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
count = 0
for i in range(1, len(depths)):
if int(depths[i - 1]) < int(depths[i]): # Compare previous with the current
count += 1
print(count)
1292
If we stick to the “compare with the previous” interpretation, then the indices that matter are i - 1
(the previous item) and i
(the current item); and, therefore, i
must start at 1
.
This means we use range(1, len(depths))
.
If we go with the “compare with the next” interpretation, then the indices that matter are i
(the current item) and i + 1
(the next item).
For that, our range needs to end earlier than len(depths)
:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
count = 0
for i in range(len(depths) - 1):
if int(depths[i]) < int(depths[i + 1]): # Compare current with “the next”
count += 1
print(count)
When using a with
statement to access a file, you know that your file is automatically closed when you leave the with
statement.
You also know that the with
statement is nice because it will still close the file if, for example, your code throws an error.
That's very convenient, and a lovely reason to use the with
statement.
However, while you are inside the with
statement, the file is opened and being used by the operating system.
In Python, it is a good practice to avoid nesting unnecessary things.
In particular, put as little code inside the with
statement as possible.
In our case, because we use .readlines
to read the whole file, we can leave the with
statement immediately:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
count = 0
for i in range(len(depths) - 1):
if int(depths[i]) < int(depths[i + 1]):
count += 1
print(count)
1292
Trying to keep the indentation level as low as possible (trying to keep your code flat, instead of nested) is something that is advocated for in the “Zen of Python”. You can watch a clip of a talk of mine where I talk about this, and explain it, here (it's at minute 17:11 of the linked video).
Another frequent anti-pattern in Python is the excerpt for i in range(len(...))
.
Most of the times, that for
loop isn't what you really wanted to use.
Python has very powerful for
loops, and the for i in range(len(...))
is a pattern that we inherited from languages like C.
In Python, we tend to use built-ins like
enumerate
and
zip
.
Another hint at the fact that the loop we care about is not the range(len(...))
, is that we don't really care about the indices.
Notice how range(len(something))
gives you all the legal indices associated with something
, but what we really care about are the elements.
A slight improvement would be to recognise the enumerate
pattern:
enumerate
is a good built-in to use if, in a for
loop, you care about the current element and about the current index you are using.
In our case, we care about the current index so that we can compute the index of the neighbouring element.
So, we could try writing something like this:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
count = 0
for i, num in enumerate(depths[:-1]):
if int(num) < int(depths[i + 1]):
count += 1
print(count)
1292
In order to make this work, we are using a slice to ignore the last element from depths
.
After all, the slice [:-n]
means “drop the last n
elements”.
Another interesting thought would be to try and simplify the i + 1
part.
The built-in enumerate
accepts a start
argument that specifies where the argument starts counting:
list(enumerate("code"))
[(0, 'c'), (1, 'o'), (2, 'd'), (3, 'e')]
list(enumerate("code", start=10))
[(10, 'c'), (11, 'o'), (12, 'd'), (13, 'e')]
Therefore, one might think that we could set start=1
to avoid having to perform the sum.
If we do so, then we must be very explicit about what index we are using:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
count = 0
for next_idx, num in enumerate(depths[:-1], start=1):
if int(num) < int(depths[next_idx]):
count += 1
print(count)
1292
However, I personally don't like this.
There is something here that makes me look for a better solution, although some might say I'm just being paranoid.
But, the truth is, our if
statement is very asymmetrical right now.
The solution lies elsewhere!
The built-in zip
is more correct here, because zip
is used to pair sequences up.
But what sequences do we want to pair up..?
After all, we have a single sequence at hands!
As it turns out, the both patterns of “this item & the next one” and “this item & the previous one” are easily written with zip
.
We just have to remember that, if seq
is a sequence, then seq[1:]
means “drop the first element” and seq[:-1]
means “drop the last element”:
s = "coder"
print(s[:-1])
print(s[1:])
print(list(zip(s[:-1], s[1:])))
code
oder
[('c', 'o'), ('o', 'd'), ('d', 'e'), ('e', 'r')]
With this, we can rewrite our solution to be:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
count = 0
for prev_, next_ in zip(depths[:-1], depths[1:]):
if int(prev_) < int(next_):
count += 1
print(count)
1292
In the above, I wrote the name next_
because next
is a built-in function.
Then, I decided to use prev_
instead of prev
just for symmetry.
You can pick any other two names you prefer, or use for prev, next_ in ...
.
To simplify things a bit, especially when doing a similar thing with three or more iterables, we can actually omit the slices that are cutting from the end, because zip
stops as soon as one iterable stops.
In other words, we don't need to specify depths[:-1]
:
With that in mind, we can remove the extra slice:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
count = 0
for prev_, next_ in zip(depths, depths[1:]):
if int(prev_) < int(next_):
count += 1
print(count)
1292
int
conversionsIf you look closely at the for
loop we are writing, you will notice that most of the values in depth
are going to be passed in to the built-in int
twice.
While that's not a terrible thing, it's double the work we need.
All we need is for each number to be converted once, right?
Therefore, we can do the int
conversion a bit earlier in the process:
with open(INPUT_FILE, "r") as f:
depths = f.readlines()
depths = [int(d) for d in depths]
count = 0
for prev_, next_ in zip(depths, depths[1:]):
if prev_ < next_:
count += 1
print(count)
1292
Of course, now we have another issue of repeated work: first, we go over the whole file to read the lines with .readlines
, and then we go over the file contents to convert everything into an integer.
We can do everything at once, if we convert the lines to integers while we read them:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
for prev_, next_ in zip(depths, depths[1:]):
if prev_ < next_:
count += 1
print(count)
1292
In case you didn't know, you can iterate over a file, which allows you to iterate over the lines. That's what allowed us to convert into integers all the lines.
On top of that, you might be interested in knowing that int
is forgiving, in that it allows the integers to be surrounded by whitespace:
13 == int("13\n") == int(" 13 ") == int("\t\t\n13 \n\t ")
True
Like I mentioned earlier, we need to consider if our input file fits into memory or not.
Up to now, we have been reading the while file at once, but we don't need to!
We just saw we can iterate over f
lazily, so we can leverage that for our own solution.
When we do that, notice that throughout our comparison loop we will need to be reading new values from the file. Thus, if the file is large and we can't read all of it at once, we need to keep it open. In other words, we have to indent our code again.
On top of that, because we are assuming the file is very big, we can no longer create the list of depths
!
Thus, we have two options:
depths
; ormap
with map(int, f)
.Using a generator expression entails converting the [ ... ]
in the list comprehension to ( ... )
:
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f)
count = 0
for prev_, next_ in zip(depths, depths[1:]):
if prev_ < next_:
count += 1
print(count)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15904/748196112.py in <module>
3
4 count = 0
----> 5 for prev_, next_ in zip(depths, depths[1:]):
6 if prev_ < next_:
7 count += 1
TypeError: 'generator' object is not subscriptable
However, as we do so, the depths[1:]
stops working because generators are not indexable/sliceable.
This shows a weakness in our solution!
Thankfully, there are two good solutions!
Starting with Python 3.10, there is a function called itertools.pairwise
that implements the exact zip
pattern we wanted:
from itertools import pairwise
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f)
count = 0
for prev_, next_ in pairwise(depths):
if prev_ < next_:
count += 1
print(count)
1292
If you don't have Python 3.10 (or older), you can define your own pairwise
using itertools.tee
.
This isn't beginner-level Python, so feel free to skip this bit:
from itertools import tee
def pairwise(it):
"""Mock `itertools.pairwise` for Python versions below 3.10."""
prev_, next_ = tee(it, 2) # Split `it` into two iterables.
next(next_) # Advance once.
yield from zip(prev_, next_) # Yield the pairs.
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f)
count = 0
for prev_, next_ in pairwise(depths):
if prev_ < next_:
count += 1
print(count)
1292
In all of the solutions above we have been counting by incrementing the count
variable only when the depth test passes.
However, there is a different approach to this, inspired by the languages where Boolean values are just 0s and 1s.
Instead of checking with an if
, we just add the value of the condition to count
:
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f)
count = 0
for prev_, next_ in pairwise(depths):
count += prev_ < next_
print(count)
1292
When prev_ < next_
is True
(and the if
statement would pass the test, incrementing count
by one), the statement count += True
increments count
by one.
When prev_ < next_
is False
(and the if
statement would not increment count
), the statement count += False
increments count
by zero.
After all, Boolean values can be treated as integers:
1 + True
2
1 + False
1
This has to be used with caution, though, and is not always advisable.
However, moving the condition to the value that is being incremented gives rise to another implementation.
After all, the pattern
accumulator = 0
for element in iterable:
accumulator += foo(element)
is equivalent to
sum(foo(element) for element in iterable)
Therefore, we can rewrite our solution to be a sum:
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f)
count = sum(prev_ < next_ for prev_, next_ in pairwise(depths))
print(count)
1292
itertools
If you want to study itertools
a bit further, I leave you with the exercise of studying this terser solution:
from itertools import pairwise, starmap
from operator import le
with open(INPUT_FILE, "r") as f:
print(sum(starmap(le, pairwise(map(int, f)))))
1292
Hint: start by understanding why this version works:
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f)
le = lambda l, r: l < r
print(sum(le(*args) for args in pairwise(depths)))
1292
(Adapted from here.)
Instead, consider sums of a three-measurement sliding window. Again considering the above example:
199 A
200 A B
208 A B C
210 B C D
200 C D E
207 D E F
240 E F G
269 F G H
260 G H
263 H
Start by comparing the first and second three-measurement windows. The measurements in the first window are marked A (199, 200, 208); their sum is 199 + 200 + 208 = 607. The second window is marked B (200, 208, 210); its sum is 618. The sum of measurements in the second window is larger than the sum of the first, so this first comparison increased.
Your goal now is to count the number of times the sum of measurements in this sliding window increases from the previous sum. So, compare A with B, then compare B with C, then C with D, and so on. Stop when there aren't enough measurements left to create a new three-measurement sum.
In the above example, the sum of each three-measurement window is as follows:
A: 607 (N/A - no previous sum) B: 618 (increased) C: 618 (no change) D: 617 (decreased) E: 647 (increased) F: 716 (increased) G: 769 (increased) H: 792 (increased)
In this example, there are 5 sums that are larger than the previous sum.
Consider sums of a three-measurement sliding window. How many sums are larger than the previous sum in the input file inputs/01_sonar_sweep.txt
?
My code was supposed to arrive at the answer 1262. You can get your own input from here.
If we follow a train of thought similar to that employed for the first baseline solution, we might write something like this:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f] # Everything as integers from the get-go
count = 0
for i in range(3, len(depths)):
left = depths[i - 3] + depths[i - 2] + depths[i - 1]
right = depths[i - 2] + depths[i - 1] + depths[i]
if left < right:
count += 1
print(count)
1262
This solution works, but it is too much trouble to the programmer: we had to do six indexing operations! Do you know what are the odds that you'll make a typo and get one index wrong? For me, that's like 103%!
This is a test you can make to check if your code is “good”: look for numeric parameters in the problem set, and imagine they were much larger. Would you still want to write your solution in the same way?
For example, the window size of 3 is a parameter of the problem statement. What if it were a window of size 20? Would you still write the sum of the indexing by hand? Probably not!
Hence, you might want to change your code to something using slices:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
for i in range(len(depths) - 3):
# We sum the windows we build with slices:
left, right = sum(depths[i:i + 3]), sum(depths[i + 1:i + 4])
if left < right:
count += 1
print(count)
1262
This is one very reasonable train of thought. Again, because it is using explicit slices, this wouldn't work if the input file were too large.
Thus, you might want to stay loyal to the pairwise
solution we found earlier.
Maybe there is an itertools
function for windows of a given size?
Sadly, itertools
does not have such a function.
You can take a look at the custom pairwise
defined above, and try your hand at implementing a triplewise
function, that returns three consecutive elements from an iterable.
Skip the definition, if you want, but look at the example usage:
from itertools import tee
def triplewise(it):
it1, it2, it3 = tee(it, 3)
next(it2); next(it3); next(it3)
yield from zip(it1, it2, it3)
list(triplewise("coder"))
[('c', 'o', 'd'), ('o', 'd', 'e'), ('d', 'e', 'r')]
With the function triplewise
(which doesn't exist in itertools
, but exists in
more_itertools
, along with many other useful functions!), we could write a rudimentary function that keeps computing the value of the sum of the current window, and compares it to the previous sum:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
sum_prev = float("+inf")
for triple in triplewise(depths):
sum_ = sum(triple)
if sum_prev < sum_:
count += 1
sum_prev = sum_
print(count)
1262
This works, but now we have “regressed” in a way.
Part 1 and part 2 are very similar in structure. However, for part 1 we didn't need to have an auxiliary variable that we kept updating, and now we do...
Part 1 and part 2 are exactly the same, except that part 1 had a window size of 1, and part 2 has a window size of 3. So, part 2 should be solvable with code similar to that of part 1! In part 1, we used a pairwise to compare the windows, so that's what we should be doing here:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
sum_prev = float("+inf")
for triple_prev, triple_next in pairwise(triplewise(depths)):
if sum(triple_prev) < sum(triple_next):
count += 1
print(count)
1262
Wait, is this getting too complicated?!
Good! That's because we are overlooking something.
I suffer from this, but when we read a problem statement, we should spend a couple of minutes thinking about it...
Let's go back to the original problem statement and the example they showed:
199 A
200 A B
208 A B C
210 B C D
200 C D E
207 D E F
240 E F G
269 F G H
260 G H
263 H
How do we compare the sliding window A
with the sliding window B
?
Notice how the windows A
and B
have a huge overlap!
They overlap in the numbers 200
and 208
; it's only the beginning of A
and the end of B
that change!
This is actually easy to spot in the baseline solution with indices.
Let me include it here again, but let me realign the assignments to left
and right
:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
for i in range(3, len(depths)):
left = depths[i - 3] + depths[i - 2] + depths[i - 1]
right = depths[i - 2] + depths[i - 1] + depths[i]
if left < right:
count += 1
print(count)
1262
Or, better yet, let me copy & paste the additions directly into the if
statement, removing the assignments altogether:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
for i in range(3, len(depths)):
if depths[i - 3] + depths[i - 2] + depths[i - 1] < depths[i - 2] + depths[i - 1] + depths[i]:
count += 1
print(count)
1262
Why did I do this?
Because, now, I want to think about the condition of the if
statement like a mathematical inequality.
(Do you see where this is going?)
Let me just abbreviate depths
as d
because I'm lazy:
\[ \require{cancel} \begin{align} d[i - 3] + {d[i - 2]} + {d[i - 1]} &< {d[i - 2]} + {d[i - 1]} + d[i] \implies \\ d[i - 3] + \cancel{d[i - 2]} + \cancel{d[i - 1]} &< \cancel{d[i - 2]} + \cancel{d[i - 1]} + d[i] \implies \\ d[i - 3] &< d[i] \end{align}\]
So, in essence, we can omit most of the terms from the comparison because they don't matter!
They are being added to both sides of the scale (you can think of a <
as a scale checking if the left side is lighter), and therefore they do not make a difference in the balance of the scale:
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
for i in range(3, len(depths)):
if depths[i - 3] < depths[i]:
count += 1
print(count)
1262
This is a pretty good solution.
One (tiny) thing that is very worth the mention is that with this version, it becomes trivial to parametrise the window size:
WS = 3 # Window size.
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
for i in range(WS, len(depths)):
if depths[i - WS] < depths[i]:
count += 1
print(count)
1262
With the addition of the WS
variable, the problem statement can change the window size to anything they'd like, that we can adjust our solution with the same ease.
In particular, if we change WS
to 1, we get back the problem from part 1:
WS = 1
with open(INPUT_FILE, "r") as f:
depths = [int(line) for line in f]
count = 0
for i in range(WS, len(depths)):
if depths[i - WS] < depths[i]:
count += 1
print(count) # The value from part 1.
1292
The only thing left to mention is what we do if the file f
can't be loaded to memory as a whole, because it's too big.
In that case, depths
can't be defined as a list comprehension, and thus depth
won't be indexable like in the if
statement.
In that case, we need to do something similar to the pairwise
that we did before.
Before, we did a pairwise
because we wanted to compare each value to the one before it (index minus one).
Now, we want to compare each value with the one three places before it (index minus three).
Then, we can use two functions from itertools
: tee
and islice
; or we can keep track of the window by hand.
To do this by hand, we will keep track of a list whose size is one plus the window size. We start by filling this manual window, and then we keep comparing the two ends of the window:
WS = 3
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f) # Using a lazy generator now.
# Start by putting `WS` elements inside `window`.
window = [next(depths) for _ in range(WS)]
count = 0
# When we get here, `depths` is already missing some elements from the beginning,
# so we can already start comparing depth values.
for elem in depths:
window.append(elem)
if window[0] < window[-1]:
count += 1
window.pop(0) # Get rid of the oldest element in the window.
print(count)
1262
This is a pretty decent solution.
The most confusing thing may be the way window
is initialised, with the built-in next
.
next
is a function that accepts a lazy generator and just spits out the next value of the lazy generator.
Here is an example:
gen_ints_to_10 = iter(range(10))
next(gen_ints_to_10)
0
next(gen_ints_to_10)
1
What's interesting about iterators is that, once you start going through them, you've “lost” the items you already saw.
For example, if you iterate over gen_ints_to_10
with a for
loop now, the 0
and the 1
will already be gone:
for num in gen_ints_to_10:
print(num, end=" ")
2 3 4 5 6 7 8 9
If you are comfortable with this solution, you may want to study this next solution:
from itertools import islice, tee
WS = 3
with open(INPUT_FILE, "r") as f:
depths = (int(line) for line in f)
prev_it, it = tee(depths, 2) # Split iterator in two.
next_it = islice(it, WS, None) # Skip WS elements from this iterator.
count = 0
for prev_, next_ in zip(prev_it, next_it):
if prev_ < next_:
count += 1
print(count)
1262
In one way or another, this problem showed that whenever we need to iterate over a sequence of values, we are likely to find useful tools in the itertools
module.
We were careful to try and not do too much repeated work, and we also took a look at how our code would have to be different if the input data were too much to hold in memory at once.
If you have any questions, suggestions, remarks, recommendations, corrections, or anything else, you can reach out to me on Twitter or via email to rodrigo at mathspp dot com.
+35 chapters. +400 pages. Hundreds of examples. Over 30,000 readers!
My book “Pydon'ts” teaches you how to write elegant, expressive, and Pythonic code, to help you become a better developer. >>> Download it here 🐍🚀.