This article briefly describes the iterators available in the Python module itertools and how to use them.
itertools overview
The Python module itertools contains 20 tools that every Python developer should be aware of.
We divide the iterators from the module itertools into 5 categories to make them easier to learn, and we also present a short list of the most generally useful ones.
Category | Iterators |
---|---|
Reshaping iterators | batched, chain*, groupby, islice, pairwise* |
Filtering iterators | compress, dropwhile, filterfalse, takewhile |
Combinatorial iterators | combinations, combinations_with_replacement, permutations, product* |
Infinite iterators | count, cycle, repeat |
Iterators that complement other tools | accumulate, starmap, zip_longest |
The iterators marked with * are, in my experience, the most commonly useful ones.
On top of the 19 iterators listed in the table above, the module itertools also provides the function tee, which is very powerful but often not necessary.
In “The little book of itertools” I devote a small chapter to it because understanding and reimplementing tee is an excellent learning and coding exercise.
In my experience, the 3 most commonly useful tools in the module itertools are product, chain, and pairwise.
product flattens nested loops
The iterator product is a combinatorial iterator that is very useful when you want to flatten a series of nested for loops.
As the prototypical example, a nested loop that traverses a two-dimensional grid can be rewritten as a single loop with product.
So, whenever we have two or more independent, nested for loops, like below:
for x in range(width):
    for y in range(height):
        ...  # Do stuff with x and y.
We can reshape them into a single loop if we use product:
from itertools import product
for x, y in product(range(width), range(height)):
    ...  # Do stuff with x and y.
The flat structure gives you more horizontal space to write your code and makes it easier to break out of the loop, since a single break now stops the whole traversal.
This is a very common use case for product.
If you go back to old code of yours, I am sure you will be able to find places where you could rewrite a loop like this.
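To illustrate the point about breaking out of the loop, here is a small sketch that searches a grid for the first value above 8. The grid, its dimensions, and the threshold are made-up data for this example. With product, one break ends the whole search, whereas the nested version needs an extra check to stop the outer loop too.

```python
from itertools import product

# Hypothetical data for this sketch: a grid with `height` rows and `width` columns.
grid = [
    [3, 8, 1],
    [9, 4, 7],
]
width, height = 3, 2

# Flat loop: a single `break` exits the whole traversal.
found = None
for x, y in product(range(width), range(height)):
    if grid[y][x] > 8:
        found = (x, y)
        break
print(found)  # (0, 1)

# Nested loops: the inner `break` only exits the inner loop, so we check again outside.
found_nested = None
for x in range(width):
    for y in range(height):
        if grid[y][x] > 8:
            found_nested = (x, y)
            break
    if found_nested is not None:
        break
print(found_nested)  # (0, 1)
```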
chain creates a single iterable out of many
The iterator chain lets you chain two or more iterables together, so that you can traverse them in sequence without having to concatenate them explicitly.
When you are dealing with iterables like lists or strings, you might argue that you would rather concatenate them with + than import chain, but concatenation doesn't always work.
Consider this snippet of code that concatenates two lists so that we can traverse them:
# Typical pattern:
first_list = [...]
second_list = [...]
full_list = first_list + second_list  # + third_list + ...
for element in full_list:
    ...  # Do stuff
Using chain, we wouldn't need the addition:
from itertools import chain
first_list = [...]
second_list = [...]
for element in chain(first_list, second_list):  # Also works with 3+ iterables.
    ...  # Do stuff
This also works in situations where you can't concatenate the iterables:
from itertools import chain
first_gen = (x ** 2 for x in range(3))
second_gen = (x ** 3 for x in range(3))
# first_gen + second_gen  # TypeError!
for value in chain(first_gen, second_gen):
    print(value, end=" ")  # 0 1 4 0 1 8
You might also be thinking that you could just use the built-in list on first_gen and second_gen to convert them to lists, and then concatenate the lists.
This is true, but it's typically a waste of resources, because it builds intermediate lists in memory, and it won't work when dealing with infinite iterators.
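As a small sketch of the infinite case, chain can put a finite iterable in front of the infinite iterator count, something that converting everything to lists could never do. Here, islice takes only the first few values; the "header" string is just example data.

```python
from itertools import chain, count, islice

# A finite prefix followed by an infinite stream of integers 1, 2, 3, ...
stream = chain(["header"], count(1))
first_five = list(islice(stream, 5))
print(first_five)  # ['header', 1, 2, 3, 4]
```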
The iterator chain also provides an auxiliary constructor called chain.from_iterable which, put simply, flattens an iterable of iterables by one level.
A typical use case would be to flatten a list of lists:
from itertools import chain
nested = [[1, 2, 3], [4], [], [5, 6]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]
The beauty of chain.from_iterable is that you don't even need to convert the final result into a list if all you want is to traverse the elements:
nested = [[1, 2, 3], [4], [], [5, 6]]
for value in chain.from_iterable(nested):
    print(value, end=" ")  # 1 2 3 4 5 6
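Two details of chain.from_iterable are worth seeing in action: it removes exactly one level of nesting, and the outer iterable may itself be lazy, such as a generator. The nested lists and the generator of ranges below are example data for this sketch.

```python
from itertools import chain

# chain.from_iterable removes exactly one level of nesting.
nested = [[1, [2, 3]], [4]]
one_level = list(chain.from_iterable(nested))
print(one_level)  # [1, [2, 3], 4]

# The outer iterable can also be lazy, like this generator of ranges.
rows = (range(n) for n in range(4))
flat_rows = list(chain.from_iterable(rows))
print(flat_rows)  # [0, 0, 1, 0, 1, 2]
```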
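pairwise produces overlapping pairs of consecutive elements
The iterator pairwise accepts any iterable and produces overlapping pairs of consecutive elements.
It is, essentially, an efficient and general implementation of the pattern zip(my_list[:-1], my_list[1:]).
Thus, pairwise is useful for two main reasons: it is more efficient, because it doesn't build the copies that slicing creates, and it is more general, because it works with any iterable and not just with sequences that support slicing.
The common pattern that pairwise replaces is the following: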
names = ["Harry", "Anne", "George"]
for left, right in zip(names[:-1], names[1:]):
    print(f"{left} says hi to {right}")
"""Output:
Harry says hi to Anne
Anne says hi to George
"""
Using pairwise, you need neither zip nor the slicing:
from itertools import pairwise
names = ["Harry", "Anne", "George"]
for left, right in pairwise(names):
    print(f"{left} says hi to {right}")
"""Output:
Harry says hi to Anne
Anne says hi to George
"""
The reshaping iterators in this section produce output in a different format than that of the input. You can find a simple example for each one of them below.
Signature | Docs | Brief explanation |
---|---|---|
batched(iterable, n) | Docs | Produces tuples of length n from the given iterable, until exhausted. The last tuple might have fewer than n elements. |
chain(*iterables) | Docs | Produces a single iterable out of multiple iterables. |
*chain.from_iterable(iterable) | Docs | Flattens an iterable of iterables. |
islice(iterable, stop) | Docs | Slices the first stop elements from the given iterable. Similar to lst[:stop]. |
*islice(iterable, start, stop[, step]) | Docs | Slices the elements from position start up to (but not including) position stop, returning only one in every step elements. Similar to lst[start:stop:step]. |
groupby(iterable, key=None) | Docs | Creates sub-iterators of consecutive values from iterable for which the function key returns the same value. |
pairwise(iterable) | Docs | Produces overlapping pairs of consecutive elements of iterable. Similar to zip(lst[:-1], lst[1:]). |
batched
# Read a file 5 lines at a time (batched requires Python 3.12+).
from itertools import batched
with open(some_path, "r") as f:
    for lines in batched(f, 5):
        print(lines)  # Process the lines.
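On Python versions older than 3.12, where batched isn't available, a similar behaviour can be sketched with islice. The function name batched_sketch is hypothetical, and this is only an approximation of the standard library iterator, not its actual implementation.

```python
from itertools import islice

def batched_sketch(iterable, n):
    """Hypothetical stand-in for itertools.batched: yield tuples of up to n items."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # Each islice call takes the next n items from the same shared iterator.
    while batch := tuple(islice(iterator, n)):
        yield batch

batches = list(batched_sketch("ABCDEFG", 3))
print(batches)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```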
chain
# Traverse 2+ generators in order (we can't concatenate them).
from itertools import chain
first_gen = (x ** 2 for x in range(3))
second_gen = (x ** 3 for x in range(3))
for value in chain(first_gen, second_gen):
    print(value, end=" ")  # 0 1 4 0 1 8
islice
# Slice generators.
from itertools import islice
squares = (x ** 2 for x in range(999_999_999))
for square in islice(squares, 10):
    print(square, end=" ")  # 0 1 4 9 16 25 36 49 64 81
print()  # End the line before the next example.
squares = (x ** 2 for x in range(999_999_999))  # Reset
for square in islice(squares, 5, 15, 3):
    print(square, end=" ")  # 25 64 121 196
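One detail worth keeping in mind is that islice consumes the underlying iterator, so calling it twice on the same iterator picks up where the previous call stopped, unlike slicing a list, which always starts from the beginning. A small sketch:

```python
from itertools import islice

numbers = iter(range(10))
first_three = list(islice(numbers, 3))
next_three = list(islice(numbers, 3))  # Continues from where the first call stopped.
print(first_three, next_three)  # [0, 1, 2] [3, 4, 5]
```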
groupby
# Compute longest winning streak.
from itertools import groupby
game_results = "WWWLLWWWWLWWWWWWL"
longest_streak = 0
for key, streak in groupby(game_results):
    if key == "W":
        longest_streak = max(longest_streak, len(list(streak)))
print(longest_streak)  # 6
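Because groupby only groups consecutive values, equal keys that are not adjacent end up in separate groups. A common approach is to sort by the same key first, as in this sketch with a made-up list of words and a helper function first_letter introduced just for the example.

```python
from itertools import groupby

def first_letter(word):
    return word[0]

words = ["apple", "banana", "avocado", "blueberry", "cherry"]

# Without sorting, equal keys that aren't adjacent end up in separate groups.
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a', 'b', 'c']

# Sorting by the same key first makes equal keys adjacent.
by_letter = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}
print(by_letter)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```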
pairwise
from itertools import pairwise
names = ["Harry", "Anne", "George"]
for left, right in pairwise(names):
    print(f"{left} says hi to {right}")
"""Output:
Harry says hi to Anne
Anne says hi to George
"""
The filtering iterators accept an iterable and a predicate (or, in the case of compress, a sequence of selectors) and produce a subset of the elements of the original iterable. You can find a simple example for each one of them below.
Signature | Docs | Brief explanation |
---|---|---|
compress(data, selectors) | Docs | Produces the values from data for which the corresponding selector is truthy. |
dropwhile(predicate, iterable) | Docs | Drops the leading run of consecutive elements of the given iterable that satisfy the given predicate, and produces everything after it. |
filterfalse(predicate, iterable) | Docs | Complement of the built-in filter. Produces the values of the given iterable that do not satisfy the given predicate. |
takewhile(predicate, iterable) | Docs | Complement of dropwhile. Produces the leading run of consecutive values of the given iterable that satisfy the given predicate. |
compress
compress is typically useful when you already have the selectors computed, for example because they came from a different data source.
If you have to compute them specifically for compress, you're usually better off with the built-in filter.
# Find possible voters.
from itertools import compress
people = ["Harry", "Anne", "George"]
can_vote = [True, True, False]
for name in compress(people, can_vote):
    print(name, end=" ")  # Harry Anne
dropwhile
from itertools import dropwhile
# Top chess grandmasters and ratings (July 2024)
grandmasters = [
    ("Magnus Carlsen", 2832),
    ("Hikaru Nakamura", 2802),
    ("Fabiano Caruana", 2796),
    ("Arjun Erigaisi", 2778),
    ("Ian Nepomniachtchi", 2770),
]
# Drop grandmasters with rating above 2800:
for gm in dropwhile(lambda gm: gm[1] > 2800, grandmasters):
    print(gm[0], end=", ")  # Fabiano Caruana, Arjun Erigaisi, Ian Nepomniachtchi,
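The difference between dropwhile and a plain filter is easy to miss with sorted data like the ratings above. This sketch, on a made-up list, shows that dropwhile stops checking the predicate as soon as one element fails it, whereas filterfalse checks every element.

```python
from itertools import dropwhile, filterfalse

def is_small(x):
    return x < 3

data = [1, 2, 3, 1, 2]
# dropwhile stops checking the predicate after the first element that fails it.
after_drop = list(dropwhile(is_small, data))
print(after_drop)  # [3, 1, 2]
# filterfalse checks the predicate on every element.
kept = list(filterfalse(is_small, data))
print(kept)  # [3]
```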
filterfalse
# Find people who are too young to vote.
from itertools import filterfalse
people = [
    ("Harry", 17),
    ("Anne", 21),
    ("George", 5),
]
def can_vote(person):
    return person[1] >= 18
for name, _ in filterfalse(can_vote, people):
    print(name, end=", ")  # Harry, George,
takewhile
from itertools import takewhile
# Top chess grandmasters and ratings (July 2024)
grandmasters = [
    ("Magnus Carlsen", 2832),
    ("Hikaru Nakamura", 2802),
    ("Fabiano Caruana", 2796),
    ("Arjun Erigaisi", 2778),
    ("Ian Nepomniachtchi", 2770),
]
# Take grandmasters with rating above 2800:
for gm in takewhile(lambda gm: gm[1] > 2800, grandmasters):
    print(gm[0], end=", ")  # Magnus Carlsen, Hikaru Nakamura,
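Since takewhile and dropwhile are complements, using both with the same predicate splits an iterable into a leading run and the rest. This sketch reuses the ratings from the examples above; the helper above_2800 is introduced just for illustration.

```python
from itertools import dropwhile, takewhile

def above_2800(rating):
    return rating > 2800

ratings = [2832, 2802, 2796, 2778, 2770]
prefix = list(takewhile(above_2800, ratings))
rest = list(dropwhile(above_2800, ratings))
print(prefix, rest)  # [2832, 2802] [2796, 2778, 2770]
# Together, the two parts rebuild the original list.
print(prefix + rest == ratings)  # True
```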
The combinatorial iterators in this section combine the elements of one or more iterables in different ways and these iterators typically have a mathematical connotation. You can find a simple example for each one of them after the table.
Despite being a combinatorial iterator, product is probably the most universally useful iterator of the whole module!
If you haven't yet, take a look at how you can use the iterator product to flatten nested loops.
Signature | Docs | Brief explanation |
---|---|---|
combinations(iterable, r) | Docs | Produces tuples of length r with elements of the given iterable, keeping the elements in the order of their original positions. |
combinations_with_replacement(iterable, r) | Docs | Same thing as combinations, but each element can be repeated within a tuple. |
permutations(iterable, r=None) | Docs | Produces all permutations of length r of the elements of the given iterable. When r is None, it defaults to the length of the iterable. |
product(*iterables, repeat=1) | Docs | Produces tuples combining all elements from all the given iterables (their Cartesian product). The argument repeat repeats the iterables, so product(xs, repeat=2) is the same as product(xs, xs). |
The combinatorial iterators use the position of the elements as a key when uniqueness needs to be taken into account. In other words, the values themselves are never compared to each other.
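A quick way to see this is to use an input with repeated values. In the sketch below, the two "A"s sit at different positions, so combinations treats them as distinct and produces the pair ('A', 'B') twice; deduplicating with a set is one option if you want unique values.

```python
from itertools import combinations

pairs = list(combinations("AAB", 2))
print(pairs)  # [('A', 'A'), ('A', 'B'), ('A', 'B')]

# If unique value combinations are needed, deduplicate explicitly.
unique_pairs = sorted(set(pairs))
print(unique_pairs)  # [('A', 'A'), ('A', 'B')]
```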
combinations
# Possible flavours for 2-scoop ice creams (no repetition)
from itertools import combinations
flavours = ["chocolate", "vanilla", "strawberry"]
for scoops in combinations(flavours, 2):
    print(scoops)
"""Output:
('chocolate', 'vanilla')
('chocolate', 'strawberry')
('vanilla', 'strawberry')
"""
combinations_with_replacement
# Possible flavours for 2-scoop ice creams (repetition allowed)
from itertools import combinations_with_replacement
flavours = ["chocolate", "vanilla", "strawberry"]
for scoops in combinations_with_replacement(flavours, 2):
    print(scoops)
"""Output:
('chocolate', 'chocolate')
('chocolate', 'vanilla')
('chocolate', 'strawberry')
('vanilla', 'vanilla')
('vanilla', 'strawberry')
('strawberry', 'strawberry')
"""
permutations
# Order in which the 2 scoops can be served (no repetition)
from itertools import permutations
flavours = ["chocolate", "vanilla", "strawberry"]
for scoops in permutations(flavours, 2):
    print(scoops)
"""Output:
('chocolate', 'vanilla')
('chocolate', 'strawberry')
('vanilla', 'chocolate')
('vanilla', 'strawberry')
('strawberry', 'chocolate')
('strawberry', 'vanilla')
"""
product
# All the different ice-cream orders I could make
from itertools import product
possible_scoops = [2, 3]
possibly_served_on = ["cup", "cone"]
for scoop_n, served_on in product(possible_scoops, possibly_served_on):
    print(f"{scoop_n} scoops served on a {served_on}.")
"""Output:
2 scoops served on a cup.
2 scoops served on a cone.
3 scoops served on a cup.
3 scoops served on a cone.
"""
The iterators in this section can produce infinite streams of values.
These are typically used in conjunction with other iterators, for example with zip.
You can find a simple example for each one of them after the table.
Signature | Docs | Brief explanation |
---|---|---|
count(start=0, step=1) | Docs | Same as the built-in range, but without a stopping point. |
cycle(iterable) | Docs | Iterates endlessly over the items in the given iterable. |
repeat(object[, times]) | Docs | Creates an iterator that repeats the given object endlessly, or the specified number of times. |
count
# Unique ID generator.
from itertools import count
ID_GENERATOR = count()
class Sandwich:
    def __init__(self):
        self.sandwich_id = next(ID_GENERATOR)
print(Sandwich().sandwich_id) # 0
print(Sandwich().sandwich_id) # 1
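Because count never stops, it pairs naturally with zip, which stops at the shortest input. This sketch uses the start and step parameters from the table above to label some example items starting at 10, in steps of 5.

```python
from itertools import count

items = ["a", "b", "c"]
labelled = list(zip(count(10, 5), items))
print(labelled)  # [(10, 'a'), (15, 'b'), (20, 'c')]
```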
cycle
# Create a layered sandwich.
from itertools import cycle
ingredients = cycle(["tomato", "cheese", "chicken"])
layers = 5
print("<bread", end=" ")
for _, ingredient in zip(range(layers), ingredients):
    print(ingredient, end=" ")
print("bread>")
# <bread tomato cheese chicken tomato cheese bread>
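An alternative to zipping with range is to cut the infinite cycle with islice, which reads as "take the first 5 layers". This is a sketch of that variation using the same ingredients.

```python
from itertools import cycle, islice

ingredients = cycle(["tomato", "cheese", "chicken"])
layers = list(islice(ingredients, 5))
print(layers)  # ['tomato', 'cheese', 'chicken', 'tomato', 'cheese']
```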
repeat
# Repeatedly produce the same object.
from itertools import repeat
bread_dispenser = repeat("bread")
people = ["Harry", "Anne", "George"]
for person, bread in zip(people, bread_dispenser):
    print(f"{person}, here's some {bread}, make yourself a sandwich.")
"""Output:
Harry, here's some bread, make yourself a sandwich.
Anne, here's some bread, make yourself a sandwich.
George, here's some bread, make yourself a sandwich.
"""
The iterators listed here complement other tools from the language, in the same way that filter and filterfalse complement each other.
You can find a simple example for each one of them after the table.
Signature | Complements | Docs | Brief explanation |
---|---|---|---|
accumulate(iterable[, function, *, initial=None]) | functools.reduce | Docs | Just like functools.reduce, but produces the intermediate values as well. |
starmap(function, iterable) | map | Docs | Like map(lambda args: function(*args), iterable). |
zip_longest(*iterables, fillvalue=None) | zip | Docs | Like zip, but stops on the longest iterable instead of the shortest one, filling empty positions with fillvalue. |
accumulate
The iterator accumulate works in a similar way to functools.reduce.
While reduce only produces the final value of the reduction, the iterator accumulate provides the intermediate values as well.
# Partial products to see investment growth over time.
from functools import reduce
from itertools import accumulate
from operator import mul
interest_rates = [1.005, 1.005, 1.008, 1.01, 1.01, 1.02]
initial_investment = 1000
# Same as `math.prod(interest_rates, start=initial_investment)`:
print(reduce(mul, interest_rates, initial_investment))  # ~1059.34
print(list(
    accumulate(
        interest_rates,
        mul,
        initial=initial_investment,
    )
))  # ~ [1000, 1005, 1010.02, 1018.11, 1028.29, 1038.57, 1059.34]
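When no function is given, accumulate adds the values, producing running totals. Any two-argument function works, so passing the built-in max gives a running maximum. A sketch on made-up values:

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]
running_total = list(accumulate(values))  # Default function: addition.
running_max = list(accumulate(values, max))
print(running_total)  # [3, 4, 8, 9, 14]
print(running_max)  # [3, 3, 4, 4, 5]
```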
starmap
# Useful when the arguments are packed in tuples but the function expects them as separate arguments.
from itertools import starmap
to_compute = [
    (2, 3),  # 8
    (2, 4),  # 16
    (2, 5),  # 32
    (3, 2),  # 9
    (3, 3),  # 27
]
print(list(
    starmap(pow, to_compute)  # [8, 16, 32, 9, 27]
))
# Compare to:
bases = [2, 2, 2, 3, 3]
exponents = [3, 4, 5, 2, 3]
print(list(
    map(pow, bases, exponents)  # [8, 16, 32, 9, 27]
))
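The relationship between starmap and map can be made explicit: zipping the separate argument lists produces the packed tuples that starmap expects, and a generator expression that unpacks with * gives the same result. A sketch using the bases and exponents from above:

```python
from itertools import starmap

bases = [2, 2, 2, 3, 3]
exponents = [3, 4, 5, 2, 3]

via_starmap = list(starmap(pow, zip(bases, exponents)))
via_map = list(map(pow, bases, exponents))
via_genexp = [pow(*args) for args in zip(bases, exponents)]
print(via_starmap)  # [8, 16, 32, 9, 27]
print(via_starmap == via_map == via_genexp)  # True
```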
zip_longest
# Go over multiple iterables until all are exhausted.
from itertools import repeat, zip_longest
# Available ingredients:
bread = repeat("bread", 4)
mayo = repeat("mayo", 2)
chicken = repeat("chicken", 4)
for ingredients in zip_longest(bread, mayo, chicken, fillvalue=""):
    # filter(None, ...) skips the empty fill values, avoiding double spaces.
    print(f"Here's a sandwich with {' '.join(filter(None, ingredients))}.")
"""Output:
Here's a sandwich with bread mayo chicken.
Here's a sandwich with bread mayo chicken.
Here's a sandwich with bread chicken.
Here's a sandwich with bread chicken.
"""
tee
The function tee is brilliant because it seems to implement something that goes against the very definition of iterators.
An iterator provides a stream of data that can only be consumed once, but tee(iterable, n=2) can be used to produce as many independent iterators over a single source of data as you may want.
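A minimal sketch of that independence: a generator can normally be traversed only once, but the two iterators returned by tee each produce the full stream. Once tee has been called, the original iterator should not be used directly anymore.

```python
from itertools import tee

squares = (x ** 2 for x in range(5))
# After calling tee, use only the returned iterators, not `squares` itself.
first, second = tee(squares, 2)
from_first = list(first)
from_second = list(second)
print(from_first)  # [0, 1, 4, 9, 16]
print(from_second)  # [0, 1, 4, 9, 16]
```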
In “The little book of itertools” we explore tee in depth and we reimplement it.
For this overview of the module itertools, it suffices to tell you that it is unlikely that you will need it.
Before pairwise was introduced in Python 3.10, tee would provide a good way to implement it:
from itertools import tee
def pairwise(iterable):
    first, second = tee(iterable, 2)
    next(second, None)  # Advance the second copy; the default avoids an error on empty input.
    yield from zip(first, second)
itertools is an excellent module to study
The module itertools is an excellent module to study, not only because it provides many useful tools that you can use in your code, but also because reimplementing itertools in Python is an excellent exercise for when you are learning how to work with generators and iterators.
I invite you to take a look at “The little book of itertools”, a very short book where I walk you through reimplementing itertools effectively in Python, with explanations of all of the things you need to know, with proposed solutions to compare your work against, and with automated tests to help you ensure you're on the right track.
Python 3 documentation, itertools, https://docs.python.org/3/library/itertools.html [last accessed 23-07-2024].