This short article shows how to chunk iterables into pieces of a fixed length.

If you need to chunk a list or another iterable into groups of n items, you can do that with the built-in zip:

>>> def chunk(iterable, chunk_size):
...     return list(zip(*[iter(iterable)] * chunk_size))
...
>>> chunk(range(10), 5)
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
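
Why does this work? The list `[iter(iterable)] * chunk_size` does not hold `chunk_size` independent iterators; it holds the same iterator `chunk_size` times. The sketch below spells that out with an intermediate variable (the name `iterators` and the `assert` are only there for illustration, they are not part of the original one-liner):

```python
def chunk(iterable, chunk_size):
    # One iterator, referenced chunk_size times (not chunk_size copies).
    iterators = [iter(iterable)] * chunk_size
    assert all(it is iterators[0] for it in iterators)
    # zip takes one item from each argument in turn, so every tuple it
    # builds consumes chunk_size consecutive items from the shared iterator.
    return list(zip(*iterators))


print(chunk(range(10), 5))  # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
```

Because every argument of zip is the same object, each tuple is a run of consecutive items rather than a column across separate sequences.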

The same function works with any iterable, such as a string:

>>> chunk("Hello, world", 3)
[('H', 'e', 'l'), ('l', 'o', ','), (' ', 'w', 'o'), ('r', 'l', 'd')]

You can also add a third argument to process each chunk after it is created:

# Redefine `chunk`:
>>> def chunk(iterable, chunk_size, on_chunk=None):
...     on_chunk = on_chunk or (lambda x: x)
...     return [on_chunk(chunk) for chunk in zip(*[iter(iterable)] * chunk_size)]
...
>>> chunk("Hello, world", 3, "".join)
['Hel', 'lo,', ' wo', 'rld']

The code above has a limitation, though: it silently drops the last elements if the chunk size doesn't divide evenly into the length of the iterable. Sometimes this is ok. Sometimes it is not.

>>> chunk(range(8), 5)
[(0, 1, 2, 3, 4)]  # where are 5, 6, and 7??

If you use itertools.zip_longest, you get the opposite behaviour: the last chunk is kept and padded instead of being dropped.

>>> from itertools import zip_longest
>>> def chunk_longest(iterable, chunk_size):
...     return list(zip_longest(*[iter(iterable)] * chunk_size))
...

>>> chunk_longest(range(8), 5)
[(0, 1, 2, 3, 4), (5, 6, 7, None, None)]

With zip_longest, the last chunk is padded with the value None. You can customise this padding through the fillvalue argument of zip_longest.
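
As a minimal sketch of customised padding: `zip_longest` accepts a `fillvalue` keyword argument, which can be forwarded from the chunking function. The name `chunk_padded` is made up for this example.

```python
from itertools import zip_longest


def chunk_padded(iterable, chunk_size, fillvalue=None):
    # Forward fillvalue so the caller chooses what pads the last chunk.
    return list(zip_longest(*[iter(iterable)] * chunk_size, fillvalue=fillvalue))


print(chunk_padded(range(8), 5, fillvalue=-1))
# [(0, 1, 2, 3, 4), (5, 6, 7, -1, -1)]
```

Pick a fill value that cannot be confused with real data, otherwise you cannot tell padding apart from actual elements.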

Finally, if you want to guarantee that every chunk is complete and no element is dropped, you can use strict=True in zip. This makes your chunk function raise a ValueError if the chunk size doesn't divide evenly into the length of the iterable. However, this only works in Python 3.10+ because that is when zip(..., strict=True) was added.

>>> def chunk_strict(iterable, chunk_size):
...     return list(zip(*[iter(iterable)] * chunk_size, strict=True))
...

# 2 divides into 6
>>> chunk_strict(range(6), 2)
[(0, 1), (2, 3), (4, 5)]

# 5 does NOT divide into 8
>>> chunk_strict(range(8), 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in chunk_strict
ValueError: zip() argument 4 is shorter than arguments 1-3
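
On Python versions older than 3.10, a similar check can be approximated with zip_longest and a private sentinel. This is only a sketch under that assumption: the name `chunk_strict_compat`, the sentinel `_MISSING`, and the error message are all invented here, and the message differs from the one zip itself produces.

```python
from itertools import zip_longest

_MISSING = object()  # unique sentinel that cannot appear in the caller's data


def chunk_strict_compat(iterable, chunk_size):
    chunks = list(zip_longest(*[iter(iterable)] * chunk_size, fillvalue=_MISSING))
    # Only the last chunk can be padded, and padding always fills the tail.
    if chunks and chunks[-1][-1] is _MISSING:
        raise ValueError("chunk_size does not divide the length of the iterable")
    return chunks


print(chunk_strict_compat(range(6), 2))  # [(0, 1), (2, 3), (4, 5)]
```

Calling `chunk_strict_compat(range(8), 5)` raises the ValueError, because the second chunk would need two padding values.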

This article was generated automatically from this thread I published on Twitter @mathsppblog. Then it was edited lightly.
