A short article with all you need to know about sequence indexing in Python – and a bit more.
(If you are new here and have no idea what a Pydon't is, you may want to read the Pydon't Manifesto.)
Sequences in Python, like strings, lists, and tuples, are objects that support indexing: a fairly simple operation that we can use to access specific elements. This short article will cover the basics of how sequence indexing works and then give you some tips regarding anti-patterns to avoid when using indices in your Python code.
In this article you will:
First and foremost, I am talking about sequence indexing here to distinguish it from the type of indexing you do to access the values of a dictionary, where you use keys to index into the dictionary and retrieve its values. In this article we will be talking about using integers to index linear sequences, that is, sequences that we can traverse from one end to the other, in an ordered fashion.
A very simple example of such a sequence is a string:
>>> s = "Indexing is easy!"
>>> s
'Indexing is easy!'
To index a specific character of this string I just use square brackets and the integer that corresponds to the character I want. Python is 0-indexed, which means it starts counting indices at 0. Therefore, the very first element of a sequence can be obtained with [0]. In our example, this should give a capital "I":
>>> s = "Indexing is easy!"
>>> s[0]
'I'
Then, each following character is obtained by increasing the index by 1:
>>> s = "Indexing is easy!"
>>> s[1]
'n'
>>> s[2]
'd'
>>> s[3]
'e'
Here is a figure that shows how to look at a sequence and figure out which index corresponds to each element:
Imagine vertical bars that separate consecutive elements, and then number each of those vertical bars, starting with the leftmost bar. Each element gets the index associated with the bar immediately to its left:
Because indices start at 0, the last legal index to a sequence is the index that is equal to the length of the sequence, minus one:
>>> s = "Indexing is easy!"
>>> len(s)
17
>>> s[16]
'!'
>>> s[17]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
As you can see above, if you use an index that is too large (read: greater than or equal to the length of the sequence) Python will raise an IndexError, warning you that the integer is too large for that specific indexing operation.
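If an out-of-range index is a real possibility in your code, one option is to catch the IndexError. Here is a minimal sketch of that idea; the helper name get_or_default is hypothetical and is not something Python provides:

```python
def get_or_default(seq, idx, default=None):
    """Return seq[idx], or `default` if idx is out of range (hypothetical helper)."""
    try:
        return seq[idx]
    except IndexError:
        return default


s = "Indexing is easy!"
last = get_or_default(s, 16)           # 16 == len(s) - 1, the last legal index
missing = get_or_default(s, 17, "?")   # 17 is out of range, so we get the default
```

Whether a silent default is appropriate depends on your program; often letting the IndexError surface is the better choice.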
If the last legal index is the length of the sequence minus 1, then there is an obvious way to access the last item of a sequence:
>>> s = "Indexing is easy!"
>>> s[len(s)-1]
'!'
>>> l = [12, 45, 11, 89, 0, 99]
>>> l[len(l)-1]
99
However, Python provides this really interesting feature where you can use negative indices to count from the end of the sequence. In order to figure out which negative index corresponds to which element, think about writing the sequence to the left of itself:
Then you just have to continue the numbering from the right to the left, therefore making use of negative numbers:
From the figure above you can see that the index -1 refers to the last element of the sequence, -2 refers to the second to last, etc:
>>> s = "Indexing is easy!"
>>> s[-1]
'!'
>>> s[-2]
'y'
We can also take a look at all the negative indices that work for our specific sequence:
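One way to list them yourself, as a small sketch (the variable names are only illustrative): the legal negative indices run from -len(s) up to -1, so range can generate them.

```python
s = "Indexing is easy!"

# Legal negative indices run from -len(s) up to and including -1.
negative_indices = list(range(-len(s), 0))

# Pair each negative index with the element it refers to.
pairs = [(idx, s[idx]) for idx in negative_indices]
```

The first pair is the index -len(s) together with the first character, and the last pair is -1 together with the last character.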
Another way to look at negative indices is to pretend there is a len(s) to their left:
|Negative index|Corresponding positive index|
|-1|len(s) - 1|
|-2|len(s) - 2|
|-3|len(s) - 3|
|...|...|
|-len(s)|len(s) - len(s), which is 0|
And a couple of examples:
>>> s = "Indexing is easy!"
>>> s[-5]
'e'
>>> s[len(s)-5]
'e'
>>> s[-13]
'x'
>>> s[len(s)-13]
'x'
>>> len(s)
17
>>> s[-17]
'I'
>>> s[len(s)-17]
'I'
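The correspondence between negative and positive indices can also be written as a tiny function. This is only a sketch to illustrate the rule; to_positive is a hypothetical name, and Python does this translation for you whenever you index:

```python
def to_positive(idx, length):
    """Translate a legal negative index into the equivalent positive one (hypothetical helper)."""
    return idx + length if idx < 0 else idx


s = "Indexing is easy!"

# Every legal negative index refers to the same element as its translation.
checks = [s[idx] == s[to_positive(idx, len(s))] for idx in range(-len(s), 0)]
```

For example, to_positive(-5, 17) gives 12, which matches s[-5] and s[len(s)-5] above.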
Having seen the basic syntax for indexing, there are a few index expressions that are worth being able to read at a glance, without having to think about them:
|s[0]|First element of s|
|s[1]|Second element of s|
|s[-1]|Last element of s|
|s[-2]|Second to last element of s|
Just a quick note on something that I trip over every now and then.
Python has many useful built-ins and built-in data types. Of them, strings, lists and tuples are indexable with integers. Sets are not.
You should also be careful about things that you think are like lists, but really are not, such as enumerate, zip, and map objects. None of these are indexable, none of these work with len, etc. Pay attention to that!
>>> l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> e = enumerate(l)
>>> e[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'enumerate' object is not subscriptable
>>> z = zip(l)
>>> z[0]
# ...
TypeError: 'zip' object is not subscriptable
>>> m = map(str, l)
>>> m[0]
# ...
TypeError: 'map' object is not subscriptable
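If you really do need to index into the results of one of these, one option is to turn it into a list first. A minimal sketch, with illustrative variable names:

```python
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# list() consumes the lazy object and stores all of its items.
pairs = list(enumerate(l))    # list of (index, value) tuples
strings = list(map(str, l))   # list of strings

third_pair = pairs[3]         # lists support indexing...
last_string = strings[-1]     # ...including negative indices
how_many = len(pairs)         # ...and len
```

Keep in mind that this builds the whole list in memory, which may not be what you want for very long inputs.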
Because of the way both range and indices work, one can understand that range(len(s)) will generate all the legal non-negative indices for s:
>>> s = "Indexing is easy!"
>>> list(range(len(s)))  # use list() to print the values
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> s[0]
'I'
>>> s[16]
'!'
A consequence of this is that beginners, and people who are more distracted or used to other programming languages, end up employing a very common anti-pattern in for loops.
To exemplify this, suppose we wanted to write a fairly naïve program to find the unique letters in our string. Here is what the anti-pattern would look like:
>>> s = "Indexing is easy!"
>>> uniques = []
>>> for idx in range(len(s)):
...     if s[idx] not in uniques:
...         uniques.append(s[idx])
...
>>> uniques
['I', 'n', 'd', 'e', 'x', 'i', 'g', ' ', 's', 'a', 'y', '!']
This is a naïve solution to the problem of “find unique characters”; you probably want to use a Python set for a more efficient implementation :)
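As a rough sketch of what that could look like (the variable names are illustrative): a set gives you the unique characters but not their order, and dict.fromkeys is one way to keep the order of first appearance, because dictionaries preserve insertion order in Python 3.7+.

```python
s = "Indexing is easy!"

# A set keeps one copy of each character, in no particular order.
unique_set = set(s)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this preserves the order in which characters first appear.
unique_ordered = list(dict.fromkeys(s))
```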
The problem here is that the for loop is being done in a roundabout way: we have access to a sequence (the string) that we could iterate over, but instead we find its length, so that we can use range to compute its legal indices, which we then iterate over, only to then access the elements of the sequence through their indices.
This way of writing for loops is similar to the way one would write for loops in other programming languages, if you were to iterate over the elements of an array.
However, we are using Python, not any other language. One of the things I enjoy the most about Python's for loops is that you can directly access the consecutive elements of a sequence. Hence, we can actually rewrite our for loop slightly, but in a way that makes it much more elegant:
>>> s = "Indexing is easy!"
>>> uniques = []
>>> for letter in s:
...     if letter not in uniques:
...         uniques.append(letter)
...
>>> uniques
['I', 'n', 'd', 'e', 'x', 'i', 'g', ' ', 's', 'a', 'y', '!']
What I really like about these types of loops is that if your variables are named correctly, the statements express your intent very clearly: for letter in s: is read as "for each letter in (the string) s".
This type of for loop iterates directly over the values you care about, which is often what you want. If you care about the indices, then be my guest and use range(len(s))!
Another anti-pattern to be on the lookout for happens when you need to work with the indices and the values. In that case, you probably want to use the built-in function enumerate. I tell you all about that function in a Pydon't of its own, so go check that if you haven't.
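As a quick sketch of working with indices and values together, here is one way to find the positions of the spaces in our string (the variable name space_positions is just for illustration):

```python
s = "Indexing is easy!"

# enumerate yields (index, value) pairs, so there is no need for range(len(s)).
space_positions = [idx for idx, letter in enumerate(s) if letter == " "]
```

The loop variables idx and letter are unpacked from each pair, so the code never has to write s[idx].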
When you are dealing with sequences and with indices for those sequences, you may end up needing to perform some calculations to compute new indices that interest you. For example, suppose you want the middle element of a string and you don't know about the // operator yet:
>>> s = "Indexing is easy!"
>>> s[len(s)/2]  # len(s)/2 isn't an integer!!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> len(s)/2
8.5
>>> import math
>>> s[math.floor(len(s)/2)]
' '
>>> s[len(s)//2]  # Pro-tip: the operation // is ideal here
' '
Where am I going with this? Take a look at the expression you just used: s[math.floor(len(s)/2)]. Maybe it is me getting old, but I struggle a bit to read that because of the [] enclosing the expression which then has a couple of () that I also have to parse, to figure out what goes where.
If you have large expressions to compute indices (and here, large will be subjective), inserting those expressions directly inside the square brackets may lead to long lines of code that are then complicated to read. If you have lines that are hard to understand, then you probably need to comment them, creating even more lines of code.
Another alternative is to create a well-named variable to hold the result of the computation of the new index:
>>> import math
>>> s = "Indexing is easy!"
>>> mid_char_idx = math.floor(len(s)/2)
>>> s[mid_char_idx]
' '
For this silly example, notice that the new variable name is almost as long as the expression itself! However, s[mid_char_idx] is very, very easy to read and does not need any further comments.
So, if you have large expressions to compute indices, think twice before using them to index directly into the sequence at hand and consider using an intermediate variable with a descriptive name.
You will find yourself often working with small groups of data, for example pairs of things that you keep together in a small list for ease of use. For example, the first and last names of a person:
>>> names = ["Mary", "Doe"]
Now you have this little function that creates a formal or informal greeting for a given name:
>>> names = ["Mary", "Doe"]
>>> def greet(names, formal):
...     if formal:
...         return "Hello Miss " + names[1]
...     else:
...         return "Hey there " + names[0]
...
>>> greet(names, True)
'Hello Miss Doe'
>>> greet(names, False)
'Hey there Mary'
Something you might consider, and that adds a bit of clarity to your code, is unpacking the names before you reach the if statement:
def greet(names, formal):
    first, last = names
    if formal:
        return "Hello Miss " + last
    else:
        return "Hey there " + first
Why would this be preferable, if I just added a line of code? It makes the intent of the code much more obvious. Just from looking at the function as is, you can see from the first line first, last = names that names is supposed to be a pair with the first and last names of a person, and then the if: ... else: ... is very, very easy to follow because we see immediately that we want to use the last name if we need a formal greeting, and otherwise (else) we use the first name.
Furthermore, the action of unpacking (like so: first, last = names) forces the greet function to expect pairs as the names argument, because a list with fewer or more elements will raise an error:
>>> first, last = ["Mary", "Anne", "Doe"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
We are assuming we really are working with pairs, so if the greet function gets something that is not a pair, this error is useful in spotting a problem in our code. Maybe someone didn't understand how to use the function and called it with only the first name of the person?
>>> greet("Mary", True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in greet
ValueError: too many values to unpack (expected 2)
This would help you find a location where the greet function was not being properly used.
Here's the main takeaway of this article, for you, on a silver platter:
“Indexing is simple and powerful, but sometimes when indexing looks like the answer, there is another Python feature waiting to be used.”
This Pydon't showed you that:
using indices in a for loop to access the elements of a sequence is an anti-pattern in Python;