In this Pydon't we cover advanced topics related to sequence slicing, like (negative) steps, more idiomatic sequence slicing, slice assignment, and slice deletion.
(If you are new here and have no idea what a Pydon't is, you may want to read the Pydon't Manifesto.)
In the previous Pydon't we looked at using sequence slicing to manipulate sequences, such as strings or lists. In this Pydon't we will continue on the subject of slicing sequences, but we will look at the more advanced topics. In particular, in this Pydon't you will
As it turns out, there is A LOT to say about sequence slicing, so I will have to split this Pydon't yet again, and next time we will finish off the subject of slicing sequences with:
The next stop in your journey to mastering slicing in Python is knowing about the lesser-used third parameter in the slice syntax: the step.
The step is a third integer that you can add to your slicing syntax
that allows you to pick evenly spaced elements from the sequence.
If s
is a sequence, then you've seen how to do slicing with the
syntax s[start:end]
, and the step comes after those two parameters:
s[start:end:step]
.
The easiest way to understand how the step
parameter works is by
omitting the start
and end
parameters, like so: s[::step]
.
Here are a couple of examples:
>>> "a0b0c0d0"[::2]
'abcd'
>>> "a00b00c00d00"[::3]
'abcd'
>>> "a000b000c000d000"[::4]
'abcd'
As you can see, in all of the examples above,
we used the step
parameter to skip all the zeroes in between the letters.
step
parameter is 2
, you split the sequence in groups of 2
and pick the first element of each group.step
parameter is 3
, you split the sequence in groups of 3
and pick the first element of each group.step
parameter is 4
, you split the sequence in groups of 4
and pick the first element of each group.You get the idea, right?
Once again, notice that the step
parameter in slicing is closely related
to the third (optional) argument that the built-in range
function accepts.
When you also specify start and stop positions, Python will first figure out the section of the sequence that is encompassed by the start and stop parameters, and only then it uses the step to pick the correct elements from the sequence.
For example, can you explain yourself this result:
>>> s = 'Slicing is easy!'
>>> s[2:14:2]
'iigi a'
What happens first is that s[2:14]
tells Python that we only want to work
with a part of our original string:
>>> s = 'Slicing is easy!'
>>> s[2:14]
'icing is eas'
Then the step parameter kicks in and tells Python to only pick a few elements, like the figure below shows:
That is why we get the result that we got:
>>> s = 'Slicing is easy!'
>>> s[2:14:2]
'iigi a'
# We can also first get the substring and then use the `step`...
>>> sub = s[2:14]
>>> sub[::2]
'iigi a'
# ... or do both consecutively without an intermediate variable:
>>> s[2:14][::2]
'iigi a'
Give it one more go and try to figure out what the result of
>>> s = "Slicing is easy!"
>>> s[2:14:3]
# ..?
is.
To help you with that, here's a figure that schematises the slice above:
This is why we get
>>> s = "Slicing is easy!"
>>> s[2:14:3]
'inie'
Much like with the start and stop parameters, the step parameter can be omitted.
This is what we do when we write a slice of the form [start:stop]
:
by not including the step
parameter we use its default value, which is 1.
This is the same as writing [start:stop:]
,
which is also the same as writing [start:stop:1]
.
In short, all these three slices are equivalent:
s[start:stop]
;s[start:stop:]
; ands[start:stop:1]
.Here is just one example:
>>> s = "Slicing is easy!"
>>> s[2:14]
'icing is eas'
>>> s[2:14:]
'icing is eas'
>>> s[2:14:1]
'icing is eas'
This is exactly the same as with range
.
When the third argument isn't specified, it is taken to be 1 by default.
We have seen how a positive step parameter behaves,
now we will see how a negative one does.
This is where things really get confusing, and at this point it really is
easier to understand how the slicing works if you are comfortable with
how range
works with three arguments.
When you specify a slice with s[start:stop:step]
,
you will get back the elements of s
that are in the indices pointed to
by range(start, stop, step)
.
If step
is negative, then the range
function will be counting
from start
to stop
backwards.
This means that start
needs to be larger than stop
,
otherwise there is nothing to count.
For example, range(3, 10)
gives the integers 3 to 9.
If you want the integers 9 to 3 you can use the step -1,
but you also need to swap the start
and stop
arguments.
Not only that, but you also need to tweak them a bit.
The start
argument is the first number that is included
in the result and the stop
argument is the first number
that isn't, so if you want the integers from 9 to 3,
counting down, you need the start
argument to be 9
and the stop
argument to be 2
:
>>> list(range(3, 10))
[3, 4, 5, 6, 7, 8]
>>> list(range(3, 10, -1)) # You can't start at 3 and count *down* to 10.
[]
>>> list(range(10, 3, -1)) # Start at 10 and stop right before 3.
[10, 9, 8, 7, 6, 5, 4]
>>> list(range(9, 2, -1)) # Start at 9 and stop right before 2
[9, 8, 7, 6, 5, 4, 3]
If you are a bit confused, that is normal.
Take your time to play around with range
and get a feel for how this works.
Using the range
results from above and the figure below,
you should be able to figure out why these slices return these values:
>>> s[3:10]
'cing is'
>>> s[3:10:-1]
''
>>> s[10:3:-1]
' si gni'
>>> s[9:2:-1]
'si gnic
Use the range
results above and this figure to help you out:
If you want to use a negative range that is different from -1
,
the same principle applies: the start
parameter of your string
slice should be larger than the stop
parameter (so that you can count down
from start
to stop
) and then the absolute value will tell you how
many elements you skip at a time.
Take your time to work these results out:
>>> s = 'Slicing is easy!'
>>> s[15:2:-1]
'!ysae si gnic'
>>> s[15:2:-2]
'!ses nc'
>>> s[15:2:-3]
'!asgc'
>>> s[15:2:-4]
'!e c'
An important remark is due: while range
accepts negative integers
as the start
and end
arguments and interprets those as the actual
negative numbers, remember that slicing also accepts negative numbers
but those are interpreted in the context of the sequence you are slicing.
What is the implication of this?
It means that if step
is negative in the slice s[start:stop:step]
,
then start
needs to refer to an element that is to the right of
the element referred to by stop
.
I will give you an explicit example of the type of confusion that the above remark is trying to warn you about:
>>> s = 'Slicing is easy!'
>>> list(range(2, -2, -1))
[2, 1, 0, -1]
>>> s[2:-2:-1]
''
>>> s[2]
'i'
>>> s[-2]
'y'
Notice how range(2, -2, -1)
has four integers in it but s[2:-2:-1]
is an empty slice.
Why is that?
Because s[2]
is the first “i” in s
, while s[-2]
is the “y”
close to the end of the string.
Using a step of -1
would have us go from the “i” to the “y”,
but going right to left...
If you start at the “i” and go left, you reach the beginning of the string,
not the “y”.
Perhaps another way to help you look at this is if you recall
that s[-2]
is the same as s[len(s)-2]
,
which in this specific case is s[14]
.
If we take the piece of code above and replace all the -2
with 14
,
it should become clearer why the slice is empty:
>>> s = 'Slicing is easy!'
>>> list(range(2, 14, -1))
[]
>>> s[2:14:-1]
''
>>> s[2]
'i'
>>> s[14]
'y'
Another possible way to get you more comfortable with these
negative steps is if you notice the relationship between
slices with a step of the form -n
and two consecutive
slices with steps -1
and n
:
>>> s = 'Slicing is easy!'
>>> s[14:3:-2]
'ya igi'
>>> s[14:3:-1]
'ysae si gni'
>>> s[14:3:-1][::2]
'ya igi'
We can take this even further, and realise that the start and stop parameters are used to shorten the sequence, and that the step parameter is only then used to skip elements:
>>> s = 'Slicing is easy!'
>>> s[14:3:-2]
'ya igi'
>>> s[4:15] # Swap `start` and `stop` and add 1...
'ing is easy'
>>> s[4:15][::-1] # ...then reverse...
'ysae si gni'
>>> s[4:15][::-1][::2] # ...then pick every other element.
'ya igi'
For the sake of completeness, let's just briefly mention what happens if you use 0 as the step parameter, given that we have taken a look at strictly positive steps and strictly negative steps:
>>> s = "Slicing is easy!"
>>> s[::0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: slice step cannot be zero
Using 0 as the step gives a ValueError
,
and that is all there is to it.
Did you know that the Python Standard Library (PSL) has around 6000 usages of sequence slicing, but less than 500 of those make use of the step parameter? That means that, in the PSL, only around 8.33% of the slicing operations make use of the step. (Rough figures for the PSL of Python 3.9.2 on my Windows machine.)
If I had to guess, I would say there are two main reasons that explain why only a “small” percentage of all the slices make use of the step parameter:
For those reasons, it is recommendable that you do not get overexcited about slices and force your data to be in such a way that slices are the best way to get to the data.
For example, do not store colour names and their hexadecimal values
in an alternating fashion just so that you can use [::2]
and [1::2]
to access them.
However, if – for some reason – you receive data in this format,
it is perfectly acceptable for you to split the data with two slices:
# Assume we got `colours` in this format from an API or some other place...
>>> colours = ["red", "#ff0000", "green", "#00ff00", "blue", "#0000ff"]
>>> names = colours[::2]
>>> names
['red', 'green', 'blue']
>>> hexs = colours[1::2]
>>> hexs
['#ff0000', '#00ff00', '#0000ff']
Slices with three parameters tend to be dense and hard to parse with your eyes,
given that they are enclosed in []
and then have ::
separating the parameters.
If you write a slice of the form s[a:b:c]
,
you can expect the readers of your code to have to pause for a bit and understand
what is going on.
For that matter, when you write a long or complex slice, first consider reworking
the code so that you don't have to write a long or complex slice.
But if you do end up writing one, you should probably comment
your slice explaining what is going on.
I had a look at how the Python Standard Library makes use of slicing with three
parameters, and I found this nice example taken from the source code of the
dataclasses
module:
# From Lib/dataclasses.py, Python 3.9.2
def _process_class(cls, init, repr, eq, order, unsafe_hash, frozen):
# [code deleted for brevity]
# Find our base classes in reverse MRO order, and exclude
# ourselves. In reversed order so that more derived classes
# override earlier field definitions in base classes. As long as
# we're iterating over them, see if any are frozen.
any_frozen_base = False
has_dataclass_bases = False
for b in cls.__mro__[-1:0:-1]:
# ...
# [code deleted for brevity]
Notice that on top of that for
loop using a slice,
there are 4 lines of comments, and 3 of them are addressing what that
slice is doing: why the step parameter is -1
, because we want to
“find our base classes in reverse order”,
and then it explains why we the start and stop parameters are -1
and 0
,
respectively, as it says “and exclude ourselves”.
If we try that slice with a simpler sequence, we can see that [-1:0:-1]
does in fact reverse a sequence while skipping the first element:
>>> s = "Slicing is easy!"
>>> s[-1:0:-1]
'!ysae si gnicil'
Having taken a look at many different ways to slice and dice sequences, it is now time to mention a very important nuance about sequence slicing: when we create a slice, we are effectively creating a copy of the original sequence. This isn't necessarily a bad thing. For example, there is one idiomatic slicing operation that makes use of this behaviour.
I brought this up because it is important that you are aware of these subtleties, so that you can make informed decisions about the way you write your code.
An example of when this copying behaviour might be undesirable is when you have a really large list and you were considering using a slice to iterate over just a portion of that list. In this case, maybe using the slice will be a waste of resources because all you want is to iterate over a specific section of the list, and then you are done; you don't actually need to have that sublist later down the road.
In this case, what you might want to use is the
islice
function from the
itertools
module, that creates an iterator that allows you to iterate
over the portion of the list that you care about.
Iterators are another awesome feature in Python, and I'll be exploring them in future Pydon'ts, so stay tuned for that!
A simple way for you to verify that slicing creates copies of the sliced sequences is as follows:
>>> l = [1, 2, 3, 4]
>>> l2 = l
>>> l.append(5) # Append 5 to l...
>>> l2 # ... notice that l2 also got the new 5,
# so l2 = l did NOT copy l.
[1, 2, 3, 4, 5]
>>> l3 = l[2:5] # Slice l into l3.
>>> l3
[3, 4, 5]
>>> l[3] = 42 # Change a value of l...
>>> l
[1, 2, 3, 42, 5] # ... the 4 was replaced by 42...
>>> l3
[3, 4, 5] # ... but l3 still contains the original 4.
Let us continue down this journey of mastering sequence slicing. So far we have been using slices to extract parts of our sequences, but slices can also be used to manipulate the contents of our sequences!
Manipulating sequences with slices can only be done for certain types of sequences, namely mutable sequences; that is, sequences that we can alter. Prime examples of mutable sequences are lists, and prime examples of immutable sequences are strings.
Say that l
is a list.
We are used to “regular” assignment,
>>> l = [1, 2, 3, 4]
and we are used to assigning to specific indices:
>>> l[2] = 30
>>> l
[1, 2, 30, 4]
So how about assigning to slices as well? That is perfectly fine!
>>> l[:2] = [10, 20] # Replace the first 2 elements of l.
>>> l
[10, 20, 30, 4]
>>> l[1::2] = [200, 400] # Swap elements in odd positions.
>>> l
[10, 200, 30, 400]
The two short examples above showed how to replace some elements with the same number of elements. However, with simpler slices you can also change the size of the original slice:
>>> l = [1, 2, 3, 4]
>>> l[:2] = [0, 0, 0, 0, 0]
>>> l
[0, 0, 0, 0, 0, 3, 4]
When you have a slicing assignment like that, you should read it as
“replace the slice on the left with the new sequence on the right”,
so the example above reads “swap the first two elements of l
with five zeroes”.
Notice that, if you use “extended slices” (slices with the step parameter), then the number of elements on the left and on the right should match:
>>> l = [1, 2, 3, 4]
>>> l[::2] # This slice has two elements in it...
[1, 3]
>>> l[::2] = [0, 0, 0, 0, 0] # ... and we try to replace those with 5 elements.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 5 to extended slice of size 2
The fact that you can assign to slices allows you to write some pretty beautiful things, if you ask me.
For example, as I was exploring the Python Standard Library,
I came across a slicing assignment gem inside the
urljoin
function.
urljoin
from the urllib.parse
module,
takes a base path and a relative path, and tries to combine the two
to create an absolute path.
Here is an example:
>>> import urllib.parse
>>> urllib.parse.urljoin("https://mathspp.com/blog/", "pydonts/zip-up")
'https://mathspp.com/blog/pydonts/zip-up'
I'm using urllib.parse.urljoin
to take the base URL for my blog and stitch
that together with a relative link that takes me to one of the Pydon'ts I have published.
Now let me show you part of the source code of that function:
# From Lib/urllib/parse.py in Python 3.9.2
def urljoin(base, url, allow_fragments=True):
"""Join a base URL and a possibly relative URL to form an absolute
interpretation of the latter."""
# [code deleted for brevity]
# for rfc3986, ignore all base path should the first character be root.
if path[:1] == '/':
segments = path.split('/')
else:
segments = base_parts + path.split('/')
# filter out elements that would cause redundant slashes on re-joining
# the resolved_path
segments[1:-1] = filter(None, segments[1:-1])
Notice the slice assignment to segments[1:-1]
?
That segments
list contains the different portions of the two
URLs I give the urljoin
function, and then the
filter
function
is used to filter out the parts of the URL that are empty.
Let me edit the source code of urljoin
to add two print statements to it:
# From Lib/urllib/parse.py in Python 3.9.2
def urljoin(base, url, allow_fragments=True):
"""Join a base URL and a possibly relative URL to form an absolute
interpretation of the latter."""
# [code deleted for brevity]
# for rfc3986, ignore all base path should the first character be root.
if path[:1] == '/':
segments = path.split('/')
else:
segments = base_parts + path.split('/')
# filter out elements that would cause redundant slashes on re-joining
# the resolved_path
print(segments)
segments[1:-1] = filter(None, segments[1:-1])
print(segments)
Now let me run the same example:
>>> import urllib.parse
>>> urllib.parse.urljoin("https://mathspp.com/blog/", "pydonts/zip-up")
['', 'blog', '', 'pydonts', 'zip-up'] # First `print(segments)`
['', 'blog', 'pydonts', 'zip-up'] # <----------- segments has one less '' in it!
'https://mathspp.com/blog/pydonts/zip-up'
We can take the result of the first print and run the filter by hand:
>>> segments = ['', 'blog', '', 'pydonts', 'zip-up']
>>> segments[1:-1]
['blog', '', 'pydonts']
>>> list(filter(None, segments[1:-1]))
['blog', 'pydonts']
>>> segments[1:-1] = filter(None, segments[1:-1])
>>> segments
['', 'blog', 'pydonts', 'zip-up']
So this was a very interesting example usage of slice assignment. It is likely that you won't be doing something like this very frequently, but knowing about it means that when you do, you will be able to write that piece of code beautifully.
If you can assign to slices, what happens if you assign the empty list []
to a slice?
>>> l = [1, 2, 3, 4]
>>> l[:2] = [] # Replace the first two elements with the empty list.
>>> l
[3, 4]
If you assign the empty list to a slice, you are effectively deleting those elements
from the list.
You can do this by assigning the empty list, but you can also use the del
keyword
for the same effect:
>>> l = [1, 2, 3, 4]
>>> del l[:2]
>>> l
[3, 4]
Now that we have learned some more cool features on sequence slicing, it is time to see how these features are better used and how they show up in their more idiomatic forms.
Idiomatic code is code that you can take a look at and read it for what it does, as a whole, without having to reason about each little piece individually... Think of it like "normal" reading: when you were learning, you had to read character by character, but now you grasp words as a whole. With enough practice, these idiomatic pieces of code become recognisable as a whole as well.
A simple slice that you may want to keep on the back of your mind is the slice that lets you
access all the elements in the even positions of a sequence.
That slice is [::2]
:
>>> l = ["even", "odd", "even", "odd", "even"]
>>> l[::2]
['even', 'even', 'even']
Similarly, l[1::2]
gives you the odd positions:
>>> l = ["even", "odd", "even", "odd", "even"]
>>> l[1::2]
['odd', 'odd']l = ["even", "odd", "even", "odd", "even"]
l[1::2]
s[::-1]
A slice with no start and stop parameters and a -1
in the step is a very common slicing pattern.
In fact, there are approximately 100 of these slices in the Python Standard Library,
which is roughly one third of all the slices that make use of the step parameter.
s[::-1]
should be read as “the sequence s
, but reversed”.
Here is a simple example:
>>> s = "Slicing is easy!"
>>> s[::-1]
'!ysae si gnicilS'
What is noteworthy here, and related to the previous remark about slices creating copies,
is that sometimes you don't want to copy the whole thing to reverse your sequence;
for example, if all you want to do is iterate over the sequence in reverse order.
When that is the case, you might want to just use the
reversed
built-in function.
This function takes a sequence and allows you to iterate over the sequence in reverse order,
without paying the extra memory cost of actually copying the whole sequence.
l[:]
or l[::]
If a slice makes a copy, that means that a slice is a very clean way to copy a sequence!
The slices [:]
and [::]
select whole sequences, so those are primes ways to copy
a sequence – for example, a list – when you really want to create copies.
Deep and shallow copies, the distinction between things that are passed by reference and things that are passed by value, etc, is a big discussion in itself.
It is easy to search the Python Standard Library for usage examples of this idiom
(and for the ones before as well), so I will just leave you with one,
from the argparse
module, that contains a helper function named _copy_items
(I deleted its comments):
# From Lib/argparse.py in Python 3.9.2
def _copy_items(items):
if items is None:
return []
if type(items) is list:
return items[:]
import copy
return copy.copy(items)
Notice how the idiom fits in so nicely with the function name:
the function says it copies the items.
What does the function do?
If the items
argument is a list, then it returns a copy of it!
So l[:]
and l[::]
should be read as “a copy of l
”.
This idiom also explains the thumbnail image in the beginning of the article.
del l[:]
Another idiom that makes use of the slice [:]
, but with something extra,
is the idiom to delete the contents of a list.
Think of l[:]
as “opening up l
”, and then del l[:]
reads
“open up l
to delete its contents”.
This is the same as doing l[:] = []
but it is not the same
as doing l = []
nor is it the same as doing del l
.
It is easy to see why del l
is different from the others:
del l
means that the name l
is no longer in use:
>>> l = [1, 2, 3, 4]
>>> del l
>>> l
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'l' is not defined
whereas the idiom just clears the list up:
>>> l = [1, 2, 3, 4]
>>> del l[:]
>>> l
[]
What might be trickier to understand is why del l[:]
and l[:] = []
are different from l = []
.
I'll show you an example that shows they are clearly different,
and then I will leave it up to you to decide whether or not you want
to burn enough neurons to understand what is going on.
First, let me use l[:] = ...
>>> l = l_shallow = [1, 2, 3]
>>> l_shallow is l
True
>>> j = []
>>> l[:] = j
>>> l
[]
>>> l_shallow
[]
>>> l is j
False
>>> l_shallow is l
True
and now let me compare it with l = ...
>>> l = l_shallow = [1, 2, 3]
>>> l_shallow is l
True
>>> j = []
>>> l = j
>>> l
[]
>>> l_shallow
[1, 2, 3]
>>> l is j
True
>>> l_shallow is l
False
You can see above that the results of comparisons like l is j
and l_shallow is l
,
as well as the contents of l_shallow
, change in the two examples.
Therefore, the two things cannot be the same.
What is going on?
Well, deep and shallow copies, and references to mutable objects, and the like, are at fault!
I'll defer a more in-depth discussion of this for a later Pydon't,
as this one has already become quite long.
Just remember, l[:] = []
and del l[:]
can be read as “delete the contents of l
”.
Here's the main takeaway of this Pydon't, for you, on a silver platter:
“Slices are really powerful and they are an essential tool to master for when you work with sequences, like strings and lists.”
This Pydon't showed you that:
1
;range
function;0
is not a valid step parameter for a slice;[start:stop:step]
can be really hard to read;del
keyword to delete slices of mutable
sequences, or you can also assign the empty sequence to those slices
for the same effect;s[::2]
and s[1::2]
are “elements in even positions of s
”
and “elements in odd positions of s
”, respectively;s[::-1]
is “s
, but reversed”;l[:]
and l[::]
are “a copy of l
”; anddel l[:]
is “delete the contents of l
” or “empty l
”,
which is not the same as doing l = []
.If you liked this Pydon't be sure to leave a reaction below and share this with your friends and fellow Pythonistas. Also, don't forget to subscribe to the newsletter so you don't miss a single Pydon't!
+35 chapters. +400 pages. Hundreds of examples. Over 30,000 readers!
My book “Pydon'ts” teaches you how to write elegant, expressive, and Pythonic code, to help you become a better developer. >>> Download it here 🐍🚀.
slice
, https://docs.python.org/3/library/functions.html#slice [last accessed 20-04-2021];range
, https://docs.python.org/3/library/functions.html#func-range [last accessed 03-05-2021];filter
, https://docs.python.org/3/library/functions.html#filter [last accessed 11-05-2021];reversed
, https://docs.python.org/3/library/functions.html#reversed [last accessed 11-05-2021];dataclasses
, https://docs.python.org/3/library/dataclasses.html [11-05-2021];itertools
, islice
, https://docs.python.org/3/library/itertools.html#itertools.islice [11-05-2021];urllib.parse
, urljoin
, https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin [11-05-2021];