Learn how strings are ordered and compared in Python.
How are strings ordered in Python?
How do the operators <, <=, >=, and > work for strings?
You can find a couple of examples below.
Can you tell what the outputs will be?
>>> "bar" > "acorn"
# ???
>>> "dice" < "dolphin"
# ???
>>> "car" > "carnivore"
# ???
>>> "Rice" <= "corn"
# ???
>>> "10" < "2.5"
# ???
>>> ".py" < "_py"
# ???
Have you ever seen a dictionary? The book kind?
That book with thousands of words and their meanings.
How are words ordered in there?
In alphabetical order, right?
First, we have all the words starting with A.
Then, all the words starting with B.
And so on...
That's why I like looking at str1 < str2 and reading: "Does str1 come before str2 in the dictionary?"
And I look at str1 > str2 and I read: "Does str1 come after str2 in the dictionary?"
With this in mind, what are the results of the comparisons below?
>>> "bar" > "acorn"
# ???
>>> "dice" < "dolphin"
# ???
>>> "car" > "carnivore"
# ???
So, "bar" > "acorn" is True, because B comes after A.
What about "dice" < "dolphin"?
Think about it.
They start with the same letter, D.
But then, "dice" has an I and "dolphin" has an O.
I comes before O, so "dice" comes before "dolphin".
That means "dice" < "dolphin" is True.
What about "car" > "carnivore"?
The first three letters are the same.
But then, one word ends and the other continues...
What comes first in the dictionary?
The short one!
So "car" > "carnivore" is actually False.
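The dictionary rule can be written down as a short function. This is only a minimal sketch to make the rule concrete: `comes_before` is a hypothetical helper, not something Python provides, and it leans on Python's own comparison for single characters.

```python
def comes_before(str1, str2):
    """Hypothetical helper: mimic str1 < str2 one character at a time."""
    for char1, char2 in zip(str1, str2):
        if char1 != char2:
            # The first position where the words differ decides the result.
            return char1 < char2
    # No difference found: one word is a prefix of the other (or they are equal).
    # The shorter word comes first in the dictionary.
    return len(str1) < len(str2)


print(comes_before("acorn", "bar"))      # True
print(comes_before("dice", "dolphin"))   # True
print(comes_before("car", "carnivore"))  # True
print(comes_before("carnivore", "car"))  # False
```

For these words, the helper gives the same answers as the built-in < operator.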
Now we are ready to tackle the next set of examples:
>>> "Rice" <= "corn"
# ???
>>> "10" < "2.5"
# ???
>>> ".py" < "_py"
# ???
What is the result of "Rice" <= "corn"?
Well, now we can't just think about dictionaries.
Why is that?
Because "Rice" and "corn" are capitalised differently.
So, we need to know what comes first: the upper case R or the lower case c?
The thing that comes first is actually the upper case R!
Why?
Python can compare any two strings.
Even strings that don't have letters, for example.
And all those strings need to be comparable to each other.
So, the solution that people came up with is to attach an integer to each character.
Think of it like an id.
Then, when comparing characters, we compare the associated ids instead.
In reality, the id of each character is its Unicode code point...
So, in other words, Python didn't come up with random ids for all the characters.
It actually borrows those ids from the Unicode standard.
How can you check the code point of a character?
With the ord built-in:
>>> ord("R")
82
>>> ord("c")
99
>>> ord("R") < ord("c")
True
>>> "Rice" < "corn"
True
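To see the ids at work on whole words, here is a small sketch. `code_points` is a hypothetical helper written for illustration: it turns a string into its list of code points, and comparing those lists gives the same answer as comparing the strings. The last line uses `str.casefold` as a sort key, which is one possible way to ignore capitalisation when you want plain alphabetical order.

```python
def code_points(text):
    """Hypothetical helper: the list of Unicode code points of a string."""
    return [ord(char) for char in text]


print(code_points("Rice"))  # [82, 105, 99, 101]
print(code_points("corn"))  # [99, 111, 114, 110]

# Comparing the lists of ids agrees with comparing the strings.
print(code_points("Rice") < code_points("corn"))  # True

words = ["corn", "Rice", "acorn", "Bar"]
# Upper case letters have smaller code points, so they sort first.
print(sorted(words))  # ['Bar', 'Rice', 'acorn', 'corn']
# Folding the case first gives the order you would find in a dictionary.
print(sorted(words, key=str.casefold))  # ['acorn', 'Bar', 'corn', 'Rice']
```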
With this in mind, you should be able to answer the next examples.
Now, let us tackle the comparison "10" < "2.5".
In the Unicode standard, the digits 0 to 9 have consecutive code points.
So, when comparing "10" to "2.5", we start with comparing the "1" and the "2".
The 1 comes before the 2 in the Unicode standard, so "10" < "2.5" evaluates to True.
Attention: this shows that comparing strings that contain numbers is different from comparing the numbers themselves.
This can be misleading, because sometimes the results agree:
>>> "34" < "47" # ... but "34" < "4" is also True
True
>>> "-56" > "-105" # ... but "-56" > "-58" is False
True
>>> "2.5" < "23.4" # ... but "2.5" < "10" is False
True
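If what you actually want is the numeric order, one option is to convert the strings to numbers before comparing. The sketch below assumes every string is a valid number that `float` can parse; the list `values` is made up for illustration.

```python
values = ["10", "2.5", "34", "4"]  # made-up sample data

# Comparing the strings looks at the characters, one by one.
print("10" < "2.5")  # True
# Comparing the numbers looks at their values.
print(float("10") < float("2.5"))  # False

# Sorting shows the same difference.
print(sorted(values))             # ['10', '2.5', '34', '4']
print(sorted(values, key=float))  # ['2.5', '4', '10', '34']
```

Using `key=float` keeps the original strings in the result and only changes how they are compared.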
TL;DR: ord returns the code point of a char.
But wait!
There is one example left!
Can you tell me what the result is?
And can you justify it with the help of the built-in ord?
Give it a shot!
This article was generated automatically from this thread I published on Twitter @mathsppblog.