Learn how to find text patterns and replace them with dynamic content using regex.
The other day I was working on my problems book and I needed to update a markdown file to number the headings. In short, I had a markdown file like this:
# Problems
## Dancing triangle
...
## Bag full of numbers
...
## Quarrel in the Shire
...
I wanted to find all H2 headings and number them, so I'd end up with this file:
# Problems
## 01 – Dancing triangle
...
## 02 – Bag full of numbers
...
## 03 – Quarrel in the Shire
...
I thought about doing this manually but I'd need to do this to 2 documents, each with thousands of lines and 64 headings, so I decided to use a bit of Python and regex to do it.
The module re, from the Python standard library, lets you do string replacements with the function re.sub, which needs 3 parameters: the pattern to look for, the replacement, and the string in which to do the replacements.
re.sub is like the string method str.replace on steroids!
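To make the comparison with str.replace concrete, here is a minimal sketch. The sample string and the pattern c.t are made up for illustration and are not part of the original problem.

```python
import re

text = "cat, cot, cut"

# str.replace swaps one literal substring for another.
literal = text.replace("cat", "dog")

# re.sub(pattern, replacement, string) replaces every match of a pattern.
# Here, "c.t" matches "c", then any single character, then "t".
pattern_based = re.sub(r"c.t", "dog", text)

print(literal)        # dog, cot, cut
print(pattern_based)  # dog, dog, dog
```

The literal version only touches the exact substring it was given, while the pattern version catches all three words.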
In my case, the pattern I was looking for was this:
^## (.*)$
This looks for a ## at the beginning of the line, followed by a space, and then the (.*)$ matches everything else until the end of the line.
We can see this in action if we use re.findall and a dummy string:
>>> import re
>>> string = '# Problems\n\n## Dancing triangle\n\n...\n\n## Bag full of numbers\n\n...\n\n## Quarrel in the Shire\n\n...'
>>> re.findall("^## (.*)$", string, flags=re.MULTILINE)
[
'Dancing triangle',
'Bag full of numbers',
'Quarrel in the Shire'
]
The flag re.MULTILINE was used so that the anchors ^ and $ match the beginning and end of each line, respectively, instead of the beginning and end of the string.
(Go to regex101 (an online regex playground), paste the regular expression ^## (.*)$ in the top, middle bar, and copy and paste the first version of the # Problems markdown in the big, central text area.)
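If you'd rather stay in Python, the short sketch below shows what the flag re.MULTILINE changes. It runs the same pattern on a shortened version of the dummy string, once without the flag and once with it.

```python
import re

text = "# Problems\n\n## Dancing triangle\n\n## Bag full of numbers"

# Without the flag, ^ only matches at the very start of the string,
# where the text is "# Problems", so the pattern finds nothing.
without_flag = re.findall(r"^## (.*)$", text)

# With the flag, ^ and $ also match at the start and end of each line.
with_flag = re.findall(r"^## (.*)$", text, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['Dancing triangle', 'Bag full of numbers']
```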
The next thing I wanted was to be able to replace each title with a number followed by – and then itself!
By using group references, adding the – before the title is easy:
>>> print(
... re.sub("^## (.*)$", r"## xx – \1", string, flags=re.MULTILINE)
... )
# Problems
## xx – Dancing triangle
...
## xx – Bag full of numbers
...
## xx – Quarrel in the Shire
...
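In case group references are new to you: \1 in the replacement stands for whatever the first pair of parentheses matched, \2 for the second, and so on. Here is a tiny sketch with two groups. The name string is an invented example, unrelated to the markdown file.

```python
import re

# Group 1 captures the word before the comma, group 2 the word after it.
# The replacement puts them back in the opposite order.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Baggins, Bilbo")

print(swapped)  # Bilbo Baggins
```

The replacement is written as a raw string (the r prefix) so that Python leaves the backslashes alone and re.sub gets to interpret them.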
The "difficult" part is adding the number that must be incremented each time we add it.
Thankfully, the function re.sub has a trick up its sleeve!
The second parameter of the function re.sub, which is the replacement, can be a function.
This function must have a single parameter, which is the match object, and must return a string, which is the replacement for the given match.
Thus, by using a counter variable, I can add these increasing IDs:
>>> counter = 0
>>> def replacer(match):
... global counter
... counter += 1
... return f"## {counter:02} – {match.group(1)}"
...
>>> print(
... re.sub("^## (.*)$", replacer, string, flags=re.MULTILINE)
... )
# Problems
## 01 – Dancing triangle
...
## 02 – Bag full of numbers
...
## 03 – Quarrel in the Shire
...
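The global counter works fine in a quick REPL session, but it has to be reset by hand before each new document. One possible variation, which is not what I ran at the time, wraps everything in a function and uses itertools.count so that each call starts numbering from 1 again. The names number_headings and sample are hypothetical, picked just for this sketch.

```python
import re
from itertools import count


def number_headings(markdown):
    """Prefix each H2 heading with a two-digit, increasing number."""
    counter = count(1)  # a fresh counter for every call

    def replacer(match):
        # next(counter) yields 1, 2, 3, ... one value per heading matched.
        return f"## {next(counter):02} – {match.group(1)}"

    return re.sub(r"^## (.*)$", replacer, markdown, flags=re.MULTILINE)


sample = "# Problems\n## Dancing triangle\n...\n## Bag full of numbers\n..."
print(number_headings(sample))
```

Because the counter lives inside the function, calling number_headings on a second document numbers its headings from 01 again without any extra bookkeeping.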
And that was it! In 5 minutes, I was done! I numbered 128 chapters with a regular expression and a couple of lines of Python code and saved myself roughly 60 minutes of tedious manual work!
If you've ever used regular expressions to help you with anything, let me know in the comments below!
re.sub, http://docs.python.org/3/library/re.html#re.sub [last accessed 09-01-2024].