{"version":"https:\/\/jsonfeed.org\/version\/1","title":"mathspp.com feed","home_page_url":"https:\/\/mathspp.com\/blog\/tags\/regex","feed_url":"https:\/\/mathspp.com\/blog\/tags\/regex.json","description":"Stay up-to-date with the articles on mathematics and programming that get published to mathspp.com.","author":{"name":"Rodrigo Gir\u00e3o Serr\u00e3o"},"items":[{"title":"Remove extra spaces","date_published":"2026-03-02T13:39:00+01:00","id":"https:\/\/mathspp.com\/blog\/remove-extra-spaces","url":"https:\/\/mathspp.com\/blog\/remove-extra-spaces","content_html":"<p>Learn how to remove extra spaces from a string using regex, string splitting, a fixed point, and <code>itertools.groupby<\/code>.<\/p>\n\n<p>In this article you'll learn about four different ways in which you can remove extra spaces from the middle of a string.\nThat is, you'll learn how to go from a string like<\/p>\n<pre><code class=\"language-py\">string = \"This is  a   perfectly    normal     sentence.\"<\/code><\/pre>\n<p>to a string like<\/p>\n<pre><code class=\"language-py\">string = \"This is a perfectly normal sentence.\"<\/code><\/pre>\n<h2 id=\"the-best-solution-to-remove-extra-spaces-from-a-string\">The best solution to remove extra spaces from a string<a href=\"#the-best-solution-to-remove-extra-spaces-from-a-string\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The best solution for this task, which is both readable and performant, uses the regex module <code>re<\/code>:<\/p>\n<pre><code class=\"language-py\">import re\n\ndef remove_extra_spaces(string):\n    return re.sub(\" {2,}\", \" \", string)<\/code><\/pre>\n<p>The function <code>sub<\/code> can be used to <strong>sub<\/strong>stitute a replacement you specify for every match of a pattern.\nThe pattern <code>\" {2,}\"<\/code> finds runs of 2 or more consecutive spaces and replaces them with a single space.<\/p>\n<h2 id=\"string-splitting\">String splitting<a href=\"#string-splitting\" 
class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>Using the string method <code>split<\/code> can also be a good approach:<\/p>\n<pre><code class=\"language-py\">def remove_extra_spaces(string):\n    return \" \".join(string.split(\" \"))<\/code><\/pre>\n<p>If you're using string splitting, you'll want to provide the space <code>\" \"<\/code> as an argument.\nIf you call <code>split<\/code> with no arguments, you'll be splitting on <em>all<\/em> whitespace, which is not what you want if you have newlines and other whitespace characters you should preserve.<\/p>\n<p>This solution is great, except it doesn't work:<\/p>\n<pre><code class=\"language-py\">print(remove_extra_spaces(string))\n# 'This is  a   perfectly    normal     sentence.'<\/code><\/pre>\n<p>The problem is that splitting on the space will produce a list with empty strings:<\/p>\n<pre><code class=\"language-py\">print(string.split(\" \"))\n# ['This', 'is', '', 'a', '', '', 'perfectly', '', '', '', 'normal', '', '', '', '', 'sentence.']<\/code><\/pre>\n<p>These empty strings will be joined back together and you'll end up with the same string you started with.\nFor this to work, you'll have to filter out the empty strings first:<\/p>\n<pre><code class=\"language-py\">def remove_extra_spaces(string):\n    return \" \".join(filter(None, string.split(\" \")))<\/code><\/pre>\n<p>Using <code>filter(None, ...)<\/code> filters out the <a href=\"\/blog\/pydonts\/truthy-falsy-and-bool\">Falsy<\/a> strings, so that the final joining operation only joins the strings that matter.<\/p>\n<p>This solution has a catch, though: it completely removes any leading or trailing spaces, which may or may not be what you want.<\/p>\n<p>The two solutions presented so far &mdash; using regular expressions and string splitting &mdash; are pretty reasonable.\nBut they're also boring.\nYou'll now learn about two other solutions.<\/p>\n<h2 
id=\"replacing-spaces-until-you-hit-a-fixed-point\">Replacing spaces until you hit a fixed point<a href=\"#replacing-spaces-until-you-hit-a-fixed-point\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>You can think about the task of removing extra spaces as the task of replacing extra spaces with the empty string.\nAnd if you think about doing string replacements, you should think about the string method <code>replace<\/code>.<\/p>\n<p>You can't just do <code>string.replace(\" \", \"\")<\/code>, since that would remove <em>all<\/em> spaces. Instead, you have to be a bit more careful:<\/p>\n<pre><code class=\"language-py\">def remove_extra_spaces(string):\n    while True:\n        new_string = string.replace(\"  \", \" \")\n        if new_string == string:\n            break\n        string = new_string\n    return string<\/code><\/pre>\n<p>You replace two consecutive spaces with a single space and repeat this operation until your string stops changing.<\/p>\n<p>The idea of running a function until its output doesn't change is common enough in maths that they call...<\/p>","summary":"Learn how to remove extra spaces from a string using regex, string splitting, a fixed point, and itertools.groupby.","date_modified":"2026-03-02T16:25:02+01:00","tags":["programming","python","algorithms","regex"],"image":"\/user\/pages\/02.blog\/remove-extra-spaces\/thumbnail.webp"},{"title":"TIL #131 \u2013 Change casing in search &amp; replace","date_published":"2025-09-03T00:51:00+02:00","id":"https:\/\/mathspp.com\/blog\/til\/change-casing-in-search-and-replace","url":"https:\/\/mathspp.com\/blog\/til\/change-casing-in-search-and-replace","content_html":"<p>Today I learned you can change the casing of matched groups when doing a search &amp; replace in VS Code with regex.<\/p>\n\n<p>VS Code has a search &amp; replace feature that lets you use regex to look for patterns and then reference groups in the replacement...\nBut it lets 
you do something else that's really cool.<\/p>\n<h2 id=\"changing-casing-with-special-sequences\">Changing casing with special sequences<a href=\"#changing-casing-with-special-sequences\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>When you are replacing groups, you can use special sequences to change the casing of the group you're inserting, according to the following table:<\/p>\n<table>\n<thead>\n<tr>\n<th>Sequence<\/th>\n<th>Effect<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><code>\\u<\/code><\/td>\n<td>Uppercase the first letter<\/td>\n<\/tr>\n<tr>\n<td><code>\\U<\/code><\/td>\n<td>Uppercase the whole group<\/td>\n<\/tr>\n<tr>\n<td><code>\\l<\/code><\/td>\n<td>Lowercase the first letter<\/td>\n<\/tr>\n<tr>\n<td><code>\\L<\/code><\/td>\n<td>Lowercase the whole group<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The picture below shows an example of a search &amp; replace operation where I looked for the text \u201call in one go\u201d.\nI enclosed that in a regex group, and I'm replacing it with the pattern <code>\\U$1<\/code>, which means the replacement is the all-uppercase string \u201cALL IN ONE GO\u201d:<\/p>\n<figure class=\"image-caption\"><img title=\"Search and replace with \\U.\" alt=\"Screenshot of VS Code previewing a search and replace operation where the special sequence \\U was used to transform the text being replaced.\" src=\"\/user\/pages\/02.blog\/04.til\/131.change-casing-in-search-and-replace\/_uppercase.webp\"><figcaption class=\"\">Search and replace with \\U.<\/figcaption><\/figure>\n<p>Pretty nifty, right?<\/p>\n<h2 id=\"the-same-thing-in-python\">The same thing in Python<a href=\"#the-same-thing-in-python\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The module <code>re<\/code> in Python supports searching and replacing with the function <code>re.sub<\/code>, but it doesn't let you do the same case-changing operations with special sequences.\nInstead, 
you have to use the fact that <a href=\"https:\/\/mathspp.com\/blog\/dynamic-string-replacements-with-regex\"><code>re.sub<\/code> supports dynamic string replacements<\/a> and then implement the logic yourself.<\/p>\n<p>First, you can't use the string methods <code>upper<\/code> and <code>lower<\/code> directly for <code>\\U<\/code> and <code>\\L<\/code>; you have to grab the matched text from the <code>Match<\/code> object first.\nYou also have to pick the string apart to implement <code>\\u<\/code> and <code>\\l<\/code>:<\/p>\n<pre><code class=\"language-py\">def all_upper(match):  # \\U\n    return match.group(0).upper()\n\ndef first_upper(match):  # \\u\n    s = match.group(0)\n    return s[:1].upper() + s[1:]\n\ndef all_lower(match):  # \\L\n    return match.group(0).lower()\n\ndef first_lower(match):  # \\l\n    s = match.group(0)\n    return s[:1].lower() + s[1:]<\/code><\/pre>\n<p>Here's an example:<\/p>\n<pre><code class=\"language-py\">import re\n\n# E.g., same behaviour as \\U$0 in VS Code:\nre.sub(\n    \"all in one go\",  # pattern to search for\n    all_upper,  # dynamic replacement\n    \"... all in one go ...\",  # source text\n)  # -&gt; '... 
ALL IN ONE GO ...'<\/code><\/pre>\n<table>\n<thead>\n<tr>\n<th>VS Code sequence<\/th>\n<th>Python function<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><code>\\u<\/code><\/td>\n<td><code>first_upper<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>\\U<\/code><\/td>\n<td><code>all_upper<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>\\l<\/code><\/td>\n<td><code>first_lower<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>\\L<\/code><\/td>\n<td><code>all_lower<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>","summary":"Today I learned you can change the casing of matched groups when doing a search &amp; replace in VS Code with regex.","date_modified":"2025-10-20T22:34:56+02:00","tags":["regex","productivity","vscode","python"],"image":"\/user\/pages\/02.blog\/04.til\/131.change-casing-in-search-and-replace\/thumbnail.webp"},{"title":"TIL #128 \u2013 Matching prefixes and suffixes with regex","date_published":"2025-07-23T13:17:00+02:00","id":"https:\/\/mathspp.com\/blog\/til\/matching-prefixes-and-suffixes-with-regex","url":"https:\/\/mathspp.com\/blog\/til\/matching-prefixes-and-suffixes-with-regex","content_html":"<p>Today I learned how to use <code>\\b<\/code> and <code>\\B<\/code> to match prefixes and suffixes with regex.<\/p>\n\n<h2 id=\"matching-prefixes-and-suffixes-with-regular-expressions\">Matching prefixes and suffixes with regular expressions<a href=\"#matching-prefixes-and-suffixes-with-regular-expressions\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The special characters <code>\\b<\/code> and <code>\\B<\/code> go hand in hand in regular expressions:<\/p>\n<ul>\n<li><code>\\b<\/code> matches at word boundaries; and<\/li>\n<li><code>\\B<\/code> matches wherever <code>\\b<\/code> doesn't, for example, inside words.<\/li>\n<\/ul>\n<p>For these two characters, the default \u201cword characters\u201d are alphanumeric characters and the underscore.<\/p>\n<p>By combining <code>\\b<\/code> and <code>\\B<\/code> at the beginning or end of a pattern, you get to match standalone words, prefixes, 
suffixes, and infixes!<\/p>\n<p>Each row of the table below shows an example sentence that contains the substring <code>\"legal\"<\/code>.\nThe columns show whether different patterns that use the special characters <code>\\b<\/code> and <code>\\B<\/code> would match against those sentences.<\/p>\n<table>\n<thead>\n<tr>\n<th><\/th>\n<th><code>r\"legal\"<\/code><\/th>\n<th><code>r\"\\blegal\\b\"<\/code><\/th>\n<th><code>r\"\\blegal\\B\"<\/code><\/th>\n<th><code>r\"\\Blegal\\b\"<\/code><\/th>\n<th><code>r\"\\Blegal\\B\"<\/code><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><code>\"Criticism is legal.\"<\/code><\/td>\n<td>\u2705<\/td>\n<td>\u2705<\/td>\n<td>\u274c<\/td>\n<td>\u274c<\/td>\n<td>\u274c<\/td>\n<\/tr>\n<tr>\n<td><code>\"He's legally blind.\"<\/code><\/td>\n<td>\u2705<\/td>\n<td>\u274c<\/td>\n<td>\u2705<\/td>\n<td>\u274c<\/td>\n<td>\u274c<\/td>\n<\/tr>\n<tr>\n<td><code>\"Theft is illegal.\"<\/code><\/td>\n<td>\u2705<\/td>\n<td>\u274c<\/td>\n<td>\u274c<\/td>\n<td>\u2705<\/td>\n<td>\u274c<\/td>\n<\/tr>\n<tr>\n<td><code>\"He obtained that illegally.\"<\/code><\/td>\n<td>\u2705<\/td>\n<td>\u274c<\/td>\n<td>\u274c<\/td>\n<td>\u274c<\/td>\n<td>\u2705<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This was tip 97 that I sent to <a href=\"\/drops\">the Python drops \ud83d\udc0d\ud83d\udca7<\/a> newsletter, so if you'd like to get a daily drop of Python knowledge, make sure to <a href=\"\/drops\">sign up now<\/a>!<\/p>","summary":"Today I learned how to use \\b and \\B to match prefixes and suffixes with regex.","date_modified":"2025-10-20T22:34:56+02:00","tags":["regex","programming","python"],"image":"\/user\/pages\/02.blog\/04.til\/128.matching-prefixes-and-suffixes-with-regex\/thumbnail.webp"},{"title":"Regex in real life","date_published":"2025-06-08T17:34:00+02:00","id":"https:\/\/mathspp.com\/blog\/regex-in-real-life","url":"https:\/\/mathspp.com\/blog\/regex-in-real-life","content_html":"<p>This article shows how I have used regex in real life for all 
sorts of tasks.<\/p>\n\n<p>I love keeping track of real-life examples of how certain features get used, and regex is one of them, so in this article you will find examples of how\/where I used regular expressions to do all sorts of tasks.<\/p>\n<p>As I use regex in new and interesting ways, I will update this article to include those examples.<\/p>\n<h2 id=\"mass-reformatting-hints-to-some-exercises\">Mass-reformatting hints to some exercises<a href=\"#mass-reformatting-hints-to-some-exercises\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>I was working on some exercises for <a href=\"\/courses\/intermediate-python-course\">my intermediate Python course<\/a>, and they all had this format:<\/p>\n<pre><code class=\"language-markdown\">#### Exercise&nbsp;?\nProblem statement goes here.\n\n*Hint: exercise hint goes here.*<\/code><\/pre>\n<p>I wanted to reformat this so that the hints weren't shown by default, which meant turning it into this:<\/p>\n<pre><code class=\"language-markdown\">#### Exercise&nbsp;?\nProblem statement goes here.\n\n&lt;details&gt;\n&lt;summary&gt;Hint:&lt;\/summary&gt;\n\nExercise hint goes here.\n\n&lt;\/details&gt;<\/code><\/pre>\n<p>I could easily search for <code>\\*Hint: (.*)\\*<\/code>, giving me all the lines with hints, and just as easily I could replace that with the <code>&lt;details&gt;<\/code>\/<code>&lt;summary&gt;<\/code> tags...\nBut then I would still have hundreds of lines to fix, because the first letter of each hint was in lowercase in the original text and I wanted it in uppercase, since the hint is now a standalone sentence...<\/p>\n<p>Thankfully, <a href=\"\/blog\/til\/change-casing-in-search-and-replace\">VS Code allows you to do search &amp; replace operations that modify the casing of the matches<\/a>, so that's what I ended up doing.\nI didn't even need to bring out Python to do this. 
&#128517;<\/p>\n<h2 id=\"compiling-a-numbered-list-of-tip-titles\">Compiling a numbered list of tip titles<a href=\"#compiling-a-numbered-list-of-tip-titles\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>For <a href=\"\/drops\">my newsletter Python drops &#128013;&#128167;<\/a>, I keep a list of all previous tip titles so that people can see what they're missing out on.\nTo compile this list, I run a short script that traverses the directory where the tips are and extracts each tip's number and title.<\/p>\n<p>This is the rough directory structure:<\/p>\n<pre><code>|- 0001-zip_strict_true\n   |- tip.md\n|- 0002-string_casefold_case-insensitive_comparison\n   |- tip.md\n|- 0003_type_unions_with_pipe\n   |- tip.md\n...<\/code><\/pre>\n<p>Each directory contains some other files, but I only care about the files <code>**\/tip.md<\/code> for the purposes of this specific task.\nI want to check each file <code>tip.md<\/code> and extract the tip number and title.<\/p>\n<p>As an example, this is what the file <code>0003_type_unions_with_pipe\/tip.md<\/code> looks like:<\/p>\n<pre><code class=\"language-txt\">---\nthemes:\n    - \"typing\/type hints\"\n    - \"`isinstance`\"\n    - \"vertical bar `|`\"\n    - \"type unions\"\n---\n\n## 3 &ndash; Type unions with the vertical bar in `isinstance`\n\n...<\/code><\/pre>\n<p>Assuming all files <code>tip.md<\/code> have a title in the same format, I can use regex to extract them:<\/p>\n<pre><code class=\"language-py\">import re\n\nTITLE_PATTERN = re.compile(r\"(?m)^## (?P&lt;nr&gt;\\d{1,4}) &ndash; (?P&lt;title&gt;.*)$\")<\/code><\/pre>\n<p>This neat little pattern uses a couple of nice regex features.<\/p>\n<p>By using the inline flag <code>(?m)<\/code>, I make sure that the special characters <code>^<\/code> and <code>$<\/code> match the beginning and end of every line, respectively, instead of only matching the beginning and end of the string.<\/p>\n<p>I also use named groups with the syntax 
<code>(?P&lt;group_name&gt;...)<\/code>, which allows me to refer to the parts of the match that I care about by name, instead of having to figure...<\/p>","summary":"This article shows how I have used regex in real life for all sorts of tasks.","date_modified":"2025-09-03T02:19:00+02:00","tags":["regex","python","programming","slice of life"],"image":"\/user\/pages\/02.blog\/regex-in-real-life\/thumbnail.webp"},{"title":"TIL #112 \u2013 re.Match.groupdict","date_published":"2025-01-22T13:56:00+01:00","id":"https:\/\/mathspp.com\/blog\/til\/re-match-groupdict","url":"https:\/\/mathspp.com\/blog\/til\/re-match-groupdict","content_html":"<p>Today I learned how I can use the method 'groupdict' from a regex match to get a dictionary with all named groups.<\/p>\n\n<h2 id=\"re-match-groupdict\"><code>re.Match.groupdict<\/code><a href=\"#re-match-groupdict\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>Python regular expressions support named groups, which are introduced with the syntax <code>(?P&lt;name&gt;...)<\/code>:<\/p>\n<pre><code class=\"language-py\">import re\n\ndate = re.compile(\n    r\"\"\"(?x)  # Use ?x for verbose regex\n    (?P&lt;year&gt;\\d{4})  # year = 4 digits, e.g., 2025\n    -\n    (?P&lt;month&gt;\\d{1,2})  # month = 1 or 2 digits, e.g., 01 or 1\n    -\n    (?P&lt;day&gt;\\d{1,2})  # day = 1 or 2 digits\n    \"\"\"\n)\n\nmatch = date.match(\"2025-01-22\")\nprint(match.group(\"year\"))  # 2025<\/code><\/pre>\n<p>If you have named groups, you can then use the method <code>groupdict<\/code> to get a dictionary with all named groups and their matches:<\/p>\n<pre><code class=\"language-py\">match = date.match(\"2025-01-22\")\nprint(match.groupdict())  # {'year': '2025', 'month': '01', 'day': '22'}<\/code><\/pre>\n<p>This is the counterpart to the method <code>groups<\/code>, which produces a tuple with all groups in the order they appear:<\/p>\n<pre><code class=\"language-py\">match = date.match(\"2025-01-22\")\nprint(match.groups())  # 
('2025', '01', '22')<\/code><\/pre>\n<p>While <code>groups<\/code> shows the values of all groups, regardless of whether they're named or not, <code>groupdict<\/code> will only show named groups:<\/p>\n<pre><code class=\"language-py\">date2 = re.compile(r\"(\\d{4})-(\\d{1,2})-(\\d{1,2})\")\nmatch = date2.match(\"2025-01-22\")\nprint(match.groups())  # ('2025', '01', '22')\nprint(match.groupdict())  # {}<\/code><\/pre>","summary":"Today I learned how I can use the method &#039;groupdict&#039; from a regex match to get a dictionary with all named groups.","date_modified":"2025-10-20T22:34:56+02:00","tags":["programming","python","regex"],"image":"\/user\/pages\/02.blog\/04.til\/112.re-match-groupdict\/thumbnail.webp"},{"title":"Problem #066 \u2013 regex crossword","date_published":"2024-08-27T12:00:00+02:00","id":"https:\/\/mathspp.com\/blog\/problems\/regex-crossword","url":"https:\/\/mathspp.com\/blog\/problems\/regex-crossword","content_html":"<p>Can you solve this crossword where all hints are regular expressions?<\/p>\n\n<h2 id=\"problem-statement\">Problem statement<a href=\"#problem-statement\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<figure class=\"image-caption\"><img title=\"The regex crossword puzzle grid.\" alt=\"Puzzle in the shape of a hexagonal tiled region where cells are supposed to be filled in with characters from the alphabet according to hints given as regular expressions.\" src=\"\/user\/pages\/02.blog\/03.problems\/p066-regex-crossword\/_puzzle.webp\"><figcaption class=\"\">The regex crossword puzzle grid.<\/figcaption><\/figure>\n<p>At <a href=\"https:\/\/ep2024.europython.eu\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">EuroPython 2024<\/a>, I watched a lightning talk about the art of puzzle solving.\nIn it, the speaker showed the regex crossword puzzle that you can see above, <a 
href=\"https:\/\/puzzles.mit.edu\/2013\/coinheist.com\/rubik\/a_regular_crossword\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">that I took from this URL<\/a>.\nYou are supposed to fill the hexagonal grid with letters so that all of the regular expressions shown match.<\/p>\n<p>If you don't know regular expressions, the puzzle only uses a simple subset of the syntax:<\/p>\n<ul>\n<li>literal character matches;<\/li>\n<li>wildcard matches with <code>.<\/code>;<\/li>\n<li>alternatives with <code>|<\/code>;<\/li>\n<li>the quantifiers <code>?<\/code>, <code>+<\/code>, and <code>*<\/code>;<\/li>\n<li>character sets with <code>[...]<\/code> and negated character sets with <code>[^...]<\/code>; and<\/li>\n<li>groups with <code>()<\/code> and group references with <code>\\1<\/code>, <code>\\2<\/code>, etc.<\/li>\n<\/ul>\n<p>You can look this up and then you will be able to solve the challenge.\nYou can also use <a href=\"https:\/\/regex101.com\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the site regex101<\/a> to help you check what each regular expression means.<\/p>\n<div class=\"notices blue\">\n<p>Give it some thought!<\/p>\n<\/div>\n<h2 id=\"solution\">Solution<a href=\"#solution\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In the spirit of the puzzle hunting community, I will not share my solved grid here.\nIf you need help, feel free to <a href=\"mailto:rodrigo@mathspp.com\" class=\"mailto\">email me<\/a> and we can talk it over.<\/p>\n<p>P.S. 
If I understood correctly, in the puzzle hunting community you're not supposed to just fill in the grid.\nIn some way, somehow, you are supposed to be able to extract an English word or phrase from the filled puzzle without any extra information, and then you check that you got it right by <a href=\"https:\/\/puzzles.mit.edu\/2013\/coinheist.com\/rubik\/a_regular_crossword\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">\u201cchecking your answer spoiler-free\u201d on the website where I took the regex crossword from<\/a>.\nI am still stuck on that step!<\/p>","summary":"Can you solve this crossword where all hints are regular expressions?","date_modified":"2025-07-23T16:49:02+02:00","tags":["mathematics","regex"],"image":"\/user\/pages\/02.blog\/03.problems\/p066-regex-crossword\/thumbnail.webp"},{"title":"Dynamic string replacements with regex","date_published":"2024-01-09T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/dynamic-string-replacements-with-regex","url":"https:\/\/mathspp.com\/blog\/dynamic-string-replacements-with-regex","content_html":"<p>Learn how to find text patterns and replace them with dynamic content using regex.<\/p>\n\n<p>The other day I was working on <a href=\"https:\/\/mathspp.gumroad.com\/l\/puzzles-riddles-problems\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">my problems book<\/a> and I needed to update a markdown file to number the headings.\nIn short, I had a markdown file like this:<\/p>\n<pre><code class=\"language-md\"># Problems\n\n## Dancing triangle\n\n...\n\n## Bag full of numbers\n\n...\n\n## Quarrel in the Shire\n\n...<\/code><\/pre>\n<p>I wanted to find all H2 headings and number them, so I'd end up with this file:<\/p>\n<pre><code class=\"language-md\"># Problems\n\n## 01 &ndash; Dancing triangle\n\n...\n\n## 02 &ndash; Bag full of numbers\n\n...\n\n## 03 &ndash; Quarrel in the Shire\n\n...<\/code><\/pre>\n<p>I thought about 
doing this manually, but I'd need to do it for 2 documents, each with thousands of lines and 64 headings, so I decided to use a bit of Python and regex instead.<\/p>\n<p>The module <code>re<\/code>, from the Python standard library, lets you do string replacements with <a href=\"https:\/\/docs.python.org\/3\/library\/re.html#re.sub\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the function <code>re.sub<\/code><\/a>, which needs 3 parameters:<\/p>\n<ol><li>the pattern we're looking for;<\/li>\n<li>the replacement for the pattern; and<\/li>\n<li>the string we're searching &amp; replacing in.<\/li>\n<\/ol><p><code>re.sub<\/code> is like the string method <code>str.replace<\/code> on steroids!<\/p>\n<p>In my case, the pattern I was looking for was this:<\/p>\n<pre><code>^## (.*)$<\/code><\/pre>\n<p>This looks for a <code>##<\/code> at the beginning of the line, followed by a space, and then <code>(.*)$<\/code> matches everything else until the end of the line, capturing it in a group.<\/p>\n<p>We can see this in action if we use <code>re.findall<\/code> and a dummy string:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; import re\n&gt;&gt;&gt; string = '# Problems\\n\\n## Dancing triangle\\n\\n...\\n\\n## Bag full of numbers\\n\\n...\\n\\n## Quarrel in the Shire\\n\\n...'\n&gt;&gt;&gt; re.findall(\"^## (.*)$\", string, flags=re.MULTILINE)\n[\n    'Dancing triangle',\n    'Bag full of numbers',\n    'Quarrel in the Shire'\n]<\/code><\/pre>\n<p>The flag <code>re.MULTILINE<\/code> was used so that the anchors <code>^<\/code> and <code>$<\/code> match the beginning and end of each line, respectively, instead of the beginning and end of the string.<\/p>\n<p>(Go to <a href=\"https:\/\/regex101.com\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">regex101<\/a> (an online regex playground), paste the regular expression <code>^## (.*)$<\/code> in the top, middle bar, and copy and paste the first version of 
the <code># Problems<\/code> markdown in the big, central text area.)<\/p>\n<p>The next thing I wanted was to be able to replace each title with a number followed by <code>&ndash;<\/code> and then itself!\nBy using group references, adding the <code>&ndash;<\/code> before the title is easy:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; print(\n...     re.sub(\"^## (.*)$\", r\"## xx &ndash; \\1\", string, flags=re.MULTILINE)\n... )\n# Problems\n\n## xx &ndash; Dancing triangle\n\n...\n\n## xx &ndash; Bag full of numbers\n\n...\n\n## xx &ndash; Quarrel in the Shire\n\n...<\/code><\/pre>\n<p>The &ldquo;difficult&rdquo; part is adding the number that must be incremented each time we add it.\nThankfully, the function <code>re.sub<\/code> has a trick up its sleeve!<\/p>\n<p>The second parameter of the function <code>re.sub<\/code>, which is the replacement, can be a <em>function<\/em>.\nThis function must have a single parameter, which is the match object, and must return a string, which is the replacement for the given match.\nThus, by using a counter variable, I can add these increasing IDs:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; counter...<\/code><\/pre>","summary":"Learn how to find text patterns and replace them with dynamic content using regex.","date_modified":"2025-11-22T09:33:35+01:00","tags":["programming","python","regex","slice of life"],"image":"\/user\/pages\/02.blog\/dynamic-string-replacements-with-regex\/thumbnail.webp"},{"title":"Finding all strings a regular expression can find","date_published":"2017-11-20T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/regex-printer","url":"https:\/\/mathspp.com\/blog\/regex-printer","content_html":"<p>A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regular_expression\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">regular expression<\/a>, without much rigor, is a very compact way of representing several different strings. 
Given a regular expression (regex), can I find out all the strings the regex can match?<\/p>\n\n<figure class=\"image-caption\"><img title=\"Photo by Aaron Burden on Unsplash\" alt=\"page of a book with a paragraph highlighted\" src=\"\/images\/8\/8\/6\/5\/1\/8865141b562f6ac1143c768912109de36cf835d5-hightlighted-book.jpg\"><figcaption class=\"\">Photo by Aaron Burden on Unsplash<\/figcaption><\/figure><p>One common use of regular expressions is to look for strings that have a certain structure in a bigger string (say, a text). As an example, the regular expression <code>abc(d|e)<\/code> can be used to look for the strings \"abcd\" and \"abce\", where the character <code>|<\/code> denotes that we have to make a choice (see image above). Similarly, <code>cat|dog<\/code> matches the strings \"cat\" and \"dog\". Other special symbols have their own meanings and purposes.<\/p>\n<p><img alt=\"two screenshots of my program\" src=\"\/user\/pages\/02.blog\/regex-printer\/regex.webp\"><\/p>\n<p>One very interesting question that arises is: given a regular expression, what are the strings matched by it? To answer that question, I wrote a small Python program, which I called <code>regexPrinter<\/code>, that prints all strings matched by a given regular expression! In order to manage that task, I chose a subset of the regex syntax that I wanted to support and also decided that whenever a piece of a pattern was infinite, at a given point the program would just print \"...\" to denote that infinity. This way, for any regex given, the program always stops.<\/p>\n<p>The <code>regexPrinter<\/code> supports:<\/p>\n<ul><li>the <code>*<\/code> operator, which denotes that <span class=\"mathjax mathjax--inline\">\\(0\\)<\/span> or more repetitions are to be matched. For example, <code>ah*<\/code> matches \"a\", \"ah\", \"ahh\", ...;<\/li>\n<li>the <code>+<\/code> operator, which denotes that <span class=\"mathjax mathjax--inline\">\\(1\\)<\/span> or more repetitions are to be matched. 
For example, <code>(hue)+<\/code> matches \"hue\", \"huehue\", \"huehuehue\", ...;<\/li>\n<li>the <code>?<\/code> operator, which denotes either <span class=\"mathjax mathjax--inline\">\\(0\\)<\/span> or <span class=\"mathjax mathjax--inline\">\\(1\\)<\/span> occurrences of the preceding pattern. For example, <code>woo(hoo)?<\/code> matches \"woo\" and \"woohoo\";<\/li>\n<li>the <code>{a:b}<\/code> operator, which matches no fewer than <span class=\"mathjax mathjax--inline\">\\(a\\)<\/span> and no more than <span class=\"mathjax mathjax--inline\">\\(b\\)<\/span> repetitions of the preceding pattern. As an example, <code>su{1:3}re<\/code> matches the strings \"sure\", \"suure\" and \"suuure\";<\/li>\n<li>the <code>|<\/code> operator, which denotes a choice: <code>cat|kat<\/code> matches \"cat\" and \"kat\", and <code>thank(s| you)<\/code> matches \"thanks\" and \"thank you\";<\/li>\n<li>the brackets <code>[]<\/code>, which denote that exactly one of the characters inside them is to be matched. For example, <code>[abc]<\/code> matches \"a\", \"b\" and \"c\";<\/li>\n<li>the parentheses <code>()<\/code>, which are used to group things. One thing to note is that the quantifiers <code>*+?{:}<\/code> all have higher precedence than string concatenation; e.g., <code>ab?<\/code> is interpreted as <code>a(b?)<\/code> and <em>not<\/em> <code>(ab)?<\/code>.<\/li>\n<\/ul><p>Please bear in mind that any character with no special meaning is interpreted literally, and that inside the brackets <code>[]<\/code> every character is interpreted literally, even the special ones. 
That is, <code>[ab*]<\/code> matches \"a\", \"b\" and \"*\", while <code>b&amp;='<\/code> will match the string \"b&amp;='\".<\/p>\n<p><img alt=\"another screenshot of my program\" src=\"\/user\/pages\/02.blog\/regex-printer\/regex2.webp\"><\/p>\n<p>The code for the program can be found <a href=\"https:\/\/github.com\/RodrigoGiraoSerrao\/projects\/blob\/master\/misc\/regexPrinter.py\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">here<\/a> on GitHub and all it takes is vanilla Python 3 to run. Just run the script and you will be prompted to insert regular expressions. The techniques I used were very, very similar to the ones I used to create <a href=\"https:\/\/mathspp.com\/blog\/creating-programming-language-from-scratch\">my toy programming language<\/a>, as I saw in <a href=\"https:\/\/ruslanspivak.com\/lsbasi-part1\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this<\/a> blog series. The way I went about writing this...<\/p>","summary":"In this post I explain briefly what a regex is and show something interesting I did with them.","date_modified":"2024-08-13T12:47:46+02:00","tags":["python","grammars","computation theory","parsers","regex"],"image":"\/user\/pages\/02.blog\/regex-printer\/hightlighted-book.jpg"}]}
