{"version":"https:\/\/jsonfeed.org\/version\/1","title":"mathspp.com feed","home_page_url":"https:\/\/mathspp.com\/blog\/tags\/regex","feed_url":"https:\/\/mathspp.com\/blog\/tags\/regex.json","description":"Stay up-to-date with the articles on mathematics and programming that get published to mathspp.com.","author":{"name":"Rodrigo Gir\u00e3o Serr\u00e3o"},"items":[{"title":"Problem #066 \u2013 regex crossword","date_published":"2024-08-27T12:00:00+02:00","id":"https:\/\/mathspp.com\/blog\/problems\/regex-crossword","url":"https:\/\/mathspp.com\/blog\/problems\/regex-crossword","content_html":"<p>Can you solve this crossword where all hints are regular expressions?<\/p>\n\n<h1 id=\"problem-statement\">Problem statement<a href=\"#problem-statement\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<figure class=\"image-caption\"><img title=\"The regex crossword puzzle grid.\" alt=\"Puzzle in the shape of a hexagonal tiled region where cells are supposed to be filled in with characters from the alphabet according to hints given as regular expressions.\" src=\"\/user\/pages\/02.blog\/03.problems\/p066-regex-crossword\/_puzzle.webp\"><figcaption class=\"\">The regex crossword puzzle grid.<\/figcaption><\/figure>\n<p>At <a href=\"https:\/\/ep2024.europython.eu\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">EuroPython 2024<\/a> I watched a lightning talk about the art of puzzle solving.\nIn it, the speaker showed the regex crossword puzzle that you can see above, <a href=\"https:\/\/puzzles.mit.edu\/2013\/coinheist.com\/rubik\/a_regular_crossword\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">which I took from this URL<\/a>.\nYou are supposed to fill the hexagonal grid with letters so that all of the regular expressions shown match.<\/p>\n<p>If you don't know regular expressions, the puzzle only uses a simple subset of the 
syntax:<\/p>\n<ul>\n<li>literal character matches;<\/li>\n<li>wildcard matches with <code>.<\/code>;<\/li>\n<li>alternatives with <code>|<\/code>;<\/li>\n<li>the quantifiers <code>?<\/code>, <code>+<\/code>, and <code>*<\/code>;<\/li>\n<li>character sets with <code>[...]<\/code> and negated character sets with <code>[^...]<\/code>; and<\/li>\n<li>groups with <code>()<\/code> and group references with <code>\\1<\/code>, <code>\\2<\/code>, etc.<\/li>\n<\/ul>\n<p>You can look this up and then you will be able to solve the challenge.\nYou can also use <a href=\"https:\/\/regex101.com\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the site regex101<\/a> to help you check what each regular expression means.<\/p>\n<div class=\"notices blue\">\n<p>Give it some thought!<\/p>\n<\/div>\n<h1 id=\"solution\">Solution<a href=\"#solution\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>In the spirit of the puzzle hunting community, I will not share my solved grid here.\nIf you need help, feel free to <a href=\"mailto:rodrigo@mathspp.com\" class=\"mailto\">email me<\/a> and we can talk it over.<\/p>\n<p>P.S. 
If I understood correctly, in the puzzle hunting community you're not supposed to just fill in the grid.\nIn some way, somehow, you are supposed to be able to extract an English word or phrase from that filled puzzle without any extra information, and then you check that you got it right by <a href=\"https:\/\/puzzles.mit.edu\/2013\/coinheist.com\/rubik\/a_regular_crossword\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">\u201cchecking your answer spoiler-free\u201d on the website where I took the regex crossword from<\/a>.\nI am still stuck on that step!<\/p>","summary":"Can you solve this crossword where all hints are regular expressions?","date_modified":"2024-08-27T19:27:02+02:00","tags":["mathematics","regex"],"image":"\/user\/pages\/02.blog\/03.problems\/p066-regex-crossword\/thumbnail.webp"},{"title":"Dynamic string replacements with regex","date_published":"2024-01-09T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/dynamic-string-replacements-with-regex","url":"https:\/\/mathspp.com\/blog\/dynamic-string-replacements-with-regex","content_html":"<p>Learn how to find text patterns and replace them with dynamic content using regex.<\/p>\n\n<h1 id=\"dynamic-string-replacements-with-regex\">Dynamic string replacements with regex<a href=\"#dynamic-string-replacements-with-regex\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>The other day I was working on <a href=\"https:\/\/mathspp.gumroad.com\/l\/problems\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">my problems book<\/a> and I needed to update a markdown file to number the headings.\nIn short, I had a markdown file like this:<\/p>\n<pre><code class=\"language-md\"># Problems\n\n## Dancing triangle\n\n...\n\n## Bag full of numbers\n\n...\n\n## Quarrel in the Shire\n\n...<\/code><\/pre>\n<p>I wanted to find all H2 headings and number them, so I'd end up with this 
file:<\/p>\n<pre><code class=\"language-md\"># Problems\n\n## 01 &ndash; Dancing triangle\n\n...\n\n## 02 &ndash; Bag full of numbers\n\n...\n\n## 03 &ndash; Quarrel in the Shire\n\n...<\/code><\/pre>\n<p>I thought about doing this manually, but I'd need to do it to 2 documents, each with thousands of lines and 64 headings, so I decided to use a bit of Python and regex instead.<\/p>\n<p>The module <code>re<\/code>, from the Python standard library, lets you do string replacements with <a href=\"https:\/\/docs.python.org\/3\/library\/re.html#re.sub\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the function <code>re.sub<\/code><\/a>, which needs 3 parameters:<\/p>\n<ol><li>the pattern we're looking for;<\/li>\n<li>the replacement for the pattern; and<\/li>\n<li>the string we're searching &amp; replacing in.<\/li>\n<\/ol><p><code>re.sub<\/code> is like the string method <code>str.replace<\/code> on steroids!<\/p>\n<p>In my case, the pattern I was looking for was this:<\/p>\n<pre><code>^## (.*)$<\/code><\/pre>\n<p>This looks for a <code>##<\/code> at the beginning of the line, followed by a space, and then the <code>(.*)$<\/code> matches everything else until the end of the line.<\/p>\n<p>We can see this in action if we use <code>re.findall<\/code> and a dummy string:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; import re\n&gt;&gt;&gt; string = '# Problems\\n\\n## Dancing triangle\\n\\n...\\n\\n## Bag full of numbers\\n\\n...\\n\\n## Quarrel in the Shire\\n\\n...'\n&gt;&gt;&gt; re.findall(\"^## (.*)$\", string, flags=re.MULTILINE)\n[\n    'Dancing triangle',\n    'Bag full of numbers',\n    'Quarrel in the Shire'\n]<\/code><\/pre>\n<p>The flag <code>re.MULTILINE<\/code> was used so that the anchors <code>^<\/code> and <code>$<\/code> match the beginning and end of each line, respectively, instead of the beginning and end of the string.<\/p>\n<p>(Go to <a href=\"https:\/\/regex101.com\" target=\"_blank\" 
rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">regex101<\/a> (an online regex playground), paste the regular expression <code>^## (.*)$<\/code> into the top, middle bar, and copy and paste the first version of the <code># Problems<\/code> markdown into the big, central text area.)<\/p>\n<p>The next thing I wanted was to replace each title with a number, followed by <code>&ndash;<\/code>, followed by the title itself!\nBy using group references, adding the <code>&ndash;<\/code> before the title is easy:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; print(\n...     re.sub(\"^## (.*)$\", r\"## xx &ndash; \\1\", string, flags=re.MULTILINE)\n... )\n# Problems\n\n## xx &ndash; Dancing triangle\n\n...\n\n## xx &ndash; Bag full of numbers\n\n...\n\n## xx &ndash; Quarrel in the Shire\n\n...<\/code><\/pre>\n<p>The &ldquo;difficult&rdquo; part is adding the number, which must be incremented each time we add it.\nThankfully, the function <code>re.sub<\/code> has a trick up its sleeve!<\/p>\n<p>The second parameter of the function <code>re.sub<\/code>, which is the replacement, can be a <em>function<\/em>.\nThis function must have a single parameter, which is the match object, and must return a string, which is the replacement for the given match.\nThus, by using a counter variable, I can add...<\/p>","summary":"Learn how to find text patterns and replace them with dynamic content using regex.","date_modified":"2024-01-10T22:41:05+01:00","tags":["programming","python","regex","slice of life"],"image":"\/user\/pages\/02.blog\/dynamic-string-replacements-with-regex\/thumbnail.webp"},{"title":"Finding all strings a regular expression can find","date_published":"2017-11-20T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/regex-printer","url":"https:\/\/mathspp.com\/blog\/regex-printer","content_html":"<p>A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regular_expression\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link 
no-image\">regular expression<\/a>, without much rigor, is a very compact way of representing several different strings. Given a regular expression (regex), can I find all the strings that the regex matches?<\/p>\n\n<figure class=\"image-caption\"><img title=\"Photo by Aaron Burden on Unsplash\" alt=\"page of a book with a paragraph highlighted\" src=\"\/images\/8\/8\/6\/5\/1\/8865141b562f6ac1143c768912109de36cf835d5-hightlighted-book.jpg\"><figcaption class=\"\">Photo by Aaron Burden on Unsplash<\/figcaption><\/figure><p>One common use of regular expressions is to look for strings that have a certain structure in a bigger string (say, a text). As an example, the regular expression <code>abc(d|e)<\/code> can be used to look for the strings \"abcd\" and \"abce\", where the character <code>|<\/code> denotes we have to make a choice (see image above). Thus <code>cat|dog<\/code> would match the strings \"cat\" and \"dog\". Other symbols have special meanings too.<\/p>\n<p><img alt=\"two screenshots of my program\" src=\"\/user\/pages\/02.blog\/regex-printer\/regex.webp\"><\/p>\n<p>One very interesting question that arises is: given a regular expression, what are the strings matched by it? To answer that question I wrote a small Python program, which I called <code>regexPrinter<\/code>, that prints all strings matched by a given regular expression! To keep that task manageable, I chose a subset of the regex syntax that I wanted to support and also decided that whenever a piece of a pattern could repeat indefinitely, at some point the program would just print \"...\" to denote that infinity. This way, the program always stops, whatever regex it is given.<\/p>\n<p>The <code>regexPrinter<\/code> supports:<\/p>\n<ul><li>the <code>*<\/code> operator, that denotes that <span class=\"mathjax mathjax--inline\">\\(0\\)<\/span> or more repetitions are to be matched. 
For example, <code>ah*<\/code> matches \"a\", \"ah\", \"ahh\", ...;<\/li>\n<li>the <code>+<\/code> operator that denotes that <span class=\"mathjax mathjax--inline\">\\(1\\)<\/span> or more repetitions are to be matched. For example, <code>(hue)+<\/code> matches \"hue\", \"huehue\", \"huehuehue\", ...;<\/li>\n<li>the <code>?<\/code> operator that denotes either <span class=\"mathjax mathjax--inline\">\\(0\\)<\/span> or <span class=\"mathjax mathjax--inline\">\\(1\\)<\/span> occurrences of the preceding pattern. For example, <code>woo(hoo)?<\/code> matches \"woo\" and \"woohoo\";<\/li>\n<li>the <code>{a:b}<\/code> operator that matches no fewer than <span class=\"mathjax mathjax--inline\">\\(a\\)<\/span> and no more than <span class=\"mathjax mathjax--inline\">\\(b\\)<\/span> repetitions of the preceding pattern. As an example, <code>su{1:3}re<\/code> matches the strings \"sure\", \"suure\" and \"suuure\";<\/li>\n<li>the <code>|<\/code> operator that denotes a choice. <code>cat|kat<\/code> matches \"cat\" and \"kat\" and <code>thank(s| you)<\/code> matches \"thanks\" and \"thank you\";<\/li>\n<li>the brackets <code>[]<\/code> that denote that exactly one of the characters inside them is to be matched. For example, <code>[abc]<\/code> matches \"a\", \"b\" and \"c\";<\/li>\n<li>the parentheses <code>()<\/code> that are used to group things. One thing to note is that the quantifiers <code>*+?{:}<\/code> all have higher precedence than string concatenation; e.g., <code>ab?<\/code> is interpreted as <code>a(b?)<\/code> and <em>not<\/em> <code>(ab)?<\/code>.<\/li>\n<\/ul><p>Please bear in mind that any character with no special meaning is interpreted literally, and that inside the brackets <code>[]<\/code> every character is interpreted literally, even the special ones. 
That is, <code>[ab*]<\/code> matches \"a\", \"b\" and \"*\", while <code>b&amp;='<\/code> will match the string \"b&amp;='\".<\/p>\n<p><img alt=\"another screenshot of my program\" src=\"\/user\/pages\/02.blog\/regex-printer\/regex2.webp\"><\/p>\n<p>The code for the program can be found <a href=\"https:\/\/github.com\/RodrigoGiraoSerrao\/projects\/blob\/master\/misc\/regexPrinter.py\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">here<\/a> on GitHub, and all it needs to run is vanilla Python 3. Just run the script and you will be prompted to enter regular expressions. The techniques I used were very, very similar to the ones I used to create <a href=\"https:\/\/mathspp.com\/blog\/creating-programming-language-from-scratch\">my toy programming language<\/a>, which I learned from <a href=\"https:\/\/ruslanspivak.com\/lsbasi-part1\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">this<\/a> blog series. The way I went about writing this...<\/p>","summary":"In this post I explain briefly what a regex is and show something interesting I did with them.","date_modified":"2024-08-13T12:47:46+02:00","tags":["python","grammars","computation theory","parsers","regex"],"image":"\/user\/pages\/02.blog\/regex-printer\/hightlighted-book.jpg"}]}