{"version":"https:\/\/jsonfeed.org\/version\/1","title":"mathspp.com feed","home_page_url":"https:\/\/mathspp.com\/blog\/tags\/compilers","feed_url":"https:\/\/mathspp.com\/blog\/tags\/compilers.json","description":"Stay up-to-date with the articles on mathematics and programming that get published to mathspp.com.","author":{"name":"Rodrigo Gir\u00e3o Serr\u00e3o"},"items":[{"title":"Building a Python compiler and interpreter \u2013 09 short-circuiting","date_published":"2023-11-24T15:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-09-short-circuiting","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-09-short-circuiting","content_html":"<p>In the 9th part of <a href=\"\/blog\/tag:bpci\">this series<\/a> of building a Python compiler and interpreter we will add support for Boolean operators and Boolean short-circuiting.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter-09-short-circuiting\">Building a Python compiler and interpreter &ndash; 09 short-circuiting<a href=\"#building-a-python-compiler-and-interpreter-09-short-circuiting\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>This is the 9th article of the <a href=\"\/blog\/tag:bpci\">&ldquo;Building a Python compiler and interpreter&rdquo; series<\/a>, so make sure you've gone through the first eight articles before tackling this one!<\/p>\n<p>The code that serves as a starting point for this article is <a href=\"https:\/\/github.com\/mathspp\/building-a-python-compiler-and-interpreter\/tree\/v0.8.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the tag v0.8.0 of the code in this GitHub repository<\/a>.<\/p>\n<h2 id=\"objectives\">Objectives<a href=\"#objectives\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The objectives for this article are:<\/p>\n<ul><li>to add support for the Boolean operators 
<code>and<\/code> and <code>or<\/code>; and<\/li>\n<li>add support for <a href=\"\/blog\/pydonts\/boolean-short-circuiting\">Boolean short-circuiting<\/a> when using them.<\/li>\n<\/ul><p>In the previous article, when we <a href=\"\/blog\/building-a-python-compiler-and-interpreter-08-booleans\">added Boolean literals and the operator <code>not<\/code><\/a>, we finished the article by saying it was a short article because this one would be tough...\nI was so wrong!<\/p>\n<p>Turns out that our work today won't be <em>that<\/em> difficult after all.\nIt won't be trivial but it will be manageable.\nLet's get into it.<\/p>\n<h2 id=\"researching-how-python-does-it\">Researching how Python does it<a href=\"#researching-how-python-does-it\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>Implementing the Boolean operators <code>and<\/code> and <code>or<\/code> didn't sound like an unmanageable challenge, but I wanted to get it right to prepare myself for the Boolean short-circuiting.\nSo, the first step before writing <em>any<\/em> code was figuring out how Python currently does it.<\/p>\n<h3 id=\"how-python-parses-boolean-operators\">How Python parses Boolean operators<a href=\"#how-python-parses-boolean-operators\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>I started off by taking a look at how Python parses a simple conjunction with the operator <code>and<\/code>:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; import ast\n&gt;&gt;&gt; ast.dump(ast.parse(\"a and b\"))\n\"Module(\n    body=[\n        Expr(\n            value=BoolOp(\n                op=And(),\n                values=[\n                    Name(id='a', ctx=Load()),\n                    Name(id='b', ctx=Load())\n                ]\n            )\n        )\n    ],\n    type_ignores=[]\n)\"<\/code><\/pre>\n<p>That's a lot, but we can &ldquo;zoom in&rdquo; on the part that matters:<\/p>\n<pre><code 
class=\"language-py\">BoolOp(op=And(), values=[...])<\/code><\/pre>\n<p>So, it looks like there is a tree node specifically for Boolean operators, and then all of the values are put together in a list <code>values<\/code>.\nThis was interesting, because I expected to find a tree node that was more similar to the <code>BinOp<\/code> node.\nIn fact, in the beginning I thought about using <code>BinOp<\/code> for the Boolean operators as well.<\/p>\n<p>However, having a list of values is very helpful when we have three or more values:<\/p>\n<pre><code class=\"language-py\">BoolOp(\n    op=And(),\n    values=[\n        Name(id='a', ...),\n        Name(id='b', ...),\n        Name(id='c', ...)\n    ]\n)<\/code><\/pre>\n<p>For now, it's not yet obvious why this will be so useful, but as soon as you are done implementing Boolean short-circuiting you will understand why.<\/p>\n<p>So, we already know how we'll be parsing these operations.\nBut what about the compilation?<\/p>\n<h3 id=\"how-python-compiles-boolean-operators\">How Python compiles Boolean operators<a href=\"#how-python-compiles-boolean-operators\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>I did a similar type of research to see how Python compiles these Boolean operators and I was actually surprised!\nHere is the bytecode for <code>a and b<\/code>:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; dis.dis(\"a and b\")\n  0           0 RESUME                   0\n\n  1           2 LOAD_NAME                0 (a)\n              4 COPY                     1\n              6 POP_JUMP_IF_FALSE        3 (to 14)\n              8 POP_TOP\n             10 LOAD_NAME                1 (b)\n             12 RETURN_VALUE\n        &gt;&gt;   14 RETURN_VALUE<\/code><\/pre>\n<p>What we can see is that the operator <code>and<\/code> uses the bytecode operation <code>POP_JUMP_IF_FALSE<\/code> to jump over the evaluation of the right...<\/p>","summary":"In the 9th part of building a 
Python compiler and interpreter we will add support for Boolean operators and Boolean short-circuiting.","date_modified":"2024-03-19T12:56:44+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter-09-short-circuiting\/thumbnail.webp"},{"title":"Building a Python compiler and interpreter \u2013 08 Booleans","date_published":"2023-11-13T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-08-booleans","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-08-booleans","content_html":"<p>In the 8th part of <a href=\"\/blog\/tag:bpci\">this series<\/a> of building a Python compiler and interpreter we will add support for Boolean literals and Boolean operators.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter-08-booleans\">Building a Python compiler and interpreter &ndash; 08 Booleans<a href=\"#building-a-python-compiler-and-interpreter-08-booleans\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>This is the 8th article of the <a href=\"\/blog\/tag:bpci\">&ldquo;Building a Python compiler and interpreter&rdquo; series<\/a>, so make sure you've gone through the first seven articles before tackling this one!<\/p>\n<p>The code that serves as a starting point for this article is <a href=\"https:\/\/github.com\/mathspp\/building-a-python-compiler-and-interpreter\/tree\/v0.7.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the tag v0.7.0 of the code in this GitHub repository<\/a>.<\/p>\n<h2 id=\"objectives\">Objectives<a href=\"#objectives\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The objectives for this article are:<\/p>\n<ul><li>the introduction of the Boolean literal values <code>True<\/code> and <code>False<\/code>; and<\/li>\n<li>the unary Boolean operator 
<code>not<\/code>.<\/li>\n<\/ul><h2 id=\"adding-boolean-literals\">Adding Boolean literals<a href=\"#adding-boolean-literals\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<h3 id=\"boolean-literals-are-keywords\">Boolean literals are keywords<a href=\"#boolean-literals-are-keywords\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>The addition of the Boolean literals doesn't amount to too much work, although we did tweak the parser in a way that may not be obvious.<\/p>\n<p>So, we start off by creating the appropriate token types and then registering <code>True<\/code> and <code>False<\/code> as keywords.\nIn case you're wondering why we classify <code>True<\/code> and <code>False<\/code> as keywords, the answer is simple.\nWe do it because Python does it:<\/p>\n<pre><code class=\"language-py\">&gt;&gt;&gt; import keyword\n&gt;&gt;&gt; keyword.iskeyword(\"True\")\nTrue\n&gt;&gt;&gt; keyword.iskeyword(\"False\")\nTrue<\/code><\/pre>\n<p>See?\nI answered your question without answering your question!<\/p>\n<p>Let us add <code>True<\/code> and <code>False<\/code> as keywords:<\/p>\n<pre><code class=\"language-py\">class TokenType(StrEnum):\n    # ...\n    TRUE = auto()  # True\n    FALSE = auto()  # False\n\n# ...\n\nKEYWORDS_AS_TOKENS: dict[str, TokenType] = {\n    \"if\": TokenType.IF,\n    \"True\": TokenType.TRUE,\n    \"False\": TokenType.FALSE,\n}<\/code><\/pre>\n<p>We add three tests, and now Boolean literals can be tokenized:<\/p>\n<pre><code class=\"language-py\">@pytest.mark.parametrize(\n    [\"code\", \"token\"],\n    [\n        # ...\n        (\"True\", Token(TokenType.TRUE)),\n        (\"False\", Token(TokenType.FALSE)),\n    ],\n)\ndef test_tokenizer_recognises_each_token(code: str, token: Token):\n    assert Tokenizer(code).next_token() == token\n\n# ...\n\ndef test_tokenizer_boolean_values():\n    code = \"a = True\\nb=False\"\n    tokens = list(Tokenizer(code))\n    assert 
tokens == [\n        Token(TokenType.NAME, \"a\"),\n        Token(TokenType.ASSIGN),\n        Token(TokenType.TRUE),\n        Token(TokenType.NEWLINE),\n        Token(TokenType.NAME, \"b\"),\n        Token(TokenType.ASSIGN),\n        Token(TokenType.FALSE),\n        Token(TokenType.NEWLINE),\n        Token(TokenType.EOF),\n    ]<\/code><\/pre>\n<h3 id=\"creating-a-general-tree-node-for-constants\">Creating a general tree node for constants<a href=\"#creating-a-general-tree-node-for-constants\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>At this point, we have two subclasses of <code>TreeNode<\/code> that are used for constant values:<\/p>\n<ol><li><code>Int<\/code>; and<\/li>\n<li><code>Float<\/code>.<\/li>\n<\/ol><p>With the addition of Booleans, we'd need a third node, and possibly a fourth and a fifth for <code>None<\/code> and strings.\nAnd possibly even more for dictionaries, lists, sets, and other things.<\/p>\n<p>Instead of doing this, we'll simplify things a bit and we'll create a tree node called <code>Constant<\/code> that will replace all these use cases.<\/p>\n<p>We start by creating this class <code>Constant<\/code> and by deleting <code>Int<\/code> and <code>Float<\/code>:<\/p>\n<pre><code class=\"language-py\"># Gone:\n# @dataclass\n# class Int(Expr):\n#     value: int\n\n# @dataclass\n# class Float(Expr):\n#     value: float\n\n@dataclass\nclass Constant(Expr):\n    value: bool | float | int<\/code><\/pre>\n<p>Now, we need to search and replace all occurrences of <code>Int(<\/code> and <code>Float(<\/code> with <code>Constant(<\/code>.<\/p>\n<p>For example, <code>Parser.parse_value<\/code> must be fixed:<\/p>\n<pre><code class=\"language-py\">class Parser:\n    # ...\n\n    def parse_value(self) -&gt; Variable | Constant:\n        \"\"\"Parses an integer or a float.\"\"\"\n        next_token_type = self.peek()\n        if next_token_type == TokenType.NAME:\n            return 
Variable(self.eat(TokenType.NAME).value)\n        elif next_token_type in {TokenType.INT, TokenType.FLOAT}:\n            return Constant(self.eat(next_token_type).value)\n        else:\n            raise RuntimeError(f\"Can't parse {next_token_type} as a value.\")<\/code><\/pre>\n<p>And so do all of the tests and <code>import<\/code> statements.<\/p>\n<p>We also need to remove the...<\/p>","summary":"In the 8th part of building a Python compiler and interpreter we will add support for Boolean literals and Boolean operators.","date_modified":"2023-11-13T20:20:08+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter-08-booleans\/thumbnail.webp"},{"title":"Building a Python compiler and interpreter \u2013 07 if","date_published":"2023-11-10T16:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-07-if","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-07-if","content_html":"<p>In the 7th part of <a href=\"\/blog\/tag:bpci\">this series<\/a> of building a Python compiler and interpreter we will add support for <code>if<\/code> statements.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter-07-if\">Building a Python compiler and interpreter &ndash; 07 if<a href=\"#building-a-python-compiler-and-interpreter-07-if\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>This is the 7th article of the <a href=\"\/blog\/tag:bpci\">&ldquo;Building a Python compiler and interpreter&rdquo; series<\/a>, so make sure you've gone through the first six articles before tackling this one!<\/p>\n<p>The code that serves as a starting point for this article is <a href=\"https:\/\/github.com\/mathspp\/building-a-python-compiler-and-interpreter\/tree\/v0.6.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the tag v0.6.0 of the code in this GitHub 
repository<\/a>.<\/p>\n<h2 id=\"objectives\">Objectives<a href=\"#objectives\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>Broadly speaking, for this article we want to add support for <code>if<\/code> statements.\nIn practice, these are the smaller tasks we'll need to handle:<\/p>\n<ul><li>start tokenizing keywords and colons <code>:<\/code>;<\/li>\n<li>start tokenizing indentation;<\/li>\n<li>change the grammar to support arbitrarily nested <code>if<\/code> statements;<\/li>\n<li>compile these <code>if<\/code> statements; and<\/li>\n<li>change the interpreter so that <code>if<\/code> statements can change the bytecode that the interpreter will run next.<\/li>\n<\/ul><h2 id=\"booleans\">Booleans<a href=\"#booleans\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>In case you're wondering how we'll use conditional statements if we don't even have Booleans in our language yet, it's simple: we'll cheat a bit.\nIn Python, <a href=\"\/blog\/pydonts\/truthy-falsy-and-bool\">any object has a Truthy or Falsy value<\/a>, which means integers do too.\nSo, in the beginning we'll just use integers as Booleans.<\/p>\n<p>The truth of the matter is that adding the literal Booleans <code>True<\/code> and <code>False<\/code>, alongside comparison operators and Boolean operators, is pretty similar to what we've been doing so far.\nSo, I decided to lead with the challenge of implementing the <code>if<\/code> statement itself, and then we'll tackle the other adjacent things (which are also included as <a href=\"#exercises\">exercises<\/a>).<\/p>\n<h2 id=\"tokenization-of-keywords\">Tokenization of keywords<a href=\"#tokenization-of-keywords\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>One thing that <code>if<\/code> statements will introduce is keywords.\nSo far, our program did not have any keywords whatsoever.\nNow, this will change with the addition of 
the keyword <code>if<\/code>.<\/p>\n<p>I thought about creating a token type called <code>KEYWORD<\/code> and then having the keyword be the value of the token.\nHowever, I noticed I have different token types for the different operators <code>+<\/code>, <code>-<\/code>, and others, so I thought it made sense to introduce a token type for the keyword <code>if<\/code>, specifically:<\/p>\n<pre><code class=\"language-py\">class TokenType(StrEnum):\n    # ...\n    IF = auto()  # if<\/code><\/pre>\n<p>Next, tokenization of keywords can be simple if we leverage some of the work we've already done.\nUp until now, <code>if<\/code> would be a perfectly valid variable name in our language.\nSo, what we can do is let the tokenizer handle names as usual, and right before it creates a token <code>NAME<\/code> we intercept it, check if the name is a keyword, and create a keyword token if necessary.<\/p>\n<p>To do that, we can create a dictionary that maps keywords onto token types:<\/p>\n<pre><code class=\"language-py\">KEYWORDS_AS_TOKENS: dict[str, TokenType] = {\n    \"if\": TokenType.IF,\n}<\/code><\/pre>\n<p>Here is how we'd change the implementation of <code>Tokenizer.next_token<\/code> using the ideas outlined above:<\/p>\n<pre><code class=\"language-py\">class Tokenizer:\n    # ...\n\n    def next_token(self) -&gt; Token:\n        # ...\n\n        elif char in LEGAL_NAME_START_CHARACTERS:\n            name = self.consume_name()\n            keyword_token_type = KEYWORDS_AS_TOKENS.get(name, None)\n            if keyword_token_type:\n                return Token(keyword_token_type)\n            else:\n                return Token(TokenType.NAME, name)\n\n        # ...<\/code><\/pre>\n<p>We can check that <code>if<\/code> is a...<\/p>","summary":"In the 7th part of building a Python compiler and interpreter we will add support for if 
statements.","date_modified":"2023-11-17T15:20:04+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter-07-if\/thumbnail.webp"},{"title":"Building a Python compiler and interpreter \u2013 06 variables","date_published":"2023-11-09T10:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-06-variables","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-06-variables","content_html":"<p>In the 6th part of <a href=\"\/blog\/tag:bpci\">this series<\/a> of building a Python compiler and interpreter we will add support for variables, simple assignments, and chained assignments.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter-06-variables\">Building a Python compiler and interpreter &ndash; 06 variables<a href=\"#building-a-python-compiler-and-interpreter-06-variables\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>This is the 6th article of the <a href=\"\/blog\/tag:bpci\">&ldquo;Building a Python compiler and interpreter&rdquo; series<\/a>, so make sure you've gone through the first five articles before tackling this one!<\/p>\n<p>The code that serves as a starting point for this article is <a href=\"https:\/\/github.com\/mathspp\/building-a-python-compiler-and-interpreter\/tree\/v0.5.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the tag v0.5.0 of the code in this GitHub repository<\/a>.<\/p>\n<h2 id=\"objectives\">Objectives<a href=\"#objectives\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The objectives for this article all revolve around adding support for variables:<\/p>\n<ul><li>tokenising variable names;<\/li>\n<li>tokenising assignments;<\/li>\n<li>parsing variable assignment and variable references;<\/li>\n<li>add compilation support for the assignment 
statement;<\/li>\n<li>modifying the interpreter to introduce a scope; and<\/li>\n<li>add support for consecutive assignments of the form <code>a = b = c = 3<\/code>.<\/li>\n<\/ul><h2 id=\"adding-variable-assignment\">Adding variable assignment<a href=\"#adding-variable-assignment\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<h3 id=\"tokenizing-names\">Tokenizing names<a href=\"#tokenizing-names\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>Naturally, the first thing we need to do is make sure we are able to tokenize names from the source code.\nThis is straightforward to do if we mimic the work we've done for arbitrarily long integers.<\/p>\n<p>First, we create an appropriate token type and identify the characters that can be a part of variable names:<\/p>\n<pre><code class=\"language-py\">from string import digits, ascii_letters\n# ...\n\nclass TokenType(StrEnum):\n    # ...\n    NAME = auto()  # any possible variable name\n\n# ...\n\nLEGAL_NAME_CHARACTERS = ascii_letters + digits + \"_\"\nLEGAL_NAME_START_CHARACTERS = ascii_letters + \"_\"<\/code><\/pre>\n<p>Python variable names don't need to be restricted to ASCII letters only.\nFor example, <code>&aacute;&ntilde;&oslash;<\/code> is a perfectly valid variable name:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; &aacute;&ntilde;&oslash; = 3\n&gt;&gt;&gt; &aacute;&ntilde;&oslash;\n3<\/code><\/pre>\n<p>But we'll keep it simpler for our own sake.<\/p>\n<p>Then, we create a method <code>Tokenizer.consume_name<\/code> and we use it to tokenize names:<\/p>\n<pre><code class=\"language-py\">class Tokenizer:\n    # ...\n\n    def consume_name(self) -&gt; str:\n        \"\"\"Consumes a sequence of characters that could be a variable name.\"\"\"\n        start = self.ptr\n        self.ptr += 1\n        while (\n            self.ptr &lt; len(self.code) and self.code[self.ptr] in LEGAL_NAME_CHARACTERS\n        ):\n            
self.ptr += 1\n        return self.code[start : self.ptr]\n\n    # ...\n\n    def next_token(self) -&gt; Token:\n        # ...\n\n        self.beginning_of_line = False\n        if self.peek(length=2) == \"**\":\n            self.ptr += 2\n            return Token(TokenType.EXP)\n        elif char in CHARS_AS_TOKENS:\n            self.ptr += 1\n            return Token(CHARS_AS_TOKENS[char])\n        elif char in LEGAL_NAME_START_CHARACTERS:  # &lt;-- New!\n            name = self.consume_name()\n            return Token(TokenType.NAME, name)\n        # ...<\/code><\/pre>\n<p>Now, I decided to also modify the code at the bottom of the file <code>tokenizer.py<\/code>:<\/p>\n<pre><code class=\"language-py\">if __name__ == \"__main__\":\n    import sys\n\n    code = sys.argv[1]\n    for token in Tokenizer(code):\n        print(token)<\/code><\/pre>\n<p>This way, I can just run something like <code>python -m python.tokenizer \"a b c _123\"<\/code> and get the output that follows:<\/p>\n<pre><code class=\"language-py\">Token(TokenType.NAME, 'a')\nToken(TokenType.NAME, 'b')\nToken(TokenType.NAME, 'c')\nToken(TokenType.NAME, '_123')\nToken(TokenType.NEWLINE, None)\nToken(TokenType.EOF, None)<\/code><\/pre>\n<p>I'll also tweak the method <code>__repr__<\/code> on tokens so that it doesn't show the second argument when it's <code>None<\/code>:<\/p>\n<pre><code class=\"language-py\">@dataclass\nclass Token:\n    type: TokenType\n    value: Any = None\n\n    def __repr__(self) -&gt; str:  # &lt;-- Changed.\n        if self.value is not None:\n            return f\"{self.__class__.__name__}({self.type!r}, {self.value!r})\"\n        else:\n            return f\"{self.__class__.__name__}({self.type!r})\"<\/code><\/pre>\n<p>The same command from above now produces slightly more condensed output:<\/p>\n<pre><code class=\"language-py\">Token(TokenType.NAME, 'a')\nToken(TokenType.NAME, 'b')\nToken(TokenType.NAME, 'c')\nToken(TokenType.NAME, '_123')...<\/code><\/pre>","summary":"In the 
6th part of building a Python compiler and interpreter we will add support for variables, simple assignments, and chained assignments.","date_modified":"2023-11-09T20:06:04+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter-06-variables\/thumbnail.webp"},{"title":"Building a Python compiler and interpreter \u2013 05 statements","date_published":"2023-11-08T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-05-statements","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-05-statements","content_html":"<p>In the 5th part of <a href=\"\/blog\/tag:bpci\">this series<\/a> of building a Python compiler and interpreter we will add support for multiple statements in our program.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter-05-statements\">Building a Python compiler and interpreter &ndash; 05 statements<a href=\"#building-a-python-compiler-and-interpreter-05-statements\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>This is the 5th article of the <a href=\"\/blog\/tag:bpci\">&ldquo;Building a Python compiler and interpreter&rdquo; series<\/a>, so make sure you've gone through the first four articles before tackling this one!<\/p>\n<p>The code that serves as a starting point for this article is <a href=\"https:\/\/github.com\/mathspp\/building-a-python-compiler-and-interpreter\/tree\/v0.4.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the tag v0.4.0 of the code in this GitHub repository<\/a>.<\/p>\n<h2 id=\"objectives\">Objectives<a href=\"#objectives\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The objective for this article is to make sure that our program can be composed of a series of statements (separated by newlines, as Python does).\nAs it stands, we can 
only run a single line of code:<\/p>\n<pre><code class=\"language-bash\">&#10095; python -m python.interpreter \"1 + 2\n3 + 4\n5 + 6\"\nRuntimeError: Can't tokenize '\\n'.<\/code><\/pre>\n<p>We'll change this in this article.<\/p>\n<h2 id=\"handling-multiple-statements\">Handling multiple statements<a href=\"#handling-multiple-statements\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<h3 id=\"tokenizing\">Tokenizing<a href=\"#tokenizing\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>In order to handle multiple statements we need to be able to tokenize the statement separators, which are newlines.\nThus, we start by introducing that token type:<\/p>\n<pre><code class=\"language-py\">class TokenType(StrEnum):\n    # ...\n    NEWLINE = auto()  # statement separator<\/code><\/pre>\n<p>Now, we might want to add the newline character <code>\"\\n\"<\/code> to the mapping <code>CHARS_AS_TOKENS<\/code>.\nThis should be enough to allow tokenizing newlines.<\/p>\n<p>However, if we do that, we'll produce as many <code>NEWLINE<\/code> tokens as there are newlines in the code, even if we have multiple empty lines in a row.\nThis isn't helpful, because we only care about newlines that appear <em>after<\/em> some code.<\/p>\n<p>We'll modify the tokenizer to cope with this.\nWe'll create an attribute <code>beginning_of_line<\/code> that determines whether we've produced any tokens on this line or not.\nIf we hit a newline character and <code>beginning_of_line<\/code> is <code>True<\/code>, that's because we haven't done anything on this line yet and thus we don't want to produce a <code>NEWLINE<\/code> token.<\/p>\n<p>So, the tokenizer is modified to look like this:<\/p>\n<pre><code class=\"language-py\">class Tokenizer:\n    def __init__(self, code: str) -&gt; None:\n        self.code = code\n        self.ptr: int = 0\n        self.beginning_of_line = True\n\n    # ...\n\n    def next_token(self) 
-&gt; Token:\n        while self.ptr &lt; len(self.code) and self.code[self.ptr] == \" \":\n            self.ptr += 1\n\n        if self.ptr == len(self.code):\n            return Token(TokenType.EOF)\n\n        # Handle the newline case.\n        char = self.code[self.ptr]\n        if char == \"\\n\":\n            self.ptr += 1\n            if not self.beginning_of_line:\n                self.beginning_of_line = True\n                return Token(TokenType.NEWLINE)\n            else:  # If we're at the BoL, get the next token instead.\n                return self.next_token()\n\n        # If we got to this point, we're about to produce another token\n        # so we can set BoL to False.\n        self.beginning_of_line = False\n        if self.peek(length=2) == \"**\":\n            self.ptr += 2\n            return Token(TokenType.EXP)\n        # Other cases here...<\/code><\/pre>\n<div class=\"notices yellow\">\n<p>In case you're wondering, it's not trivial to figure out this was the &ldquo;best&rdquo; thing to do when you're inexperienced (like I am).\nMany times, I decide to do things in a certain way, and when I make some progress I realise that I should've done it in a different way.\nI'm just trying to short-circuit <em>some<\/em> of those bad decisions in these articles...\nAlthough they're...<\/p><\/div>","summary":"In the 5th part of building a Python compiler and interpreter we will add support for multiple statements in our program.","date_modified":"2023-11-10T17:45:44+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter-05-statements\/thumbnail.webp"},{"title":"Building a Python compiler and interpreter \u2013 04 
arithmetic","date_published":"2023-11-07T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-04-arithmetic","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-04-arithmetic","content_html":"<p>In the 4th part of <a href=\"\/blog\/tag:bpci\">this series<\/a> of building a Python compiler and interpreter we will add support for more arithmetic operations and parenthesised expressions.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter-04-arithmetic\">Building a Python compiler and interpreter &ndash; 04 arithmetic<a href=\"#building-a-python-compiler-and-interpreter-04-arithmetic\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>This is the 4th article of the <a href=\"\/blog\/tag:bpci\">&ldquo;Building a Python compiler and interpreter&rdquo; series<\/a>, so make sure you've gone through the first three articles before tackling this one!<\/p>\n<p>The code that serves as a starting point for this article is <a href=\"https:\/\/github.com\/mathspp\/building-a-python-compiler-and-interpreter\/tree\/v0.3.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the tag v0.3.0 of the code in this GitHub repository<\/a>.<\/p>\n<h2 id=\"objectives\">Objectives<a href=\"#objectives\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The objectives for this article are the following:<\/p>\n<ul><li>add support for the unary operators <code>-<\/code> and <code>+<\/code>;<\/li>\n<li>add support for parenthesised expressions;<\/li>\n<li>add support for more binary operators: <code>*<\/code>, <code>\/<\/code>, <code>%<\/code>, and <code>**<\/code>; and<\/li>\n<li>understand the relationship between the precedence of operations and the order in which the grammar rules are written.<\/li>\n<\/ul><h2 id=\"unary-operators-and\">Unary operators <code>-<\/code> and <code>+<\/code><a 
href=\"#unary-operators-and\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>We want to add support for the unary operators <code>-<\/code> and <code>+<\/code>, and sometimes adding support for new syntax starts at the tokenizer level...\nBut not this time, as the tokenizer already knows what the operators <code>-<\/code> and <code>+<\/code> are.<\/p>\n<p>Thus, we can start at the grammar level.<\/p>\n<h3 id=\"grammar-rule-for-unary-operators\">Grammar rule for unary operators<a href=\"#grammar-rule-for-unary-operators\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>In the previous article I hinted at the fact that the nesting of the grammar rules influences the precedence of operations.\nUnary operators have higher precedence than binary operators, so the grammar rule for unary operators must be deeper than the one for binary operators.<\/p>\n<p>The current grammar looks like this:<\/p>\n<pre><code>program := computation\ncomputation := number ( (PLUS | MINUS) number )*\nnumber := INT | FLOAT<\/code><\/pre>\n<p>We'll add a rule <code>unary<\/code> that is referenced by <code>computation<\/code>, so that parsing a computation now means we look for unary operators on both sides of the operator, instead of looking for numbers:<\/p>\n<pre><code>program := computation\ncomputation := unary ( (PLUS | MINUS) unary )*  # &lt;- reference unary here\nunary := PLUS unary | MINUS unary | number    # &lt;- new rule\nnumber := INT | FLOAT<\/code><\/pre>\n<p>Notice how the rule <code>unary<\/code> references itself in the first two options.\nThis makes it so that we can handle <code>-3<\/code> and <code>-----3<\/code> with the same ease.<\/p>\n<h3 id=\"add-unary-operators-to-the-ast\">Add unary operators to the AST<a href=\"#add-unary-operators-to-the-ast\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>Before we can change the parser to accommodate the new 
unary operators, we need to make sure that the AST can represent unary operators.\nMuch like we have a node <code>BinOp<\/code>, we can create a node <code>UnaryOp<\/code>:<\/p>\n<pre><code class=\"language-py\">@dataclass\nclass UnaryOp(Expr):\n    op: str\n    value: Expr<\/code><\/pre>\n<h3 id=\"add-the-new-rule-to-the-parser\">Add the new rule to the parser<a href=\"#add-the-new-rule-to-the-parser\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>Now that we changed the grammar rules, we need to modify our parser.\nBy looking at the rules that changed, we can know <em>exactly<\/em> which parser methods we need to modify:<\/p>\n<ul><li><code>parse_computation<\/code> &ndash; the rule <code>computation<\/code> was changed so we need to change this method; and<\/li>\n<li><code>parse_unary<\/code> &ndash; the rule <code>unary<\/code> is new, so we need to implement this method.<\/li>\n<\/ul><p>Here are the changes to the method <code>parse_computation<\/code>:<\/p>\n<pre><code class=\"language-py\">class Parser:\n    # ......<\/code><\/pre>","summary":"In the 4th part of building a Python compiler and interpreter we will add support for more arithmetic operations and parenthesised expressions.","date_modified":"2023-11-09T00:33:35+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter-04-arithmetic\/thumbnail.webp"},{"title":"Building a Python compiler and interpreter \u2013 03 visitor pattern","date_published":"2023-11-05T19:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-03-visitor-pattern","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter-03-visitor-pattern","content_html":"<p>In the third part of <a href=\"\/blog\/tag:bpci\">this series<\/a> of building a Python compiler and interpreter we will make our parser, compiler, and interpreter, much more flexible 
with the visitor pattern.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter-03-visitor-pattern\">Building a Python compiler and interpreter &ndash; 03 visitor pattern<a href=\"#building-a-python-compiler-and-interpreter-03-visitor-pattern\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>This is the third article of the <a href=\"\/blog\/tag:bpci\">&ldquo;Building a Python compiler and interpreter&rdquo; series<\/a>, so make sure you've gone through the first two articles before tackling this one!<\/p>\n<p>The code that serves as a starting point for this article is <a href=\"https:\/\/github.com\/mathspp\/building-a-python-compiler-and-interpreter\/tree\/v0.2.0\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the tag v0.2.0 of the code in this GitHub repository<\/a>.<\/p>\n<h2 id=\"objectives\">Objectives<a href=\"#objectives\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>The objectives for this article are the following:<\/p>\n<ul><li>understand what a language grammar is and how it will shape the design of the parser;<\/li>\n<li>learn about the visitor pattern and see how much more flexible it makes our compiler and interpreter; and<\/li>\n<li>add support for consecutive additions and subtractions.<\/li>\n<\/ul><h2 id=\"language-grammar\">Language grammar<a href=\"#language-grammar\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<p>A language grammar is a way of representing the syntax that is valid in a given language.\nThe exact notation varies a bit but the gist is always the same.\nIn a language grammar you write \"rules\" that represent what is valid in your language and each rule has two parts:<\/p>\n<ol><li>the name of the rule; and<\/li>\n<li>the \"body\" of the rule, which represents the syntax that matches the rule.<\/li>\n<\/ol><p>Naturally, rules can reference each other and that is 
what introduces freedom (but also complexity) to the grammars (and, ultimately, to our programming language).<\/p>\n<h3 id=\"the-grammar-of-our-language\">The grammar of our language<a href=\"#the-grammar-of-our-language\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>Right now, the grammar that represents the subset of Python that we support could be represented as such:<\/p>\n<pre><code>program := computation EOF\ncomputation := number (PLUS | MINUS) number\nnumber := INT | FLOAT<\/code><\/pre>\n<p>We write rules in lowercase and token types in upper case.\nSo, in the grammar above, the words <code>program<\/code>, <code>computation<\/code>, and <code>number<\/code> refer to grammar rules while the words <code>EOF<\/code>, <code>PLUS<\/code>, <code>MINUS<\/code>, <code>INT<\/code>, and <code>FLOAT<\/code>, refer to token types.<\/p>\n<p>The rule <code>program<\/code> reads<\/p>\n<pre><code>program := computation EOF<\/code><\/pre>\n<p>This means that a program is a computation followed by an <code>EOF<\/code> token.\nIn turn, the rule <code>computation<\/code> reads<\/p>\n<pre><code>computation := number (PLUS | MINUS) number<\/code><\/pre>\n<p>This means that a computation is a number, followed by a plus sign or a minus sign, and then another number.\nNotice how we use the symbol <code>|<\/code> to represent alternatives, so <code>PLUS | MINUS<\/code> means \"a plus <em>or<\/em> a minus\".<\/p>\n<p>Finally, the rule <code>number<\/code> reads<\/p>\n<pre><code>number := INT | FLOAT<\/code><\/pre>\n<p>This rule means that a number is either an <code>INT<\/code> token or a <code>FLOAT<\/code> token.<\/p>\n<h3 id=\"relationship-between-the-grammar-and-the-parser\">Relationship between the grammar and the parser<a href=\"#relationship-between-the-grammar-and-the-parser\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>Here is the full grammar, once again:<\/p>\n<pre><code>program := 
computation EOF\ncomputation := number (PLUS | MINUS) number\nnumber := INT | FLOAT<\/code><\/pre>\n<p>Now, here is the skeleton of our parser:<\/p>\n<pre><code class=\"language-py\">class Parser:\n    # ...\n\n    def parse_number(self) -&gt; Int | Float:\n        \"\"\"Parses an integer or a float.\"\"\"\n        ...\n\n    def parse_computation(self) -&gt; BinOp:\n        \"\"\"Parses a computation.\"\"\"\n        ...\n\n    def parse(self) -&gt; BinOp:\n        \"\"\"Parses the program.\"\"\"\n        ...<\/code><\/pre>\n<p>Notice how we have a parse method for...<\/p>","summary":"In the third part of building a Python compiler and interpreter we will make our parser, compiler, and interpreter, much more flexible with the visitor pattern.","date_modified":"2023-11-07T23:00:35+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter-03-visitor-pattern\/thumbnail.webp"},{"title":"Building a Python compiler and interpreter","date_published":"2023-11-03T00:00:00+01:00","id":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter","url":"https:\/\/mathspp.com\/blog\/building-a-python-compiler-and-interpreter","content_html":"<p>In this tutorial series we will build a Python compiler and interpreter from scratch. 
We start with simple arithmetic expressions.<\/p>\n\n<h1 id=\"building-a-python-compiler-and-interpreter\">Building a Python compiler and interpreter<a href=\"#building-a-python-compiler-and-interpreter\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>In this series of articles we will be implementing the Python programming language, from scratch, in Python.<\/p>\n<p>The end goal of this series is to explore and play around with the concepts and algorithms that are needed to implement a programming language like Python.\nTo that end, we will create a programming language with a subset of the features that Python has and, along the way, we will play with tokenizers, parsers, compilers, and interpreters!<\/p>\n<p>An important disclaimer is due: my role in this series is to take you with me in this exploratory journey as I dig into the workings of programming languages.\nIt'll be an opportunity for both of us to learn some new things.<\/p>\n<h2 id=\"setup\">Setup<a href=\"#setup\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<div class=\"notices yellow\">\n<p>This series will be written in Python 3.12.\nIf you're stuck in an older Python version, you should still be able to follow along as I won't be using many recent features.<\/p>\n<\/div>\n<p>The setup for this project is pretty minimal.\nWe can start by creating and activating a virtual environment:<\/p>\n<pre><code class=\"language-bash\">&#10095; python -m venv .venv\n&#10095; . 
.venv\/bin\/activate<\/code><\/pre>\n<p>Then, we can create a file <code>requirements.txt<\/code> with the following requirements:<\/p>\n<pre><code class=\"language-txt\">mypy\nblack\npytest<\/code><\/pre>\n<p>And we can install the requirements with<\/p>\n<pre><code class=\"language-bash\">&#10095; python -m pip install -r requirements.txt<\/code><\/pre>\n<p>Make sure everything was installed correctly with the following commands:<\/p>\n<pre><code class=\"language-bash\">&#10095; mypy --version\nmypy 1.6.1 (compiled: yes)\n\n&#10095; black --version\nblack, 23.10.1 (compiled: no)\nPython (CPython) 3.12.0\n\n&#10095; pytest --version\npytest 7.4.2<\/code><\/pre>\n<p>You don't need to have the exact same versions as I do, but you should have a version that is either close to mine or more recent than mine.<\/p>\n<h2 id=\"structure-of-the-program\">Structure of the program<a href=\"#structure-of-the-program\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h2>\n<h3 id=\"is-python-compiled-or-interpreted\">Is Python compiled or interpreted?<a href=\"#is-python-compiled-or-interpreted\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>Yes.<\/p>\n<p>The Python language has a compilation step and an interpretation step, and we will emulate that in our copy of Python.\nIf you've ever seen a folder <code>__pycache__<\/code> next to your Python code, that's a folder that contains compiled Python code.\nHowever, Python code isn't compiled into machine code.\nRather, it's compiled into bytecode.<\/p>\n<h3 id=\"python-bytecode-and-the-module-dis\">Python bytecode and the module <code>dis<\/code>\n<a href=\"#python-bytecode-and-the-module-dis\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h3>\n<p>Python bytecode is a set of simple instructions that Python code is compiled into.\nThen, there is a program that reads those bytecode instructions sequentially and interprets them, 
which is why we often say that Python is an interpreted language.<\/p>\n<p>During our explorations we will use the module <code>dis<\/code> a lot.\nThe module <code>dis<\/code> &ndash; which stands for disassemble &ndash; can let us peek at the bytecode behind many Python objects.<\/p>\n<p>For example, one of the simplest things you can do is pass a function to <code>dis.dis<\/code> and take a look at the result:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; import dis\n\n&gt;&gt;&gt; def f(a, b):\n...     return a + b\n...\n\n&gt;&gt;&gt; dis.dis(f)\n  1           0 RESUME                   0\n\n  2           2 LOAD_FAST                0 (a)\n              4 LOAD_FAST                1 (b)\n              6 BINARY_OP                0 (+)\n             10 RETURN_VALUE<\/code><\/pre>\n<p>You don't need to understand all the output right now.\nInstead,...<\/p>","summary":"In this tutorial series we will build a Python compiler and interpreter from scratch. We start with simple arithmetic expressions.","date_modified":"2023-11-05T01:18:07+01:00","tags":["bpci","compilers","interpreters","programming","python"],"image":"\/user\/pages\/02.blog\/building-a-python-compiler-and-interpreter\/thumbnail.webp"},{"title":"All functions return something","date_published":"2023-07-29T23:00:00+02:00","id":"https:\/\/mathspp.com\/blog\/all-functions-return-something","url":"https:\/\/mathspp.com\/blog\/all-functions-return-something","content_html":"<p>ALL Python functions return something and this article explains how and why.<\/p>\n\n<h1 id=\"does-print-return-something\">Does <code>print<\/code> return something?<a href=\"#does-print-return-something\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>Does the built-in <code>print<\/code> return something when you call it?\nIt doesn't!..<\/p>\n<p>Or does it?<\/p>\n<p>If you open a Python REPL and call <code>print<\/code>, you get the value you printed and nothing 
more:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; print(\"Hello, world!\")\nHello, world!<\/code><\/pre>\n<p>However, when you call a function that returns something, the value that is returned is shown in the output:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; def return_5():\n...     print(\"Hello, world!\")\n...     return 5\n...\n&gt;&gt;&gt; return_5()\nHello, world!\n5<\/code><\/pre>\n<p>So, it looks like <code>return_5<\/code> returns <code>5<\/code> and that <code>print<\/code> returns nothing...<\/p>\n<p>But that's not quite true.\nAnd the reason why it <em>looks<\/em> true is because of the way the REPL handles a certain value...<\/p>\n<h1 id=\"the-value-none-in-the-repl\">The value <code>None<\/code> in the REPL<a href=\"#the-value-none-in-the-repl\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>Open the REPL, type <code>None<\/code>, and press <kbd>Enter<\/kbd>.\nWhat do you expect to see?<\/p>\n<p>What if you assign <code>None<\/code> to a variable and then type the name of the variable into the REPL?<\/p>\n<p>What if you create a function that returns <code>None<\/code> and then call it?<\/p>\n<p>In all of the scenarios above, the REPL shows nothing:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; None\n&gt;&gt;&gt; my_value = None\n&gt;&gt;&gt; my_value\n&gt;&gt;&gt; def return_none():\n...     
return None\n...\n&gt;&gt;&gt; return_none()<\/code><\/pre>\n<p>Notice that the REPL never shows any output.\nThat's because the REPL treats the value <code>None<\/code> in a special way and omits it from outputs!\nIf you want to see these values, you can print them, for example:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; print(None)\nNone\n&gt;&gt;&gt; print(my_value)\nNone\n&gt;&gt;&gt; print(return_none())\nNone<\/code><\/pre>\n<p>So, we are now aware that the value <code>None<\/code> is handled differently inside the REPL.\nNow, this is going to be a very important piece of information for what comes next.<\/p>\n<h1 id=\"all-functions-return-something\">All functions return something<a href=\"#all-functions-return-something\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>The truth of the matter is that <em><strong>ALL<\/strong><\/em> functions return <em>something<\/em>.\nAnd the ones that look like they don't?\nThat's because they return <code>None<\/code>.<\/p>\n<p>For example, here I'm assigning the return value of calling <code>print<\/code> to a variable and then printing the value of that variable:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; print_return = print(\"Hello, world!\")\nHello, world!\n&gt;&gt;&gt; print(print_return)\nNone<\/code><\/pre>\n<p>As another example, the method <code>.append<\/code> of Python lists also looks like it doesn't return anything.\nWrong!\nIt returns <code>None<\/code>:<\/p>\n<pre><code class=\"language-pycon\">&gt;&gt;&gt; my_list = [73, 42]\n&gt;&gt;&gt; append_return = my_list.append(0)\n&gt;&gt;&gt; print(append_return)\nNone<\/code><\/pre>\n<p>If <code>print<\/code> or <code>append<\/code> didn't return a thing, we wouldn't be able to assign the results of calling them to variables.\nBut we can.\nAnd we can print those values.\nSo, we know that those functions always return something.<\/p>\n<p>But now, things get <em>even<\/em> more interesting.\nSee, 
maybe <code>print<\/code> and <code>append<\/code> end with <code>return None<\/code>, right?\nSo, what if you write a function that doesn't have <code>return None<\/code> at the end?<\/p>\n<h1 id=\"no-return-functions-return-none\">No-return functions return <code>None<\/code><a href=\"#no-return-functions-return-none\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>Let me define an empty function:<\/p>\n<pre><code class=\"language-py\">def empty():\n    pass<\/code><\/pre>\n<p>If you use <a href=\"https:\/\/docs.python.org\/3\/library\/dis.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"external-link no-image\">the module <code>dis<\/code><\/a> to dissect that function, you will see the instructions that Python runs under the hood when you call the function.\nFor such a simple function, the result of dissecting it tells...<\/p>","summary":"ALL Python functions return something and this article explains how and why.","date_modified":"2023-07-30T00:45:56+02:00","tags":["compilers","programming","python"],"image":"\/user\/pages\/02.blog\/all-functions-return-something\/thumbnail.webp"},{"title":"TIL #072 \u2013 read bytecode from a .pyc file","date_published":"2023-07-25T00:00:00+02:00","id":"https:\/\/mathspp.com\/blog\/til\/read-bytecode-from-a-pyc-file","url":"https:\/\/mathspp.com\/blog\/til\/read-bytecode-from-a-pyc-file","content_html":"<p>Today I learned how to read the bytecode from a file of compiled Python bytecode (<code>.pyc<\/code>).<\/p>\n\n<h1 id=\"how-to-read-bytecode-from-a-pyc-file\">How to read bytecode from a <code>.pyc<\/code> file<a href=\"#how-to-read-bytecode-from-a-pyc-file\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>If you have a <code>.pyc<\/code> file, you can use the modules <code>dis<\/code> and <code>marshal<\/code> from the standard library to get the corresponding bytecode:<\/p>\n<pre><code class=\"language-py\">import 
dis\nimport marshal\n\nwith open(path_to_pyc_file, \"rb\") as f:\n    _ = f.read(16)  # Header is 16 bytes in 3.7+.\n    # _ = f.read(12)  # Header is 12 bytes in 3.3 to 3.6.\n    loaded = marshal.load(f)\n\ndis.dis(loaded)<\/code><\/pre>\n<h1 id=\"example\">Example<a href=\"#example\" class=\"toc-anchor after\" data-anchor-icon=\"#\" aria-label=\"Anchor\"><\/a><\/h1>\n<p>Suppose that you have a file <code>fibonacci.py<\/code> with the following code:<\/p>\n<pre><code class=\"language-py\">def fibonacci(num):\n    \"\"\"Computes terms of the Fibonacci sequence.\"\"\"\n    if num &lt;= 1:\n        return 1\n    return fibonacci(num - 1) + fibonacci(num - 2)<\/code><\/pre>\n<p>If you import your function <code>fibonacci<\/code> into the REPL or from another file, Python will compile the bytecode and write it to a <code>.pyc<\/code> file.<\/p>\n<p>The quickest way to force Python to compile and dump the bytecode in a file is with this command:<\/p>\n<pre><code class=\"language-bash\">\u276f python -c \"import fibonacci\"<\/code><\/pre>\n<p>This will create a folder <code>__pycache__<\/code> (if it doesn't exist yet) and write the bytecode to a file.\nBecause I'm running Python 3.11 at the time of writing, the file that I got was <code>fibonacci.cpython-311.pyc<\/code>.<\/p>\n<p>Now, to get the bytecode back from that file, I run the code from above, but I specify the path to the file <code>fibonacci.cpython-311.pyc<\/code>:<\/p>\n<pre><code class=\"language-py\">import dis\nimport marshal\n\nwith open(\"__pycache__\/fibonacci.cpython-311.pyc\", \"rb\") as f:\n    _ = f.read(16)  # Header is 16 bytes in 3.7+.\n    loaded = marshal.load(f)\n\ndis.dis(loaded)<\/code><\/pre>\n<p>If you run the code above, you get the bytecode associated with the <code>.pyc<\/code> file you opened:<\/p>\n<pre><code class=\"language-txt\">  0           0 RESUME                   0\n\n  1           2 LOAD_CONST               0 (&lt;code object fibonacci at 0x100f997d0, file 
\"\/Users\/rodrigogs\/Documents\/tmp\/fibonacci.py\", line 1&gt;)\n              4 MAKE_FUNCTION            0\n              6 STORE_NAME               0 (fibonacci)\n              8 LOAD_CONST               1 (None)\n             10 RETURN_VALUE\n\nDisassembly of &lt;code object fibonacci at 0x100f997d0, file \"\/Users\/rodrigogs\/Documents\/tmp\/fibonacci.py\", line 1&gt;:\n  1           0 RESUME                   0\n\n  3           2 LOAD_FAST                0 (num)\n              4 LOAD_CONST               1 (1)\n              6 COMPARE_OP               1 (&lt;=)\n             12 POP_JUMP_FORWARD_IF_FALSE     2 (to 18)\n\n  4          14 LOAD_CONST               1 (1)\n             16 RETURN_VALUE\n\n  5     &gt;&gt;   18 LOAD_GLOBAL              1 (NULL + fibonacci)\n             30 LOAD_FAST                0 (num)\n             32 LOAD_CONST               1 (1)\n             34 BINARY_OP               10 (-)\n             38 PRECALL                  1\n             42 CALL                     1\n             52 LOAD_GLOBAL              1 (NULL + fibonacci)\n             64 LOAD_FAST                0 (num)\n             66 LOAD_CONST               2 (2)\n             68 BINARY_OP               10 (-)\n             72 PRECALL                  1\n             76 CALL                     1\n             86 BINARY_OP                0 (+)\n             90 RETURN_VALUE<\/code><\/pre>\n<p>That's it for now! <a href=\"\/subscribe\">Stay tuned<\/a> and I'll see you around!<\/p>","summary":"Today I learned how to read the bytecode from a file of compiled Python bytecode (.pyc).","date_modified":"2024-08-10T20:53:02+02:00","tags":["compilers","programming","python"],"image":"\/user\/pages\/02.blog\/04.til\/072.read-bytecode-from-a-pyc-file\/thumbnail.webp"}]}