The 10th article of this series adds support for elif
and else
statements.
elif
and else
This is the 10th article of the “Building a Python compiler and interpreter” series, so make sure you've gone through the first nine articles before tackling this one!
The code that serves as a starting point for this article is the tag v0.9.0 of the code in this GitHub repository.
The objective for this article is:
elif
and else
.Let's get into it.
else
In a conditional statement, the code that follows the else
is a list of statements that we want to run if the Boolean expression of the conditional statement evaluated to False
.
We can add this by expanding the node Conditional
to accomodate for a (possibly non-existent) else
body:
# parser.py
@dataclass
class Conditional(Statement):
condition: Expr
body: Body
orelse: Body | None = None # <-- New attribute.
And in order to get here, we need to be able to tokenize this new keyword:
# tokenizer.py
class TokenType(StrEnum):
# ...
ELSE = auto() # else
# ...
KEYWORDS_AS_TOKENS: dict[str, TokenType] = {
# ...
"else": TokenType.ELSE,
}
To be able to parse the else
and its body, we split the rule conditional
into two new rules: if_statement
and else_statement
.
Then, the original conditional
puts both together:
# ...
conditional := if_statement else_statement?
if_statement := IF expr COLON NEWLINE body
else_statement := ELSE COLON NEWLINE body
# ...
This means that the old parser method parse_conditional
now becomes parse_if_statement
, and we need to create two new parser methods, parse_else_statement
and parse_conditional
:
# parser.py
# ...
class Parser:
# ...
def parse_else_statement(self) -> Body: # <-- New method.
"""Parses an `else` statement and returns its body."""
self.eat(TokenType.ELSE)
self.eat(TokenType.COLON)
self.eat(TokenType.NEWLINE)
body = self.parse_body()
return body
def parse_if_statement(self) -> Conditional: # <-- Was `parse_conditional`.
"""Parses an `if` conditional and returns it."""
self.eat(TokenType.IF)
condition = self.parse_expr()
self.eat(TokenType.COLON)
self.eat(TokenType.NEWLINE)
body = self.parse_body()
return Conditional(condition, body)
def parse_conditional(self) -> Conditional: # <-- New method.
"""Parses a conditional block."""
if_statement = self.parse_if_statement()
if self.peek() == TokenType.ELSE: # The keyword `else` might not be there.
else_statement = self.parse_else_statement()
if_statement.orelse = else_statement
return if_statement
else
Now, we need to compile this new attribute orelse
that the nodes Conditional
might have.
This involves some jumping around, similar to how we handled Boolean short-circuiting.
If the condition is false, we need to jump past the body of the if
.
If we don't jump past the if
, then we reach the end we need to jump past the else
.
This is the code that implements these two jumps:
# compiler.py
# ...
class Compiler:
# ...
def compile_Conditional(self, conditional: Conditional) -> BytecodeGenerator:
condition_bytecode = self._compile(conditional.condition)
body_bytecode = list(self._compile(conditional.body))
orelse = conditional.orelse
orelse_bytecode = [] if orelse is None else list(self._compile(orelse))
# Add a “jump past the else” at the end of the `if` when needed:
if orelse_bytecode:
body_bytecode.append(
Bytecode(BytecodeType.JUMP_FORWARD, len(orelse_bytecode) + 1)
)
yield from condition_bytecode
yield Bytecode( # If the condition is false, jump past the body of the `if`.
BytecodeType.POP_JUMP_IF_FALSE, len(body_bytecode) + 1
)
yield from body_bytecode
yield from orelse_bytecode
For this to work, we need to introduce the new bytecode type that jumps forward no matter what:
# compiler.py
class BytecodeType(StrEnum):
# ...
JUMP_FORWARD = auto()
To interpret this new bytecode type, we just increment the program pointer:
# interpreter.py
class Interpreter:
# ...
def interpret_jump_forward(self, bc: Bytecode) -> None:
self.ptr += bc.value
elif
Now, with all of this work, we still need to implement the statement elif
.
What's quite neat is that you can get the elif
almost for free.
To understand how, look at the following snippet of code:
if cond1:
body1
elif cond2:
body2
else:
body3
The statement elif
is just syntactic sugar for nested if
statements:
if cond1:
body1
else:
if cond2:
body2
else:
body3
And since we already know how to compile if
and else
, it is enough for us to tokenize the keyword elif
and parse it into this structure.
Since the nested if
will be parsed into a Conditional
, which is a statement, we can plug that nested Conditional
straight into the attribute orelse
of the outer Conditional
!
But let's take this one step at a time. First, we tokenize the new keyword:
# tokenizer.py
class TokenType(StrEnum):
# ...
ELIF = auto() # elif
# ...
KEYWORDS_AS_TOKENS: dict[str, TokenType] = {
# ...
"elif": TokenType.ELIF,
}
Next, we modify the grammar to include a rule elif_statement
that is almost equal to the rule if_statement
:
# ...
if_statement := IF expr COLON NEWLINE body
elif_statement := ELIF expr COLON NEWLINE body
else_statement := ELSE COLON NEWLINE body
# ...
The rule conditional
now needs to take the elif
statements into account, which may be in any number:
# ...
conditional := if_statement ( elif_statement )* else_statement?
# ...
The new parser method that parses elif
statements is also almost equal to the method that parses if
statements, since the respective grammar rules are so similar.
Since we want to leverage the fact that an elif
can be seen as a nested if
, we don't need to create a new AST node for the elif
statements.
Instead, we can just parse an elif
into a node of the type Conditional
:
# parser.py
# ...
class Parser:
# ...
def parse_elif_statement(self) -> Conditional: # <-- New method.
"""Parses an `elif` conditional and returns it."""
self.eat(TokenType.ELIF)
condition = self.parse_expr()
self.eat(TokenType.COLON)
self.eat(TokenType.NEWLINE)
body = self.parse_body()
return Conditional(condition, body)
The fun part comes now.
How do we parse an unknown number of elif
statements?
How do we build the tree for this?
Since an elif
produces a nested Conditional
, we take that Conditional
and put it inside the orelse
of the node Conditional
that belongs to the if
.
This builds a structure that can look more or less like this:
Conditional(
cond1, # if cond1:
body1, # body1:
orelse=Conditional(
cond2, # elif cond2:
body2, # body2
),
)
Then, if there are any more elif
statements or an else
, those get plugged into the attribute orelse
of the inner Conditional
!
Thus, to parse a full conditional at this point, we
if
statement;elif
statements as there are, and every time we do:
elif
conditional in the attribute orelse
of the previous node of type Conditional
; andConditional
that has an available attribute orelse
in case we need to nest even more statements;else
if needed and put it in the attribute orelse
of the innermost node of type Conditional
.Here is the code for this algorithm:
# parser.py
# ...
class Parser:
# ...
def parse_conditional(self) -> Conditional:
"""Parses a conditional block."""
top_conditional = self.parse_if_statement()
innermost_conditional = top_conditional
while self.peek() == TokenType.ELIF:
elif_statement = self.parse_elif_statement()
innermost_conditional.orelse = Body([elif_statement])
innermost_conditional = elif_statement
if self.peek() == TokenType.ELSE:
else_statement = self.parse_else_statement()
innermost_conditional.orelse = else_statement
return top_conditional
And because we didn't introduce any changes to the structure of the AST, our compiler does not need to be modified and nor does the interpreter!
Don't tell anyone I said this, but I'm getting tired of writing tests for these things! And looking back on a couple of previous articles, testing the specific shape of the AST or the structure of the bytecode can be dangerous if we need to do non-trivial refactors later...
I need to fix this eventually, but for now I wrote a couple more tests. They're fairly long and they aren't necessarily super interesting, so I'll just link to the tests on GitHub instead of pasting them here:
In this article we:
else
;elif
is equivalent to a nested if
; andelif
at the expense of if
and else
only, without modifying the AST, the compiler, and the interpreter, further.You can get the code for this article at tag v0.10.0 of this GitHub repository.
The very next article will be slightly different because I want to improve the setup of this project. We've been spending enough time here that it is worth introducing a project management tool to help us with packaging, dependencies, and more. Then, I will look at comparison operators, chained comparison operators, and then, functions.
The exercises below will challenge you to try and implement a couple of features that we will implement eventually, so go ahead and take a look at those.
==
, !=
, <
, <=
, >
, >=
.while
loop. (You can go crazy and also try to add the keywords break
and continue
.)+35 chapters. +400 pages. Hundreds of examples. Over 30,000 readers!
My book “Pydon'ts” teaches you how to write elegant, expressive, and Pythonic code, to help you become a better developer. >>> Download it here 🐍🚀.