In part 4 of this series we add some unit testing,
improve our tokenizer and implement the primitives ⍴
and ⍤
.
In the previous post I said I would be making less changes in between each blog post to make the blog posts smaller, but now I went back to check the changes I had to write about and I realised I did way too much to fit in a single blog post...
So while RGSPL v0.4 is out, I'm going to split it in a couple of blog posts. You can see all the changes from v0.3 to v0.4 here.
The RGSPL v0.4 is essentially v0.3 plus the exercises of the previous blog post, and that represents plenty of changes and additions to our code base. In this article we will only cover a few:
⍴
); and⍤
.Here are the changes we still have to go through in the next blog posts.
add class methods to deal with the n-cells of APL arrays and others;
homogenise the representation of APL scalars;
modify the index generator ⍳
function to make a distinction between 1-item vectors and scalars;
modify the without ~
function to work on the major cells of the left argument;
implement pretty printing of APL arrays;
add auxiliary decorators that do input checking;
implement the Boolean functions ∧∨⍲⍱
;
implement the encode and decode functions ⊤⊥
;
The whole code for this project is hosted in this GitHub repo and the code for this specific blog post is the release v0.4.
This link shows the diff with all the changes that happened since v0.3.
Unit testing is a really important thing for several reasons, for example it provides an automated way to test your code (and therefore gives you confidence that your code works as expected) and it helps you figure out when code refactoring broke something. If you also start by writing the unit testing for a certain feature and only then implement the feature, you are forced to write code that passes your tests, instead of writing tests that your code passes.
I don't have much experience with unit testing in Python so I went
to look for a simple alternative in the Python Standard Library and found unittest
,
so that is what I am using.
I figured I would be writing many more tests than the ones we wrote in the
previous blog post, so I decided to create a directory for the tests and created
a tests/utils.py
file with some utility functions:
"""
Utility functions used by the tests.
"""
import functools
import unittest
from rgspl import Interpreter, Parser, Tokenizer
from arraymodel import APLArray
def run(code):
"""Run a string containing APL code."""
return Interpreter(Parser(Tokenizer(code))).interpret()
def S(scalar):
"""Create an APL scalar."""
return APLArray([], [scalar])
def run_apl_code_decorator(assert_method):
"""Create a new assert method interpreting positional strings as APL code."""
@functools.wraps(assert_method)
def new_assert_method(*args, **kwargs):
i = 0
args = list(args) # to allow in-place modification.
# Run, as APL code, the first consecutive strings in the positional arguments.
while i < len(args) and isinstance(args[i], str):
args[i] = run(args[i])
i += 1
return assert_method(*args, **kwargs)
return new_assert_method
class APLTestCase(unittest.TestCase):
"""The assert methods preprocess the arguments by running the APL code.
A test case class that overrides some assert methods that start by running
the APL code in the arguments and only then applying the assertions over them.
"""
def __init__(self, *args, **kwargs):
unittest.TestCase.__init__(self, *args, **kwargs)
# Traverse all the methods of the unittest.TestCase, looking for assertX
# methods and decorating them accordingly.
for method_name in dir(self):
if method_name.startswith("assert") and not method_name.endswith("_"):
decorated = run_apl_code_decorator(getattr(self, method_name))
setattr(self, method_name, decorated)
The first two functions are really simple, as they represent simple
functions to run a Python str
as APL code and another one to
create an APLArray
holding a simple scalar like an int
or a float
.
The decorator run_apl_code_decorator
and the class APLTestCase
are
the more interesting things in this file.
The APLTestCase
is a class that inherits from unittest.TestCase
(which is the class you should inherit from when writing tests with the
unittest
framework) and then alters some of the unittest.TestCase
assert
methods, so that they take strings with APL code that get ran before
testing if things match or not.
If you don't know how unittest
works then this won't make any sense
to you, so go ahead and skim through unittest
's documentation.
In order to allow the APLTestCase
to find those assert methods, we use
some introspection through dir
and then run the appropriate functions
through the run_apl_code_decorator
decorator.
This decorator modifies functions in the following way: it traverses the
list of positional arguments and runs the first consecutive run of
strings as if they were APL code.
With those changes, we can do something like
assertEqual("2×4+5", "18")
in a test case, without having to explicitly wrap each argument with the run
function:
assertEqual(run("2×4+5"), run("18")) # annoying to do in EVERY SINGLE test case
After those utility functions and classes are implemented it is just a matter of writing a series of tests for the things that we already have.
RGSPL v0.4 still has no testing for operators but the functions that are already implemented have some coverage and so do the tokenizer and the array parsing mechanism.
You can find all the tests I wrote for v0.4 here. In writing tests for the tokenizer I immediately started finding bugs and errors, so let's cover those changes next.
Token
and Tokenizer
tweaksThe bug that I found was that some decimal numbers were not being tokenized appropriately.
For example, .3
is a valid decimal in APL but my tokenizer didn't support it.
Thankfully, this fix was fairly simple.
I just had to tweak the Tokenizer.get_integer
and Tokenizer.get_next_token
functions:
class Tokenizer:
# ...
def get_integer(self):
"""Parses an integer from the source code."""
start_idx = self.pos
while self.current_char and self.current_char.isdigit():
self.advance()
return self.code[start_idx:self.pos] or "0" # ← in case it's an empty integer part
# ...
def get_next_token(self):
"""Finds the next token in the source code."""
self.skip_whitespace()
self.skip_comment()
if not self.current_char:
return Token(Token.EOF, None)
if self.current_char in "¯.0123456789": # ← numbers can start with .
return self.get_number_token()
if self.current_char in Token.ID_CHARS:
return self.get_id_token()
if self.current_char in Token.WYSIWYG_MAPPING:
return self.get_wysiwyg_token()
self.error("Could not parse the next token...")
Because of the unit tests we need a way to check if two Token
objects
are the same, so I had to implement the __eq__
dunder method
for a Token
object:
class Token:
# ...
def __eq__(self, other):
return (
isinstance(other, Token)
and (self.type, self.value) == (other.type, other.value)
)
I also decided to create another constant list of Token
types,
ARRAY_TOKENS
, because I noticed I was already using that a lot,
I just didn't have a constant list to refer to:
ARRAY_TOKENS = [INTEGER, FLOAT, COMPLEX, ID]
# ...
class Parser:
# ...
def parse_statement(self):
"""Parses a statement."""
self.debug(f"Parsing statement from {self.tokens[:self.pos+1]}")
relevant_types = [Token.ASSIGNMENT, Token.RPARENS] + Token.FUNCTIONS + Token.MONADIC_OPS
statement = self.parse_vector()
while self.token_at.type in relevant_types:
if self.token_at.type == Token.ASSIGNMENT:
self.eat(Token.ASSIGNMENT)
statement = Assignment(Var(self.token_at), statement)
self.eat(Token.ID)
else:
function = self.parse_function()
if self.token_at.type in [Token.RPARENS] + Token.ARRAY_TOKENS: # ← use the constant list here
# ...
# ...
def parse_vector(self):
"""Parses a vector composed of possibly several simple scalars."""
self.debug(f"Parsing vector from {self.tokens[:self.pos+1]}")
nodes = []
while self.token_at.type in Token.ARRAY_TOKENS + [Token.RPARENS]:
if self.token_at.type == Token.RPARENS:
if self.peek_beyond_parens() in Token.ARRAY_TOKENS: # ← same here
# ...
I quickly grew tired of having the interpreter always print to my face
the debugging information that I get when I have the keyword
argument debug=True
on the Parser
, so I factored it out
as an optional argument to my CLI:
if __name__ == "__main__":
arg_parser = argparse.ArgumentParser(description="Parse and interpret an APL program.")
arg_parser.add_argument("-d", "--debug", action="store_true")
# ...
if args.repl:
print("Please notice that, from one input line to the next, variables aren't stored (yet).")
while inp := input(" >> "):
try:
print(Interpreter(Parser(Tokenizer(inp), debug=args.debug)).interpret()) # ← use new flag
# ...
elif args.code:
for expr in args.code:
print(f"{expr} :")
print(Interpreter(Parser(Tokenizer(expr), debug=args.debug)).interpret()) # ← ditto
The shape and reshape functions are, respectively, the monadic and dyadic cases
of the primitive ⍴
.
The shape function takes an array and returns a vector with the shape of the argument
array (which is basically what is stored in our APLArray.shape
attribute).
The reshape function takes a vector on the left and an array on the right, and changes
the right argument vector to have the shape specified by the left argument.
Because the shape of an APLArray
is specified by its shape
attribute,
we just need to update it and ensure the APLArray
instance
has enough data.
Overall, this turns out to be a really simple primitive to implement:
def rho(*, alpha=None, omega):
"""Define monadic shape and dyadic reshape.
Monadic case:
⍴ ⍳2 3
2 3
Dyadic case:
3⍴⊂1 2
(1 2)(1 2)(1 2)
"""
if alpha is None:
shape = [len(omega.shape)]
data = [S(i) for i in omega.shape]
return APLArray(shape, data)
else:
rank = len(alpha.shape)
if rank > 1:
raise ValueError(f"Left argument of reshape cannot have rank {rank}.")
if alpha.is_scalar():
shape = [alpha.data[0]]
else:
shape = [d.data[0] for d in alpha.data]
if not all(isinstance(i, int) for i in shape):
raise TypeError("Left argument of reshape expects integers.")
data_from = omega.data if len(omega.shape) > 0 else [omega]
# Extend the data roughly if needed, then truncate if needed.
data = data_from*(math.ceil(math.prod(shape)/len(data_from)))
data = data[:math.prod(shape)]
return APLArray(shape, data)
Even simpler than the shape and reshape functions is the atop operator ⍤
.
⍤
is very similar to ∘
and ⍥
which are already implemented, so it really is a breeze
to add this operator:
def atop(*, aalpha, oomega):
"""Define the dyadic atop ⍤ operator.
Monadic case:
f⍤g ⍵
f g ⍵
Dyadic case:
⍺ f⍤g ⍵
f ⍺ g ⍵
"""
def derived(*, alpha=None, omega):
return aalpha(alpha=None, omega=oomega(alpha=alpha, omega=omega))
return derived
This is a series that I am working slowly but steadily on. Feel free to ping me in the comments or over email if you'd like an estimate of when the next article is being published :) Until then, here is a list with all the articles in the series:
+35 chapters. +400 pages. Hundreds of examples. Over 30,000 readers!
My book “Pydon'ts” teaches you how to write elegant, expressive, and Pythonic code, to help you become a better developer. >>> Download it here 🐍🚀.