Today I learned how to disassemble Python code with the module
According to the documentation, the module
“supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter.”
What does this mean? Wasn't Python an interpreted language?
I can't speak for all other implementations, but in CPython (the dominant Python implementation, which is written in C), Python code is compiled into an intermediate representation, the bytecode, and that bytecode is what is interpreted.
I won't go into the rabbit hole of really explaining what that means,
I'll just assume you are familiar enough with these ideas for me to explain what
In particular, I'll focus on the function
dis.dis accepts a variety of different objects, and then disassembles them for us,
showing us what their bytecode looks like.
For example, you can disassemble a function to see what its bytecode looks like:
>>> def foo(): ... return 42 ... >>> import dis >>> dis.dis(foo) 2 0 LOAD_CONST 1 (42) 2 RETURN_VALUE
This lifts the curtain a little bit, and lets us see a bit of how Python operates under the hood.
By using the function
dis.dis, we can, for example, realise that these three functions are essentially the same in the eyes of Python:
>>> def foo(): ... 1 ... 2 ... return 3 ... >>> dis.dis(foo) 4 0 LOAD_CONST 1 (3) 2 RETURN_VALUE # --- >>> def foo(): ... return 3 ... >>> dis.dis(foo) 2 0 LOAD_CONST 1 (3) 2 RETURN_VALUE # --- >>> def foo(): return 3 ... >>> dis.dis(foo) 1 0 LOAD_CONST 1 (3) 2 RETURN_VALUE
The only noticeable difference in the three outputs is in the top-left number:
it started out as
4, then went down to
2, and finally down to
Well, that's just the line number of the operation!
Line 1 is the line with the definition of the function header (
def foo():) and we just count from there!
So, it seems like the very first
foo, the one with two extra numbers,
def foo(): 1 2 return 3
actually ignores the
1 and the
2 because it realises that they do nothing.
However, it keeps the line numbering so that errors get reported properly.
In fact, we can compare the output from
dis.dis with what we get if we disassemble a function that contains comments, for example, instead of the two integers:
>>> def foo(): ... # Comment 1. ... # Comment 2. ... return 3 ... >>> dis.dis(foo) 4 0 LOAD_CONST 1 (3) 2 RETURN_VALUE # --- >>> def goo(): ... 1 ... 2 ... return 3 ... >>> dis.dis(goo) 4 0 LOAD_CONST 1 (3) 2 RETURN_VALUE
Notice how the printed bytecode looks exactly the same!
So yeah, it really looks like Python is completely ignoring the
1 and the
2 in there.
With the function
dis.dis we can also realise that CPython does some constant folding when compiling the Python code into bytecode!
What does this mean?
It means that some expressions that have a constant value are simplified when the code is compiled,
in order to save time.
For example, take a look at this simple function:
def foo(): return 2 + 2
We know that the addition
2 + 2 will always return
4, so CPython does us a favour and computes that once,
so that calling
foo doesn't actually require you to compute that trivial addition.
By checking the bytecode of
foo, we can see I'm not lying:
>>> dis.dis(foo) 2 0 LOAD_CONST 1 (4) 2 RETURN_VALUE
Notice how the first line of the output has
LOAD_CONST and, at the end of that line, it has a
That means Python is loading the constant value
4, to then return it with
Quite interesting, right?
That's it for now! Stay tuned and I'll see you around!
I hope you learned something new! If you did, consider following the footsteps of the readers who bought me a slice of pizza 🍕. Your small contribution helps me produce this content for free and without spamming you with annoying ads.