This article explains how to extend the JSON format by using a custom encoder and a custom decoder to turn arbitrary Python objects into JSON and back.
The Python module json allows you to work with the JSON data format.
In previous articles, I've written about doing custom JSON encoding of arbitrary Python objects and custom JSON decoding into arbitrary Python objects.
In this article I want to define a system that makes it easy to extend the JSON format, so that we can encode more Python objects into JSON and back.
My goal is to define a mechanism through which you can easily define small, atomic encoders and decoders, and to have them all operate together.
I think it will be easier to understand what I want to achieve if I show you what I want the end product to look like.
Suppose that you want to extend the JSON format so that you can also encode and decode complex numbers and Python range objects.
This is what you want to do.
How do you achieve that?
When you are done with the article, you will be able to define something like this:
class ComplexAndRangeEncoder(...):
    def encode_complex(self, c):
        return {"real": c.real, "imag": c.imag}

    def encode_range(self, r):
        return {"start": r.start, "stop": r.stop, "step": r.step}


class ComplexAndRangeDecoder(...):
    def decode_complex(self, obj):
        return complex(obj["real"], obj["imag"])

    def decode_range(self, obj):
        return range(obj["start"], obj["stop"], obj["step"])
Then, you will be able to use these two classes as the cls argument to the json methods, enabling you to encode complex numbers and ranges to JSON, and then decode them back.
The point, here, is that I want to make it as easy as possible to extend the JSON standard, simply by providing the encoders and the decoders for each new type you want to be able to handle.
The main issue I have to struggle with is in defining the mechanism that will allow the custom JSON decoder to recognise that certain JSON objects should actually be parsed into something else. For example, in the previous article about custom JSON decoding, I showed how to convert the following JSON:
{
    "real": 1.0,
    "imag": 2.0
}
into the Python complex number \(1 + 2i\):
(1+2j)
However, suppose that we actually have the following Python dictionary:
dict_looks_like_complex = {
    "real": 1.0,
    "imag": 2.0,
}
If we convert this dictionary to JSON, we get a string:
'{"real": 1.0, "imag": 2.0}'
Now, if we use our custom decoder, we will get a complex number back instead of the original dictionary! Why? Because complex numbers and some dictionaries have the same JSON representations.
In mathematical terms, we say that the JSON encoding is not injective.
After all, I can find two objects obj1 and obj2 such that obj1 != obj2 and yet json.dumps(obj1) == json.dumps(obj2).
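To make the collision concrete, here is a minimal sketch. The class NaiveComplexEncoder is a hypothetical stand-in for an encoder that writes complex numbers as plain dictionaries with no extra annotation; it is not the mechanism this article builds.

```python
import json


class NaiveComplexEncoder(json.JSONEncoder):
    """Hypothetical encoder: complex numbers become plain dictionaries."""

    def default(self, obj):
        if isinstance(obj, complex):
            return {"real": obj.real, "imag": obj.imag}
        # Anything else is unsupported; the parent class raises TypeError.
        return super().default(obj)


obj1 = complex(1, 2)
obj2 = {"real": 1.0, "imag": 2.0}

encoded1 = json.dumps(obj1, cls=NaiveComplexEncoder)
encoded2 = json.dumps(obj2, cls=NaiveComplexEncoder)

print(obj1 != obj2)          # True: the two objects are different...
print(encoded1 == encoded2)  # True: ...but their JSON is identical
```

Once the two objects share the same JSON, no decoder can tell which of the two it should rebuild.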
We have two options here:
I will go with the second option.
The strategy that I will implement will revolve around using (JSON) dictionaries to encode our new arbitrary types, together with the usage of a special key to disambiguate between the non-standard encodings and native Python dictionaries that were unlucky enough to look like something else.
Let us say that the special key will be something like "__extended_json_type__".
Thus, whenever we encode a non-standard object into JSON, we have to annotate the resulting dictionary with that key.
The value of the key will indicate the type of the original object.
For example, here is what a complex number could look like:
>>> json.dumps(1+2j)
"""{
    "__extended_json_type__": "complex",
    "real": 1.0,
    "imag": 2.0
}"""
As another example, a range could look like this:
>>> json.dumps(range(1, 10, 3))
"""{
    "__extended_json_type__": "range",
    "start": 1,
    "stop": 10,
    "step": 3
}"""
By providing the key "__extended_json_type__", the decoder will know this was not a native Python dictionary and will be able to reconstruct the intended objects.
Except...
Suppose that, for some annoying reason, you have the following Python dictionary:
dict_ = {
    "__extended_json_type__": "complex",
    "real": 1.0,
    "imag": 2.0
}
If you encode this to JSON, you will end up with the exact same JSON as the one we got for the complex number (1+2j)...
So, are we back at square one?
I don't think so. The special key will help the decoder know what type of object it should build out of the non-standard JSON. The special key also makes it less likely for collisions to happen, although it does not get rid of them entirely.
At this point, I already have a pretty clear picture of what I have to do, and how, so let me show you the code and walk you through it.
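import json


class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        # Name of the type of the object, e.g. "complex" for complex(1, 2).
        name = type(obj).__name__
        try:
            # Look for a method such as encode_complex.
            encoder = getattr(self, f"encode_{name}")
        except AttributeError:
            # No encoder for this type: defer to the parent class,
            # which raises a TypeError.
            return super().default(obj)
        else:
            encoded = encoder(obj)
            # Annotate the dictionary so the decoder can recognise it.
            encoded["__extended_json_type__"] = name
            return encoded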
This isn't a lot of code, but it is not your typical for loop, so you may need to read the code twice to get what it is doing.
Let me give you a hand:
We override the method .default from json.JSONEncoder, because that is what you have to do in order to implement custom JSON encoding of Python objects.
We get the name of the type of the object through the attribute __name__, which I covered in a Pydon't I wrote. To make this explanation simpler, suppose obj = complex(1, 2). Then, name = "complex".
We look for a method whose name starts with encode_ and that is then followed by the name of the type at hand. In our example, we look for a method called encode_complex.
If there is no such method, getattr will raise an AttributeError, which we catch. At this point, the encoder has no idea how to encode the object of the given type, so we call the method .default of the parent class, because that is what the json documentation says we should do.
If the method exists, we reach the else and we get to use the encoder to encode the object we have. We then annotate the resulting dictionary with the special key and return it.
This is what the code is doing. If something isn't clear, feel free to ask for further clarifications!
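The steps above can be exercised with a short sketch. It repeats the definition of ExtendedEncoder so that it runs on its own, and the subclass name ComplexOnlyEncoder is a hypothetical one chosen just for this illustration. It shows both branches: a type with a matching encode_ method, and a type without one (a set).

```python
import json


class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        name = type(obj).__name__
        try:
            encoder = getattr(self, f"encode_{name}")
        except AttributeError:
            return super().default(obj)
        else:
            encoded = encoder(obj)
            encoded["__extended_json_type__"] = name
            return encoded


class ComplexOnlyEncoder(ExtendedEncoder):  # hypothetical name
    def encode_complex(self, c):
        return {"real": c.real, "imag": c.imag}


# Branch 1: encode_complex exists, so the result gets annotated.
encoded = json.dumps(complex(1, 2), cls=ComplexOnlyEncoder)
print(encoded)

# Branch 2: there is no encode_set, so the parent class raises TypeError.
unsupported_error = None
try:
    json.dumps({1, 2, 3}, cls=ComplexOnlyEncoder)
except TypeError as exc:
    unsupported_error = exc
    print(f"Unsupported type: {exc}")
```

Note that .default is only called for objects that the json module cannot serialise natively, so regular dictionaries, lists, strings, and numbers never reach the encode_ methods.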
The decoding mechanism follows a similar approach, as I will show you next.
Here is the code for the decoding mechanism:
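import json


class ExtendedDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        # Always use our own object hook, whatever the caller passed in.
        kwargs["object_hook"] = self.object_hook
        super().__init__(**kwargs)

    def object_hook(self, obj):
        try:
            # Which type does this dictionary claim to represent?
            name = obj["__extended_json_type__"]
            # Look for a method such as decode_complex.
            decoder = getattr(self, f"decode_{name}")
        except (KeyError, AttributeError):
            # No special key or no matching decoder: keep the dictionary.
            return obj
        else:
            return decoder(obj)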
We subclass json.JSONDecoder and, in the method __init__, we set the parameter object_hook to the object hook that we define.
This is the object hook responsible for parsing non-standard JSON back into the original Python objects.
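The object hook, itself, just undoes what the encoder does:
It looks for the special key "__extended_json_type__" in the dictionary. If the key is missing, the dictionary is a regular one and is returned unchanged.
It uses the value of the key to look for a decoding method. For a complex number, that is the method decode_complex. If no such method exists, the dictionary is also returned unchanged.
Otherwise, it calls the decoding method on the dictionary and returns the result.
Now that we have defined the encoding and decoding mechanisms, we can extend the JSON standard with, for example, complex numbers and Python range objects.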
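Before moving on, here is a small sketch of how the object hook behaves in each case. It repeats the definition of ExtendedDecoder so that it runs on its own; the subclass name ComplexOnlyDecoder and the three JSON strings are made up for this illustration.

```python
import json


class ExtendedDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        kwargs["object_hook"] = self.object_hook
        super().__init__(**kwargs)

    def object_hook(self, obj):
        try:
            name = obj["__extended_json_type__"]
            decoder = getattr(self, f"decode_{name}")
        except (KeyError, AttributeError):
            return obj
        else:
            return decoder(obj)


class ComplexOnlyDecoder(ExtendedDecoder):  # hypothetical name
    def decode_complex(self, obj):
        return complex(obj["real"], obj["imag"])


tagged = '{"__extended_json_type__": "complex", "real": 1.0, "imag": 2.0}'
untagged = '{"real": 1.0, "imag": 2.0}'
unknown = '{"__extended_json_type__": "range", "start": 1, "stop": 10, "step": 3}'

# Special key present and decode_complex exists: we get a complex number.
from_tagged = json.loads(tagged, cls=ComplexOnlyDecoder)
# No special key (KeyError): the dictionary is returned as is.
from_untagged = json.loads(untagged, cls=ComplexOnlyDecoder)
# Special key present but no decode_range (AttributeError): also a dictionary.
from_unknown = json.loads(unknown, cls=ComplexOnlyDecoder)

print(from_tagged)    # (1+2j)
print(from_untagged)  # {'real': 1.0, 'imag': 2.0}
print(type(from_unknown).__name__)  # dict
```

Returning the dictionary unchanged when no decoder is found is a lenient choice: unknown extended types survive decoding as dictionaries instead of raising an error.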
Assuming you have the definitions of ExtendedEncoder and ExtendedDecoder, this is how you could extend JSON to support complex numbers and range objects:
import json


class ExtendedEncoder(json.JSONEncoder):
    ...


class ExtendedDecoder(json.JSONDecoder):
    ...


class MyEncoder(ExtendedEncoder):
    def encode_complex(self, c):
        return {"real": c.real, "imag": c.imag}

    def encode_range(self, r):
        return {"start": r.start, "stop": r.stop, "step": r.step}


class MyDecoder(ExtendedDecoder):
    def decode_complex(self, obj):
        return complex(obj["real"], obj["imag"])

    def decode_range(self, obj):
        return range(obj["start"], obj["stop"], obj["step"])
Then, you can use the custom encoder to encode some complex numbers and range objects:
my_data = {
    "hey": complex(1, 2),
    "there": range(1, 10, 3),
    73: False,
}
json_data = json.dumps(my_data, cls=MyEncoder)
Obviously, you can also go back to retrieve the original data:
decoded = json.loads(json_data, cls=MyDecoder)
print(decoded)
# {'hey': (1+2j), 'there': range(1, 10, 3), '73': False}
# Note: JSON object keys are always strings, so the key 73 comes back as '73'.
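As a further check, here is a self-contained sketch that repeats the two base classes, restricts the custom encoder and decoder to complex numbers for brevity, and tries two things: a round trip through nested data, and the collision caveat discussed earlier. The variable names nested and unlucky are made up for this illustration.

```python
import json


class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        name = type(obj).__name__
        try:
            encoder = getattr(self, f"encode_{name}")
        except AttributeError:
            return super().default(obj)
        else:
            encoded = encoder(obj)
            encoded["__extended_json_type__"] = name
            return encoded


class ExtendedDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        kwargs["object_hook"] = self.object_hook
        super().__init__(**kwargs)

    def object_hook(self, obj):
        try:
            name = obj["__extended_json_type__"]
            decoder = getattr(self, f"decode_{name}")
        except (KeyError, AttributeError):
            return obj
        else:
            return decoder(obj)


class MyEncoder(ExtendedEncoder):
    def encode_complex(self, c):
        return {"real": c.real, "imag": c.imag}


class MyDecoder(ExtendedDecoder):
    def decode_complex(self, obj):
        return complex(obj["real"], obj["imag"])


# Nested data: .default and the object hook are applied to every object
# inside the structure, not only to the top-level one.
nested = {
    "points": [complex(1, 2), complex(3, 4)],
    "meta": {"origin": complex(0, 0)},
}
round_tripped = json.loads(json.dumps(nested, cls=MyEncoder), cls=MyDecoder)
print(round_tripped == nested)  # True

# The caveat: a plain dictionary that already carries the special key
# cannot be told apart from an encoded complex number.
unlucky = {"__extended_json_type__": "complex", "real": 1.0, "imag": 2.0}
decoded_unlucky = json.loads(json.dumps(unlucky, cls=MyEncoder), cls=MyDecoder)
print(type(decoded_unlucky).__name__)  # complex
```

The second result is the remaining collision: it only happens for dictionaries that use the special key themselves, which is why the key should be something unlikely to appear in real data.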
And that's it!
The classes ExtendedEncoder and ExtendedDecoder provide a convenient way of extending the JSON standard:
ExtendedEncoder lets you define JSON encodings for non-standard Python objects; and
ExtendedDecoder lets you define the way in which the JSON is decoded back into the original objects.
I went through all this trouble because I needed this for another project of mine, so I'll package this up and open source it! Stay tuned!
That's it for now! Stay tuned and I'll see you around!