Python Dict vs Asdict
The dataclasses library was introduced in Python 3.7. It lets us write structured classes meant specifically for storing data, and these classes come with generated methods for handling that data and its representation.
The dataclasses Library in Python
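The dataclasses library is part of the standard library from Python 3.7 onward, so it needs no installation. Only on Python 3.6 do you need the backport from PyPI, which you can install with the command below.
pip install dataclasses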
Unlike a normal Python class, a dataclass is created by applying the @dataclass decorator to a class. Attributes are declared with type hints, which specify the data type of each attribute in the dataclass.
Below is a code snippet that puts the concept into practice.
# A bare-bones data class
# Import the dataclass decorator from the dataclasses module
from dataclasses import dataclass


@dataclass
class Student:
    """A class that holds a student's data."""

    # Declaring attributes
    # Making use of type hints
    name: str
    id: int
    section: str
    classname: str
    fatherName: str
    motherName: str


# Below is a dataclass instance
student = Student("Muhammad", 1432, "Red", "0-1", "Ali", "Marie")
print(student)
Output:
Student(name='Muhammad', id=1432, section='Red', classname='0-1', fatherName='Ali', motherName='Marie')
There are two points to note in the code above. First, a dataclass object accepts arguments and assigns them to the relevant attributes even though we never wrote an __init__() method. This works because the @dataclass decorator generates an __init__() method for the class.
The second point is that print() neatly displays the data in the object without any method written specifically for that purpose. This is because the decorator also generates a __repr__() method.
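To make the generated methods concrete, here is a rough, hand-written approximation of what @dataclass provides for a small class. The ManualPoint and Point classes are hypothetical examples, and the real generated code differs in details; the sketch only mirrors the behaviour of __init__(), __repr__(), and __eq__() (which the decorator also generates by default).

```python
from dataclasses import dataclass


class ManualPoint:
    # Hand-written approximation of the methods @dataclass generates

    def __init__(self, x: int, y: int) -> None:
        # Assign each constructor argument to the matching attribute
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        # Same "ClassName(field=value, ...)" format the decorator produces
        return f"{type(self).__name__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object):
        # Compare field by field, but only against the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


@dataclass
class Point:
    x: int
    y: int


print(ManualPoint(1, 2))           # ManualPoint(x=1, y=2)
print(Point(1, 2))                 # Point(x=1, y=2)
print(Point(1, 2) == Point(1, 2))  # True
```

The decorated version needs only the field declarations; everything else in ManualPoint is boilerplate the decorator writes for you.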
Why __dict__ Is Faster Than asdict
In most cases where you would have used __dict__ without dataclasses, you should keep using __dict__.
However, asdict does extra work that you may not need, and that work carries an overhead you'd rather avoid.
Here's what it does according to the official documentation. Each dataclass object is converted to a dict of its fields as name: value pairs, and dataclasses, dicts, lists, and tuples are recursed into.
So if you need recursive dataclass dictification, go for asdict. Otherwise, all the extra work that goes into providing it is wasted.
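To get a feel for that overhead, here is a rough timing sketch. The Record class and the iteration count are illustrative assumptions, and the measured times depend on your machine and Python version, so no numbers are quoted here.

```python
import timeit
from dataclasses import asdict, dataclass


@dataclass
class Record:  # hypothetical example class
    name: str
    score: int
    tags: list


record = Record("Muhammad", 90, ["a", "b"])

# __dict__ returns the instance's existing attribute dictionary (no copying)
fast = timeit.timeit(lambda: record.__dict__, number=100_000)

# asdict builds a new dict and a new list for "tags" on every call
slow = timeit.timeit(lambda: asdict(record), number=100_000)

print(f"__dict__: {fast:.4f}s  asdict: {slow:.4f}s")
print(record.__dict__ == asdict(record))  # True: same content here
```

For a flat dataclass like this one, both give equal content, so the extra time spent in asdict buys nothing.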
If you rely on asdict, note also that changing the contained objects to be dataclasses will change the result of asdict on the outer objects, because asdict converts nested dataclass instances into dicts as well.
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class APoint:
    x1: int
    y1: int


@dataclass
class C:
    aList: List[APoint]


point_instance = APoint(10, 20)
assert asdict(point_instance) == {"x1": 10, "y1": 20}

c = C([APoint(30, 40), APoint(50, 60)])
assert asdict(c) == {"aList": [{"x1": 30, "y1": 40}, {"x1": 50, "y1": 60}]}
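For contrast, __dict__ does not recurse: the nested list and the APoint objects inside it come back as-is. This sketch repeats the class definitions so it runs on its own.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class APoint:
    x1: int
    y1: int


@dataclass
class C:
    aList: List[APoint]


c = C([APoint(30, 40), APoint(50, 60)])

print(c.__dict__)  # {'aList': [APoint(x1=30, y1=40), APoint(x1=50, y1=60)]}
print(c.__dict__["aList"] is c.aList)  # True: the very same list object
print(asdict(c)["aList"] is c.aList)   # False: asdict built a new list
```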
Moreover, the recursive logic has no way to handle circular references. If you use dataclasses to represent, say, a graph or some other data structure with circular references, asdict will fail with a RecursionError.
import dataclasses


@dataclasses.dataclass
class GraphNode:
    name: str
    neighbors: list["GraphNode"]  # list[...] annotations need Python 3.9+


# Build a two-node cycle: x -> y -> x
x = GraphNode("x", [])
y = GraphNode("y", [])
x.neighbors.append(y)
y.neighbors.append(x)

dataclasses.asdict(x)
# This line fails with RecursionError ("maximum recursion depth exceeded"):
# asdict follows x -> y -> x -> ... and never terminates on its own.
# Unless caught, the error stops the script (or the current notebook cell).
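If you only need the top level, __dict__ does not recurse, so a cycle is harmless there. For a graph, another option is to replace object references with something that does not loop, such as node names. The to_adjacency helper below is a hypothetical illustration of that idea, not part of the dataclasses library.

```python
from dataclasses import dataclass


@dataclass
class GraphNode:
    name: str
    neighbors: list["GraphNode"]  # Python 3.9+ for list[...] annotations


x = GraphNode("x", [])
y = GraphNode("y", [])
x.neighbors.append(y)
y.neighbors.append(x)

# 1) __dict__ does not recurse; the neighbor list still holds
#    the original GraphNode objects
top_level = x.__dict__
print(top_level["neighbors"][0] is y)  # True


# 2) Hypothetical helper: map each node name to its neighbors' names
def to_adjacency(nodes):
    return {node.name: [n.name for n in node.neighbors] for node in nodes}


print(to_adjacency([x, y]))  # {'x': ['y'], 'y': ['x']}
```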
Furthermore, asdict builds a new dict, whereas __dict__ directly accesses the object's own attribute dictionary.
It follows that the return value of asdict is not affected in any way when you later reassign the original object's attributes.
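A short sketch of that difference, using a trimmed-down Student class (an assumption for brevity): the asdict result is a snapshot, while __dict__ is the live attribute dictionary of the instance.

```python
from dataclasses import asdict, dataclass


@dataclass
class Student:  # trimmed-down variant of the earlier example
    name: str
    section: str


student = Student("Muhammad", "Red")
snapshot = asdict(student)  # a new, independent dict
live = student.__dict__     # the instance's own attribute dict

student.section = "Blue"    # reassign an attribute afterwards

print(snapshot["section"])  # Red: the asdict result is unaffected
print(live["section"])      # Blue: __dict__ reflects the change
```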
Also, because asdict works from the declared fields, any attributes you add to a dataclass object that don't map to declared fields won't be included in its output.
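The sketch below adds an undeclared attribute to an instance and compares three views of it. The shallow_asdict helper is hypothetical: it builds a non-recursive, non-copying dict from dataclasses.fields(), which is what you might use when you want only the declared fields without the cost of recursion.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Student:  # trimmed-down variant of the earlier example
    name: str


def shallow_asdict(obj) -> dict:
    # Hypothetical helper: one entry per declared field, values not copied
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


student = Student("Muhammad")
student.nickname = "Mo"  # an attribute that is not a declared field

print(asdict(student))          # {'name': 'Muhammad'}
print(shallow_asdict(student))  # {'name': 'Muhammad'}
print(student.__dict__)         # {'name': 'Muhammad', 'nickname': 'Mo'}
```

Both field-based versions skip nickname, while __dict__ includes it because it simply exposes whatever attributes the instance holds.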
Lastly, asdict calls copy.deepcopy() on any value that isn't a dataclass instance, dict, list, or tuple. In the CPython implementation, this is the final fallback branch of asdict's internal helper, which essentially amounts to:
return copy.deepcopy(obj)  # a potentially very costly operation!
Dataclass instances, dicts, lists, and tuples go through the recursive logic instead, which also builds a copy, just with the recursive dictification applied.
Deep copying is a costly operation on its own, because it has to inspect every reachable object to decide what needs to be copied. On top of that, asdict does not share a memo dictionary across its separate deepcopy calls, so in nontrivial object graphs it can create multiple copies of an object that was originally shared.
Beware of such a scenario:
from dataclasses import dataclass, asdict


@dataclass
class PointClass:
    x1: object
    y1: object


obj_instance = object()
var1 = PointClass(obj_instance, obj_instance)
var2 = asdict(var1)

print(var1.x1 is var1.y1)  # True: both fields hold the same object
print(var2["x1"] is var2["y1"])  # False: each field was deep-copied separately
print(var2["x1"] is var1.x1)  # False: asdict returned copies, not the originals
Output:
True
False
False
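For comparison, a single copy.deepcopy() call uses one memo dictionary for the whole structure, so an object referenced twice is copied only once. If you need an independent dict but not recursive dictification, deep-copying __dict__ in one call is one way to keep shared objects shared; note that any nested dataclass instances would then stay dataclass instances (deep-copied) rather than becoming dicts.

```python
import copy
from dataclasses import dataclass


@dataclass
class PointClass:
    x1: object
    y1: object


obj_instance = object()
var1 = PointClass(obj_instance, obj_instance)

# One deepcopy call over the whole attribute dict: a single shared memo
shared_copy = copy.deepcopy(var1.__dict__)

print(shared_copy["x1"] is shared_copy["y1"])  # True: sharing preserved
print(shared_copy["x1"] is var1.x1)            # False: it is still a copy
```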