Introduction
dataclasses is a module in the Python standard library (available since Python 3.7) that automatically generates boilerplate code, such as the __init__ method, for classes whose main job is to hold a set of properties.
Let's look at the following example of a Person defined in a standard way:
class Person:
    def __init__(self, first_name: str, last_name: str, eye_color: str):
        self.first_name = first_name
        self.last_name = last_name
        self.eye_color = eye_color
Here is the same class defined using dataclasses:
from dataclasses import dataclass
@dataclass
class Person:
    first_name: str
    last_name: str
    eye_color: str
Can you spot the difference?
- The first difference is that the properties are declared directly in the class body, right after the class statement. This makes it much easier to see which properties the class has.
- The second is that no __init__ method is defined. It is generated automatically from the list of properties.
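The generated code can be observed directly. The following minimal sketch reuses the Person class from above; the sample values ("Alice", "Smith", "green") are made up for illustration. Besides __init__, the decorator also generates __repr__ and __eq__ by default.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    eye_color: str


# The generated __init__ accepts positional and keyword arguments.
alice = Person("Alice", "Smith", eye_color="green")
same_alice = Person("Alice", "Smith", "green")

# The generated __repr__ lists every field in declaration order.
person_repr = repr(alice)
print(person_repr)  # Person(first_name='Alice', last_name='Smith', eye_color='green')

# The generated __eq__ compares instances field by field.
print(alice == same_alice)  # True
```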
This example illustrates the primary use case of the dataclasses module. Comprehensive guides covering most of its functionality already exist. Below are some notable ones:
- Dataclasses: Your Secret Weapon for Productive Data Science by Ravish Kumar
- Python: Why DataClasses are Awesome by Pravash
- 9 Reasons Why You Should Start Using Python Dataclasses by Ahmed Besbes
In this guide, I present a pattern that those guides do not cover and that is essential for fully flexible definitions of class properties: init-only variables.
Init-only variables
Suppose a class needs a property that holds a compiled regular expression. Without dataclasses, we would set it up in the constructor: the constructor takes the pattern as a string, compiles it, and assigns the resulting Pattern object to the instance. Written the standard way, the code would look something like this:
import re
class Condition:
    def __init__(self, prop1: str, prop2: str, prop3: str, pattern_str: str):
        self.prop1 = prop1
        self.prop2 = prop2
        self.prop3 = prop3
        self.pattern = re.compile(pattern_str)
If we want to leverage the dataclasses module, we should define our class like this:
import re
from dataclasses import dataclass
@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: str
dataclasses generates the __init__ method from the declared properties and assigns each value directly. This works for prop1, prop2, and prop3, but for pattern_str we do not want to store the raw pattern, only the compiled one. To achieve this, we use the __post_init__ method, which is invoked at the end of the auto-generated __init__ method. The code could look like this:
@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: str
    def __post_init__(self):
        self.pattern = re.compile(self.pattern_str)
This version is an improvement, but it is still not ideal. The pattern_str property is needed only during initialization, yet it remains stored on every object. In addition, the pattern property is created inside the __post_init__ method, whereas we would prefer to declare it in the same way as the other properties.
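Both drawbacks can be checked with a short experiment. This is a minimal sketch that reuses the __post_init__ version of Condition from above with made-up sample values; fields() is the standard dataclasses helper that lists the fields of a dataclass.

```python
import re
from dataclasses import dataclass, fields


@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: str

    def __post_init__(self):
        self.pattern = re.compile(self.pattern_str)


c = Condition("p1", "p2", "p3", "[a-z]+")

# Drawback 1: the raw string stays on the instance after initialization.
print(c.pattern_str)  # [a-z]+

# Drawback 2: 'pattern' is a plain attribute, not a declared field,
# so fields() and the generated __repr__ do not know about it.
field_names = [f.name for f in fields(c)]
print(field_names)  # ['prop1', 'prop2', 'prop3', 'pattern_str']
print(repr(c))  # Condition(prop1='p1', prop2='p2', prop3='p3', pattern_str='[a-z]+')
```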
To implement our use case properly, we need two mechanisms:
- define the pattern_str property as an init-only property using InitVar,
- define the pattern property as a property without an automatic initialization using field(init=False).
import re
from dataclasses import dataclass, InitVar, field
from re import Pattern
@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: InitVar[str]
    pattern: Pattern = field(init=False)
    def __post_init__(self, pattern_str: str):
        self.pattern = re.compile(pattern_str)
When a property is declared as InitVar, two things happen. First, its value is passed to the __post_init__ method. Second, it is not stored as a property of the object; the declaration only tells dataclasses to accept it as an additional constructor parameter.
Putting it all together, the complete class definition looks like this:
import inspect
import re
from dataclasses import dataclass, InitVar, field
from re import Pattern
@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: InitVar[str]
    pattern: Pattern = field(init=False)
    def __post_init__(self, pattern_str: str):
        self.pattern = re.compile(pattern_str)
If we print all attributes of an instance, we can see the pattern property but not pattern_str.
import inspect
c = Condition("p1", "p2", "p3", "[a-z]+")
print(c.__dir__())
['prop1', 'prop2', 'prop3', 'pattern', '__module__', '__annotations__', '__post_init__', '__dict__', '__weakref__', '__doc__', '__dataclass_params__', '__dataclass_fields__', '__init__', '__repr__', '__eq__', '__hash__', '__match_args__', '__new__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__ne__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']
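Scanning a long attribute list by eye is error-prone, so the same observation can be made with targeted checks. The sketch below is self-contained and repeats the class definition; it relies on hasattr and on the standard dataclasses.fields() helper, and the sample values are made up.

```python
import re
from dataclasses import InitVar, dataclass, field, fields
from re import Pattern


@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: InitVar[str]
    pattern: Pattern = field(init=False)

    def __post_init__(self, pattern_str: str):
        self.pattern = re.compile(pattern_str)


c = Condition("p1", "p2", "p3", "[a-z]+")

# InitVar values are not stored, so the instance has no 'pattern_str' attribute.
has_pattern_str = hasattr(c, "pattern_str")
print(has_pattern_str)  # False

# fields() reports only real fields; the InitVar pseudo-field is excluded.
field_names = [f.name for f in fields(c)]
print(field_names)  # ['prop1', 'prop2', 'prop3', 'pattern']

# The compiled pattern is ready to use.
is_match = c.pattern.fullmatch("abc") is not None
print(is_match)  # True
```

Note that this holds because pattern_str has no default value. An InitVar declared with a default leaves that default behind as a class attribute.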
We can also verify that the generated __init__ method accepts pattern_str but not pattern by checking its signature.
import inspect
print(inspect.signature(Condition.__init__))
The output will be as follows:
(self, prop1: str, prop2: str, prop3: str, pattern_str: dataclasses.InitVar[str]) -> None
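The signature also has a practical consequence: a caller cannot supply pattern directly. The sketch below, again self-contained and using made-up values, shows that such a call is rejected with a TypeError, while pattern_str behaves like any other constructor parameter.

```python
import inspect
import re
from dataclasses import InitVar, dataclass, field
from re import Pattern


@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: InitVar[str]
    pattern: Pattern = field(init=False)

    def __post_init__(self, pattern_str: str):
        self.pattern = re.compile(pattern_str)


# Parameter names of the constructor (inspecting the class omits 'self').
param_names = list(inspect.signature(Condition).parameters)
print(param_names)  # ['prop1', 'prop2', 'prop3', 'pattern_str']

# Illustrative misuse: passing an already compiled pattern is rejected,
# because 'pattern' was declared with init=False.
try:
    Condition("p1", "p2", "p3", "[a-z]+", pattern=re.compile("[0-9]+"))
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True

# pattern_str can be passed by keyword like any other parameter.
c = Condition(prop1="p1", prop2="p2", prop3="p3", pattern_str="[a-z]+")
print(c.pattern.pattern)  # [a-z]+
```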
Conclusions
Combining InitVar with field(init=False) lets us declare values that are passed only to the constructor and used in __post_init__ to initialize another property, whether by plain assignment or by a more involved operation such as compiling a regular expression.