
Dataclasses: Init-Only Properties

How do we define init-only properties using the dataclasses library?

Introduction

dataclasses is a module in the Python standard library that automatically generates boilerplate code, such as the __init__ method, for classes whose main job is to define and initialize properties.

Let's look at the following example of a Person defined in a standard way:

class Person:

    def __init__(self, first_name: str, last_name: str, eye_color: str):
        self.first_name = first_name
        self.last_name = last_name
        self.eye_color = eye_color

and here is the same class defined using dataclasses:

from dataclasses import dataclass

@dataclass
class Person:
    first_name: str
    last_name: str
    eye_color: str

Can you spot the difference?

  1. The first difference is that the properties are declared right after the class definition, which makes it much easier to see what properties the class has.
  2. The second is that there is no explicit __init__ method. It is generated automatically from the list of properties.
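As a minimal sketch of what the decorator generates, the snippet below creates a Person and relies on the auto-generated __init__, plus the __repr__ and __eq__ methods that @dataclass also adds by default. The sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    first_name: str
    last_name: str
    eye_color: str


# The generated __init__ accepts the properties in declaration order,
# positionally or by keyword.
alice = Person("Alice", "Smith", eye_color="green")

# @dataclass also generates __repr__ and __eq__ by default.
print(alice)
# Person(first_name='Alice', last_name='Smith', eye_color='green')
print(alice == Person("Alice", "Smith", "green"))
# True
```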

This example illustrates the primary use case of the dataclasses library. Comprehensive guides covering most of the library's functionality already exist.

In this guide, I present a pattern that those guides tend not to cover and that is essential to a fully flexible definition of class properties: init-only variables.


Init-only variables

Suppose one of a class's properties must hold a compiled regular expression. The constructor should take the pattern as a string, compile it, and assign the resulting Pattern object as an object property. Defined in the standard way, the class would look something like this:

import re

class Condition:

    def __init__(self, prop1: str, prop2: str, prop3: str, pattern_str: str):
        self.prop1 = prop1
        self.prop2 = prop2
        self.prop3 = prop3
        self.pattern = re.compile(pattern_str)

If we want to leverage the dataclasses module, we should define our class like this:

import re
from dataclasses import dataclass

@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: str

dataclasses generates an __init__ method that assigns each argument directly to the property of the same name. This works for prop1, prop2, and prop3, but for pattern_str we want to store the compiled pattern rather than the raw string. To do so, we use the __post_init__ method, which the auto-generated __init__ calls once the properties have been assigned. The code could look like this:

import re
from dataclasses import dataclass

@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: str

    def __post_init__(self):
        self.pattern = re.compile(self.pattern_str)

This solution is an improvement, but it is still not ideal. The pattern_str property is needed only during initialization, yet it stays stored on the object. Additionally, the pattern property is created in the __post_init__ method, whereas we would prefer it to be declared in the same way as the other properties.
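The drawbacks described above can be made visible with a short sketch. It reuses the Condition class from the previous listing and inspects it with the fields() helper from the dataclasses module. The argument values are arbitrary.

```python
import re
from dataclasses import dataclass, fields


@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: str

    def __post_init__(self):
        # pattern is set here, but it is not a declared dataclass field.
        self.pattern = re.compile(self.pattern_str)


c = Condition("p1", "p2", "p3", "[a-z]+")

# The raw string is still stored on the instance...
print(c.pattern_str)
# [a-z]+

# ...and it is the only pattern-related name that dataclasses knows about.
print([f.name for f in fields(c)])
# ['prop1', 'prop2', 'prop3', 'pattern_str']

# Consequently, the generated __repr__ shows pattern_str and omits pattern.
print(c)
# Condition(prop1='p1', prop2='p2', prop3='p3', pattern_str='[a-z]+')
```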

To implement our use case properly, we need two mechanisms:

  • define the pattern_str property as an init-only property using InitVar,
  • define the pattern property as a property excluded from the generated __init__ using field(init=False).

import re
from dataclasses import dataclass, InitVar, field
from re import Pattern

@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: InitVar[str]
    pattern: Pattern = field(init=False)

    def __post_init__(self, pattern_str: str):
        self.pattern = re.compile(pattern_str)

Declaring a property as InitVar has two effects. First, the property's value is passed to the __post_init__ method. Second, the property is not stored as a permanent property of the object; the annotation only signals to dataclasses that it should be accepted as an additional constructor parameter.

Putting it all together, the complete class definition looks like this:

import re
from dataclasses import dataclass, InitVar, field
from re import Pattern


@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: InitVar[str]
    pattern: Pattern = field(init=False)

    def __post_init__(self, pattern_str: str):
        self.pattern = re.compile(pattern_str)

Listing all attributes of an instance shows the pattern property but not the pattern_str property.

c = Condition("p1", "p2", "p3", "[a-z]+")
print(c.__dir__())

['prop1', 'prop2', 'prop3', 'pattern', '__module__', '__annotations__', '__post_init__', '__dict__', '__weakref__', '__doc__', '__dataclass_params__', '__dataclass_fields__', '__init__', '__repr__', '__eq__', '__hash__', '__match_args__', '__new__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__ne__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']

We can verify that pattern is absent from the generated __init__ method, and that pattern_str is present, by checking its signature.

import inspect

print(inspect.signature(Condition.__init__))

The output will be as follows:

(self, prop1: str, prop2: str, prop3: str, pattern_str: dataclasses.InitVar[str]) -> None
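The same checks can also be written as assertions, which may be handy in a test. This is only a sketch: it repeats the class definition so that it runs on its own, uses fields() from the dataclasses module instead of __dir__(), and the argument values and test strings are arbitrary.

```python
import inspect
import re
from dataclasses import InitVar, dataclass, field, fields
from re import Pattern


@dataclass
class Condition:
    prop1: str
    prop2: str
    prop3: str
    pattern_str: InitVar[str]
    pattern: Pattern = field(init=False)

    def __post_init__(self, pattern_str: str):
        self.pattern = re.compile(pattern_str)


c = Condition("p1", "p2", "p3", "[a-z]+")

# fields() lists only real fields; the init-only pattern_str is excluded.
field_names = [f.name for f in fields(c)]
assert field_names == ["prop1", "prop2", "prop3", "pattern"]

# The init-only value was handed to __post_init__ and never stored.
assert not hasattr(c, "pattern_str")

# The generated __init__ takes pattern_str but not pattern.
init_params = list(inspect.signature(Condition.__init__).parameters)
assert init_params == ["self", "prop1", "prop2", "prop3", "pattern_str"]

# The compiled pattern is ready to use.
assert c.pattern.fullmatch("abc") is not None
assert c.pattern.fullmatch("ABC") is None
```

Because pattern is a declared field, it also takes part in the generated __repr__ and __eq__, unlike in the __post_init__-only version.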


Conclusions

Combining InitVar with field(init=False) lets us define variables that are passed only to the constructor, where __post_init__ uses them to initialize another property, whether by plain assignment or by a further operation such as compiling a pattern.


References

  1. https://docs.python.org/3/library/dataclasses.html#init-only-variables


