
Why you should start using Python dataclasses

Use Dataclasses in Python to store attributes

Photo by Franki Chamaki on Unsplash

Introduction

I hope we are all familiar with how plain Python classes work and what they are used for. Python also provides an indirect way to implement immutable classes using namedtuples. Wait, why are we talking about namedtuples here? Well, there is a good reason, and it will become clear shortly.

If you are new to namedtuples, please have a look here to learn more about them and their applications.

How namedtuples can be used to store attributes like classes

A namedtuple is a container provided by the collections module that has the same properties as a tuple. The main difference is that we can access the members of a namedtuple both by position (index), like a regular tuple, and by name, using the dot operator. Just like tuples, namedtuples are immutable: there is no way to change the values of their attributes in place.
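A minimal sketch of those properties, using a hypothetical three-field Car namedtuple. It shows positional and dot access, the failed assignment, and _replace(), which returns a new namedtuple instead of modifying the original:

```python
from collections import namedtuple

# A hypothetical namedtuple type with three fields
Car = namedtuple("Car", ["name", "make", "year"])

car = Car("Jazz", "Honda", 2008)

print(car[0])    # access by position, like a plain tuple -> Jazz
print(car.make)  # access by name with the dot operator -> Honda

try:
    car.year = 2019  # namedtuples are immutable, so this raises
except AttributeError as exc:
    print("Cannot modify:", exc)

# _replace() leaves car untouched and returns a new namedtuple
newer = car._replace(year=2019)
print(car.year, newer.year)  # 2008 2019
```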

This is where data classes come in: they are mutable by default. However, dataclasses offer much more than mutability.

Introducing dataclass

dataclasses is a module in Python's standard library that provides the @dataclass decorator. User-defined classes decorated with it are primarily meant to store data/attributes. Until then, Python had no construct dedicated to classes that just store attributes, as some other languages do.

However, a dataclass does not restrict you to just storing attributes. It can just as well have regular instance methods and class methods.
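A minimal sketch of a dataclass that also carries behaviour. The age instance method and the from_csv class method (an alternative constructor) are hypothetical, added purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class Car:
    name: str
    make: str
    year: int
    vehicle_type: str

    # A regular instance method works exactly as in a plain class
    def age(self, current_year: int) -> int:
        return current_year - self.year

    # A class method, used here as a hypothetical alternative constructor
    @classmethod
    def from_csv(cls, line: str) -> "Car":
        name, make, year, vehicle_type = line.split(",")
        return cls(name, make, int(year), vehicle_type)


car = Car.from_csv("Jazz,Honda,2008,Hatch")
print(car)            # Car(name='Jazz', make='Honda', year=2008, vehicle_type='Hatch')
print(car.age(2020))  # 12
```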

Dataclasses were introduced in Python 3.7. For Python 3.6, they have to be installed separately as a backport library.

Installing the dataclasses backport in Python 3.6

pip install dataclasses

Why dataclasses?

Dataclasses remove a lot of the boilerplate code required to create classes in Python, including, but not limited to, the __init__ method. Their objective is to store data or, to be precise, to store the state of data that can be modified later.

To understand the significance of dataclasses, let's write a plain Car class just to store data.

1. Creating a regular Python class for storing attributes

class Car(object):

    def __init__(self, name: str, make: str, year: int, vehicle_type: str):
        """
        :param name: Name of the car
        :param make: Brand (Hyundai/Toyota)
        :param year: Year of manufacture
        :param vehicle_type: Sedan/SUV/Hatch
        """

        self.name = name
        self.make = make
        self.year = year
        self.vehicle_type = vehicle_type


car = Car("Jazz", "Honda", 2008, "Hatch")

print(car.make)
print(car.name)
print(car)

The above code gives the following output:

Output:
Honda
Jazz
<__main__.Car object at 0x000001D2CC934FD0>

We haven't defined a __repr__ method, so the printed object is not as descriptive as we would like it to be.

I know what you are thinking. This class per se looks as good as it can be and does its job of storing the state of data.

Then why bother with dataclasses? What are they good for?

Now, let's assume we need a better description for our Car class. So let's implement a __repr__ method.

2. Adding __repr__ to a regular Python class

class Car(object):

    def __init__(self, name: str, make: str, year: int, vehicle_type: str):
        """
        :param name: Name of the car
        :param make: Brand (Hyundai/Toyota)
        :param year: Year of manufacture
        :param vehicle_type: Sedan/SUV/Hatch
        """

        self.name = name
        self.make = make
        self.year = year
        self.vehicle_type = vehicle_type

    def __repr__(self):
        return f"{self.__class__.__name__}({self.name!r}, {self.make!r}, {self.year!r}, {self.vehicle_type!r})"


car = Car("Jazz", "Honda", 2008, "Hatch")
print(car)

Now, let's see what printing the instance gives us.

Output:
Car('Jazz', 'Honda', 2008, 'Hatch')

This description is much better than the default one, isn't it? Now our Car storage class has two methods. Overwhelming already?

Since we have implemented a class solely to store the state of data, we may occasionally need to compare two instances of it for equality. Let's do that with our Car class.

print(car == Car("Jazz", "Honda", 2008, "Hatch"))

Output:
False

Of course it returns False: we never implemented an __eq__ method, so Python falls back to the default comparison by identity, and these are two distinct objects.
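A minimal sketch of that default behaviour, assuming a stripped-down, hypothetical two-attribute Car class with no __eq__: == then behaves like an identity check (is), not a comparison of attribute values.

```python
class Car:
    def __init__(self, name: str, make: str):
        self.name = name
        self.make = make


a = Car("Jazz", "Honda")
b = Car("Jazz", "Honda")

# Without __eq__, == is inherited from object and compares identity
print(a == b)  # False: two distinct objects with equal attributes
print(a == a)  # True: the very same object
print(a is b)  # False
```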

3. Adding __eq__ to a regular Python class

class Car(object):

    def __init__(self, name: str, make: str, year: int, vehicle_type: str):
        """
        :param name: Name of the car
        :param make: Brand (Hyundai/Toyota)
        :param year: Year of manufacture
        :param vehicle_type: Sedan/SUV/Hatch
        """

        self.name = name
        self.make = make
        self.year = year
        self.vehicle_type = vehicle_type

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (other.name, other.make, other.year, other.vehicle_type) == (self.name, self.make, self.year, self.vehicle_type)


    def __repr__(self):
        return f"{self.__class__.__name__}({self.name!r}, {self.make!r}, {self.year!r}, {self.vehicle_type!r})"


car = Car("Jazz", "Honda", 2008, "Hatch")

print(car == Car("Jazz", "Honda", 2008, "Hatch"))

Output:
True

Great! We have got what we wanted. However, isn't this a lot of code just to store the state of attributes in a class? While some other languages offer dedicated data-holding classes out of the box, Python lagged behind here and had no classes meant just for storing attributes. What if all the aforementioned methods were readily available to the user? That's exactly what dataclasses provide.

What do dataclasses provide?

Dataclasses were introduced in PEP 557 as a module that provides a decorator which automatically generates dunder methods for user-defined classes. If the class already defines one of these methods, such as __init__, __repr__ or __eq__, the decorator keeps the user's version instead of generating a new one.
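A small sketch of that behaviour, using a trimmed two-field Car with a hypothetical hand-written __repr__ (the format string is purely illustrative). The custom __repr__ survives, while __eq__ is still generated:

```python
from dataclasses import dataclass


@dataclass
class Car:
    name: str
    make: str

    # An explicitly defined __repr__ is kept; @dataclass does not overwrite it
    def __repr__(self) -> str:
        return f"<{self.make} {self.name}>"


car = Car("Jazz", "Honda")
print(car)                          # <Honda Jazz>
print(car == Car("Jazz", "Honda"))  # True: __eq__ was still generated
```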

By default, a dataclass generates the following methods:

  • __init__

  • __repr__

  • __eq__

However, the auto-generated methods are not limited to these. Depending on the parameters passed to the decorator (for example, order=True or frozen=True), other methods may be generated as well.
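For example, passing order=True to the decorator also generates __lt__, __le__, __gt__ and __ge__, which compare instances as tuples of their fields in definition order. A minimal sketch, assuming a trimmed two-field Car where year is deliberately declared first so that sorting goes by year, then name:

```python
from dataclasses import dataclass


# order=True adds the rich comparison methods on top of __init__, __repr__, __eq__
@dataclass(order=True)
class Car:
    year: int
    name: str


cars = [Car(2019, "Jazz"), Car(2008, "Jazz"), Car(2008, "City")]
print(sorted(cars))
# [Car(year=2008, name='City'), Car(year=2008, name='Jazz'), Car(year=2019, name='Jazz')]
```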

Defining attributes in a dataclass

The fields of a dataclass are declared using the variable annotations (type hints) introduced in PEP 526. For more information on type hints, kindly check my article here.
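A minimal sketch of how annotations decide what counts as a field. The default year and the un-annotated wheels attribute are hypothetical additions for illustration: only annotated names become fields, while wheels stays an ordinary class attribute.

```python
from dataclasses import dataclass, fields


@dataclass
class Car:
    name: str         # annotated -> a field, required in __init__
    make: str
    year: int = 2008  # annotated with a default -> an optional field
    wheels = 4        # NOT annotated -> a plain class attribute, not a field


print([f.name for f in fields(Car)])  # ['name', 'make', 'year']

car = Car("Jazz", "Honda")
print(car)         # Car(name='Jazz', make='Honda', year=2008)
print(car.wheels)  # 4, still reachable as a class attribute
```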

Now, let's rewrite our Car class with a dataclass and see how it makes our lives easier to store attributes.

from dataclasses import dataclass

@dataclass
class Car:
    name: str
    make: str
    year: int
    vehicle_type: str


car = Car("Jazz", "Honda", 2008, "Hatch")
print(car)
print(car.name)
print(car.make)
print(car.year)
print(car == Car("Jazz", "Honda", 2008, "Hatch"))


Output:
Car(name='Jazz', make='Honda', year=2008, vehicle_type='Hatch')
Jazz
Honda
2008
True

This is perfect. We have not implemented an __init__ or an __eq__ method; the dataclass decorator generated them for us, along with __repr__. All the code we wrote for the traditional class has been replaced with just 5–6 lines.
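To confirm what the decorator generated, here is a quick sketch using the standard inspect module. It checks that the three methods now live in the class body and prints the signature of the generated __init__, whose parameters follow the field order:

```python
import inspect
from dataclasses import dataclass


@dataclass
class Car:
    name: str
    make: str
    year: int
    vehicle_type: str


# The decorator wrote these methods into the class for us
for method in ("__init__", "__repr__", "__eq__"):
    print(method, method in Car.__dict__)  # each line ends with True

# The generated __init__ takes the fields in definition order
print(inspect.signature(Car.__init__))
```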

Now, that's what makes dataclasses intriguing.

Let's check if we are able to modify the fields of a data class.

from dataclasses import dataclass

@dataclass
class Car:
    name: str
    make: str
    year: int
    vehicle_type: str


car = Car("Jazz", "Honda", 2008, "Hatch")
car.year = 2019
print(car)

Output:
Car(name='Jazz', make='Honda', year=2019, vehicle_type='Hatch')

We are able to modify the field values successfully, unlike with namedtuples. I know that I keep drawing comparisons between namedtuples and dataclasses. The reason is that namedtuples were the main way to indirectly create class-like attribute storage (of course, we could always use regular classes). Moreover, attributes cannot be modified once defined in a namedtuple, whereas in a dataclass they can. Also, namedtuples and dataclasses offer similar helpers, such as getting the fields or converting to a dictionary.
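If you do want namedtuple-like immutability, the decorator's frozen parameter provides it. A minimal sketch, using a trimmed three-field Car for brevity; dataclasses.replace() plays the role of namedtuple's _replace():

```python
from dataclasses import FrozenInstanceError, dataclass, replace


# frozen=True makes instances read-only, much like a namedtuple
@dataclass(frozen=True)
class Car:
    name: str
    make: str
    year: int


car = Car("Jazz", "Honda", 2008)

try:
    car.year = 2019  # assigning to a field of a frozen instance raises
except FrozenInstanceError as exc:
    print("Cannot modify:", exc)

# replace() builds a new instance with some fields changed
newer = replace(car, year=2019)
print(car.year, newer.year)  # 2008 2019
```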

Helper functions provided by the dataclasses module

  • fields → Returns the fields of a dataclass (or instance) as Field objects, with their name, type, etc.

  • asdict → Returns all the fields as a dictionary

  • astuple → Returns all the fields as a tuple

  • make_dataclass → Creates a new dataclass from the given class name and fields

  • is_dataclass → Returns True if the argument is a dataclass or an instance of one

The asdict function returns all the fields of an instance as a dictionary, and astuple returns them as a tuple. The fields function returns all the fields of the dataclass. If you look closely, namedtuples offer counterparts to some of these, such as _asdict() and _fields, which is why I keep comparing dataclasses with namedtuples. In addition, namedtuples are mostly used just to store attributes.

from dataclasses import dataclass, fields, asdict, astuple

@dataclass
class Car:
    name: str
    make: str
    year: int
    vehicle_type: str


    def display(self):
        print(self.name)


car = Car("Jazz", "Honda", 2008, "Hatch")

print(asdict(car))
print(astuple(car))
print(fields(car))

Output:

{'name': 'Jazz', 'make': 'Honda', 'year': 2008, 'vehicle_type': 'Hatch'}
('Jazz', 'Honda', 2008, 'Hatch')
(Field(name='name',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,default_factory=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),
Field(name='make',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,default_factory=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),
Field(name='year',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,default_factory=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD),
Field(name='vehicle_type',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,default_factory=<dataclasses._MISSING_TYPE object at 0x0000025D2D7C0CA0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD))
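The raw fields() output is noisy. Since each item is a dataclasses.Field object with attributes such as name and type, a short loop gives a more readable summary. This is one sketch of doing so, not the only way:

```python
from dataclasses import dataclass, fields


@dataclass
class Car:
    name: str
    make: str
    year: int
    vehicle_type: str


car = Car("Jazz", "Honda", 2008, "Hatch")

# Print each field's name, its annotated type and the instance's value
for f in fields(car):
    print(f.name, f.type.__name__, getattr(car, f.name))
```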

The remaining functions are quite self-explanatory, so I'll leave it here.
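Still, for completeness, here is a minimal sketch of the two remaining helpers, reusing the Car fields from above. make_dataclass builds the class at runtime from (name, type) pairs, and is_dataclass accepts both the class and its instances:

```python
from dataclasses import is_dataclass, make_dataclass

# Equivalent to writing the @dataclass Car class by hand
Car = make_dataclass(
    "Car",
    [("name", str), ("make", str), ("year", int), ("vehicle_type", str)],
)

car = Car("Jazz", "Honda", 2008, "Hatch")
print(car)  # Car(name='Jazz', make='Honda', year=2008, vehicle_type='Hatch')

print(is_dataclass(Car))     # True: the class itself
print(is_dataclass(car))     # True: an instance of it
print(is_dataclass("Jazz"))  # False: an ordinary string
```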

Summary

  • Dataclasses are very useful to create classes that just store attributes

  • Dataclasses closely resemble namedtuples; however, namedtuples are immutable, whereas dataclasses are mutable (unless the frozen parameter is set to True).

  • Dataclasses auto-generate a lot of dunder methods for the user-defined classes
