Python 3.7: Introducing Data Classes
Python 3.7 is set to be released this summer, so let’s have a sneak peek at some of the new features! If you’d like to play along at home with PyCharm, make sure you get PyCharm 2018.1 (or later if you’re reading this from the future).
There are many new things in Python 3.7: various character set improvements, postponed evaluation of annotations, and more. One of the most exciting new features is support for the dataclass decorator.
What is a Data Class?
Most Python developers will have written many classes which look like:
class MyClass:
    def __init__(self, var_a, var_b):
        self.var_a = var_a
        self.var_b = var_b
Data classes help you by automatically generating dunder methods for simple cases, for example an __init__ that accepts those arguments and assigns each one to self. The small example above could be rewritten like:
from dataclasses import dataclass

@dataclass
class MyClass:
    var_a: str
    var_b: str
A key difference is that type hints are actually required for data classes. If you’ve never used a type hint before: they allow you to mark what type a certain variable should be. At runtime, these types are not checked, but you can use PyCharm or a command-line tool like mypy to check your code statically.
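To make the point about runtime checking concrete, here is a minimal sketch that reuses the MyClass example from above. It only illustrates that the annotations declare fields and are not validated when an instance is created; the variable names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class MyClass:
    var_a: str
    var_b: str


# The generated __init__ accepts the fields in declaration order.
obj = MyClass("hello", "world")

# Type hints are not enforced at runtime: this int is accepted silently,
# although a static checker such as mypy would flag the call.
unchecked = MyClass(1, "world")

print(obj.var_a)                  # hello
print(type(unchecked.var_a))      # <class 'int'>
```

This is why running a static checker is worthwhile: the hints only pay off if something actually reads them.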
So let’s have a look at how we can use this!
The Star Wars API
You know a movie’s fanbase is passionate when a fan creates a REST API with the movie’s data in it. One Star Wars fan has done exactly that, and created the Star Wars API. He’s actually gone even further, and created a Python wrapper library for it.
Let’s forget for a second that there’s already a wrapper out there, and see how we could write our own.
We can use the requests library to get a resource from the Star Wars API:
import requests
response = requests.get('https://swapi.co/api/films/1/')
This endpoint (like all swapi endpoints) responds with a JSON message. Requests makes our life easier by offering JSON parsing:
dictionary = response.json()
And at this point we have our data in a dictionary. Let’s have a look at it (shortened):
{
    'characters': ['https://swapi.co/api/people/1/', … ],
    'created': '2014-12-10T14:23:31.880000Z',
    'director': 'George Lucas',
    'edited': '2015-04-11T09:46:52.774897Z',
    'episode_id': 4,
    'opening_crawl': 'It is a period of civil war.\r\n … ',
    'planets': ['https://swapi.co/api/planets/2/', ...],
    'producer': 'Gary Kurtz, Rick McCallum',
    'release_date': '1977-05-25',
    'species': ['https://swapi.co/api/species/5/', ...],
    'starships': ['https://swapi.co/api/starships/2/', ...],
    'title': 'A New Hope',
    'url': 'https://swapi.co/api/films/1/',
    'vehicles': ['https://swapi.co/api/vehicles/4/', ...]
}
Wrapping the API
To properly wrap an API, we should create objects that our wrapper’s user can use in their application. So let’s define an object in Python 3.6 to contain the responses of requests to the /films/ endpoint:
import dateutil.parser
from datetime import datetime
from typing import List

class StarWarsMovie:
    def __init__(self,
                 title: str,
                 episode_id: int,
                 opening_crawl: str,
                 director: str,
                 producer: str,
                 release_date: datetime,
                 characters: List[str],
                 planets: List[str],
                 starships: List[str],
                 vehicles: List[str],
                 species: List[str],
                 created: datetime,
                 edited: datetime,
                 url: str
                 ):
        self.title = title
        self.episode_id = episode_id
        self.opening_crawl = opening_crawl
        self.director = director
        self.producer = producer
        self.release_date = release_date
        self.characters = characters
        self.planets = planets
        self.starships = starships
        self.vehicles = vehicles
        self.species = species
        self.created = created
        self.edited = edited
        self.url = url

        # The API returns dates as strings, so parse them into datetime objects
        if type(self.release_date) is str:
            self.release_date = dateutil.parser.parse(self.release_date)
        if type(self.created) is str:
            self.created = dateutil.parser.parse(self.created)
        if type(self.edited) is str:
            self.edited = dateutil.parser.parse(self.edited)
Careful readers may have noticed a little bit of duplicated code here. Not so careful readers may want to have a look at the complete Python 3.6 implementation: it’s not short.
This is a classic case of where the data class decorator can help you out. We’re creating a class that mostly holds data, and only does a little validation. So let’s have a look at what we need to change.
Firstly, data classes automatically generate several dunder methods. If we don’t specify any options to the dataclass decorator, the generated methods are: __init__, __eq__, and __repr__. Python by default (not just for data classes) will implement __str__ to return the output of __repr__ if you’ve defined __repr__ but not __str__. Therefore, you get four dunder methods implemented just by changing the code to:
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class StarWarsMovie:
    title: str
    episode_id: int
    opening_crawl: str
    director: str
    producer: str
    release_date: datetime
    characters: List[str]
    planets: List[str]
    starships: List[str]
    vehicles: List[str]
    species: List[str]
    created: datetime
    edited: datetime
    url: str
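As a quick aside, the generated methods are easier to see on a smaller class. The sketch below uses a hypothetical Point class (not part of the wrapper) to show what the default __repr__ and __eq__ do, and that str() falls back to the generated __repr__.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class, not part of the wrapper
    x: int
    y: int


a = Point(1, 2)
b = Point(1, 2)

print(repr(a))            # Point(x=1, y=2)
print(str(a) == repr(a))  # True: __str__ falls back to __repr__
print(a == b)             # True: __eq__ compares the fields
print(a is b)             # False: they are still two separate objects
```

Without the decorator, a == b would be False here, because plain classes compare by identity unless you write __eq__ yourself.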
We removed the __init__ method from StarWarsMovie to make sure the data class decorator can add the one it generates. Unfortunately, we lost a bit of functionality in the process. Our Python 3.6 constructor didn’t just define all values, but it also attempted to parse dates. How can we do that with a data class?
If we were to override __init__, we’d lose the benefit of the data class. Therefore a new dunder method was defined for any additional processing: __post_init__. Let’s see what a __post_init__ method would look like for our wrapper class:
def __post_init__(self):
    if type(self.release_date) is str:
        self.release_date = dateutil.parser.parse(self.release_date)

    if type(self.created) is str:
        self.created = dateutil.parser.parse(self.created)

    if type(self.edited) is str:
        self.edited = dateutil.parser.parse(self.edited)
And that’s it! With the data class decorator, our class takes under a third of the lines it needed without it.
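To see the pieces working together, here is a minimal, self-contained sketch. It makes a few assumptions: Film is a hypothetical, trimmed-down stand-in for StarWarsMovie, the dictionary is a hand-written subset of the response shown earlier, and datetime.strptime stands in for dateutil.parser.parse so that only the standard library is needed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Film:  # hypothetical, trimmed-down stand-in for StarWarsMovie
    title: str
    episode_id: int
    release_date: datetime
    characters: List[str]

    def __post_init__(self):
        # Called by the generated __init__ after the fields are assigned.
        if isinstance(self.release_date, str):
            # Assumption: strptime replaces dateutil.parser.parse here so the
            # sketch runs with the standard library alone.
            self.release_date = datetime.strptime(self.release_date, "%Y-%m-%d")


# A subset of the response shown earlier; the keys match the field names.
data = {
    "title": "A New Hope",
    "episode_id": 4,
    "release_date": "1977-05-25",
    "characters": ["https://swapi.co/api/people/1/"],
}

film = Film(**data)  # unpack the dictionary into keyword arguments
print(film.release_date.year)  # 1977
```

Unpacking with ** only works when every key in the dictionary matches a field name, so a real wrapper would probably want to handle missing or unexpected keys more carefully.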
More goodies
By using options with the decorator, you can tailor data classes further for your use case. The default options are:
@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
- init determines whether to generate the __init__ dunder method.
- repr determines whether to generate the __repr__ dunder method.
- eq does the same for the __eq__ dunder method, which determines the behavior for equality checks (your_class_instance == another_instance).
- order actually creates four dunder methods (__lt__, __le__, __gt__, and __ge__), which determine the behavior for all less-than and greater-than comparisons. If you set this to True, you can sort a list of your objects.
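As an illustration of the order option, the sketch below sorts a list of instances. Version is a hypothetical class made up for this example.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:  # hypothetical example class
    major: int
    minor: int


versions = [Version(3, 7), Version(2, 7), Version(3, 6)]

# Instances compare as if they were tuples of their fields,
# in the order the fields are declared.
ordered = sorted(versions)
print(ordered)
# [Version(major=2, minor=7), Version(major=3, minor=6), Version(major=3, minor=7)]
```

Because the comparison follows declaration order, put the field you want to sort on first.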
The last two options determine whether or not your object can be hashed. This is necessary (for example) if you want to use your class’ objects as dictionary keys. A hash value should remain constant for the life of the object, otherwise the dictionary will not be able to find your objects anymore. The default implementation of a data class’ __hash__ function will return a hash over all fields in the data class. Therefore it’s only generated by default if you also make your objects read-only (by specifying frozen=True).
By setting frozen=True, any write to your object will raise an error. If you think this is too draconian, but you still know your object will never change, you could specify unsafe_hash=True instead. The authors of the data class decorator recommend you don’t, though.
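Here is a small sketch of how hashing and frozen interact. Both classes are hypothetical examples: one uses the default options, the other sets frozen=True.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass
class MutablePlanet:  # hypothetical example class with default options
    name: str


@dataclass(frozen=True)
class Planet:  # hypothetical example class, read-only and hashable
    name: str


# With eq=True and frozen=False, __hash__ is set to None, so hashing fails.
try:
    hash(MutablePlanet("Hoth"))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# Frozen instances are hashable; equal instances share a hash,
# so either one finds the same dictionary entry.
visits = {Planet("Tatooine"): 1}
found = visits[Planet("Tatooine")]

# Any attempt to assign to a field raises FrozenInstanceError.
tatooine = Planet("Tatooine")
try:
    tatooine.name = "Alderaan"
    write_failed = False
except FrozenInstanceError:
    write_failed = True

print(mutable_is_hashable, found, write_failed)  # False 1 True
```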
If you want to learn more about data classes, you can read the PEP or just get started and play with them yourself! Let us know in the comments what you’re using data classes for!