Python 3.7: Introducing Data Classes

Python 3.7 is set to be released this summer, so let’s have a sneak peek at some of the new features! If you’d like to play along at home with PyCharm, make sure you get PyCharm 2018.1 (or later, if you’re reading this from the future).

There are many new things in Python 3.7: various character set improvements, postponed evaluation of annotations, and more. One of the most exciting new features is support for the dataclass decorator.

What is a Data Class?

Most Python developers will have written many classes that look like this:

class MyClass:
    def __init__(self, var_a, var_b):
        self.var_a = var_a
        self.var_b = var_b

Data classes help you by automatically generating dunder methods for simple cases, for example an __init__ that accepts those arguments and assigns each to self. The small example above could be rewritten as:

from dataclasses import dataclass


@dataclass
class MyClass:
    var_a: str
    var_b: str

A key difference is that type hints are actually required for data classes: only attributes with an annotation become fields. If you’ve never used a type hint before: they allow you to mark what type a certain variable should be. At runtime, these types are not checked, but you can use PyCharm or a command-line tool like mypy to check your code statically.
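To see that the annotations are not enforced at runtime, here is a minimal sketch. The class name Pair and its values are made up for illustration and are not part of the wrapper we build later.

```python
from dataclasses import dataclass, fields


@dataclass
class Pair:
    # Hypothetical example class, not part of the Star Wars wrapper.
    var_a: str
    var_b: str


# The annotations say str, but nothing stops us from passing an int:
# no conversion or check happens at runtime.
pair = Pair(var_a="hello", var_b=42)
print(type(pair.var_b).__name__)  # int

# The annotations are still what defines the fields of the data class.
print([f.name for f in fields(pair)])  # ['var_a', 'var_b']
```

A static checker such as mypy would flag the call to Pair, even though Python itself runs it without complaint.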

So let’s have a look at how we can use this!

The Star Wars API

You know a movie’s fanbase is passionate when a fan creates a REST API with the movie’s data in it. One Star Wars fan has done exactly that, and created the Star Wars API. He’s actually gone even further, and created a Python wrapper library for it.

Let’s forget for a second that there’s already a wrapper out there, and see how we could write our own.

We can use the requests library to get a resource from the Star Wars API:

import requests

response = requests.get('https://swapi.co/api/films/1/')

This endpoint (like all swapi endpoints) responds with a JSON message. Requests makes our life easier by offering JSON parsing:

dictionary = response.json()

And at this point we have our data in a dictionary. Let’s have a look at it (shortened):

{
 'characters': ['https://swapi.co/api/people/1/',
                ...],
 'created': '2014-12-10T14:23:31.880000Z',
 'director': 'George Lucas',
 'edited': '2015-04-11T09:46:52.774897Z',
 'episode_id': 4,
 'opening_crawl': 'It is a period of civil war.\r\n … ',
 'planets': ['https://swapi.co/api/planets/2/',
             ...],
 'producer': 'Gary Kurtz, Rick McCallum',
 'release_date': '1977-05-25',
 'species': ['https://swapi.co/api/species/5/',
             ...],
 'starships': ['https://swapi.co/api/starships/2/',
               ...],
 'title': 'A New Hope',
 'url': 'https://swapi.co/api/films/1/',
 'vehicles': ['https://swapi.co/api/vehicles/4/',
              ...]
}

Wrapping the API

To properly wrap an API, we should create objects that our wrapper’s user can use in their application. So let’s define a class in Python 3.6 to hold the response of a request to the /films/ endpoint:

from datetime import datetime
from typing import List

import dateutil.parser


class StarWarsMovie:

    def __init__(self,
                 title: str,
                 episode_id: int,
                 opening_crawl: str,
                 director: str,
                 producer: str,
                 release_date: datetime,
                 characters: List[str],
                 planets: List[str],
                 starships: List[str],
                 vehicles: List[str],
                 species: List[str],
                 created: datetime,
                 edited: datetime,
                 url: str
                 ):

        self.title = title
        self.episode_id = episode_id
        self.opening_crawl = opening_crawl
        self.director = director
        self.producer = producer
        self.release_date = release_date
        self.characters = characters
        self.planets = planets
        self.starships = starships
        self.vehicles = vehicles
        self.species = species
        self.created = created
        self.edited = edited
        self.url = url

        # The API returns dates as strings; convert them to datetime objects.
        if isinstance(self.release_date, str):
            self.release_date = dateutil.parser.parse(self.release_date)

        if isinstance(self.created, str):
            self.created = dateutil.parser.parse(self.created)

        if isinstance(self.edited, str):
            self.edited = dateutil.parser.parse(self.edited)

Careful readers may have noticed a little bit of duplicated code here. Not so careful readers may want to have a look at the complete Python 3.6 implementation: it’s not short.

This is a classic case where the data class decorator can help you out. We’re creating a class that mostly holds data, and only does a little validation. So let’s have a look at what we need to change.

Firstly, data classes automatically generate several dunder methods. If we don’t specify any options to the dataclass decorator, the generated methods are: __init__, __eq__, and __repr__. Python by default (not just for data classes) will implement __str__ to return the output of __repr__ if you’ve defined __repr__ but not __str__. Therefore, you get four dunder methods implemented just by changing the code to:

from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class StarWarsMovie:
    title: str
    episode_id: int
    opening_crawl: str
    director: str
    producer: str
    release_date: datetime
    characters: List[str]
    planets: List[str]
    starships: List[str]
    vehicles: List[str]
    species: List[str]
    created: datetime
    edited: datetime
    url: str
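The generated methods are easiest to see on a smaller class. The sketch below uses a made-up two-field class called Film (not the full wrapper class) purely to keep the output short.

```python
from dataclasses import dataclass


@dataclass
class Film:
    # Hypothetical two-field class, used only to keep the output short.
    title: str
    episode_id: int


a = Film("A New Hope", 4)                    # generated __init__, positional
b = Film(title="A New Hope", episode_id=4)   # ... or with keywords

print(a)        # Film(title='A New Hope', episode_id=4)
print(a == b)   # True: __eq__ compares the fields, not object identity
print(a is b)   # False: they are still two separate objects
```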

We removed the __init__ method here to make sure the data class decorator can add the one it generates. Unfortunately, we lost a bit of functionality in the process. Our Python 3.6 constructor didn’t just define all values, but it also attempted to parse dates. How can we do that with a data class?

If we were to override __init__, we’d lose the benefit of the data class. Therefore, a new dunder method was defined for any additional processing: __post_init__, which the generated __init__ calls after it has assigned the fields. Let’s see what a __post_init__ method would look like for our wrapper class:

# This method goes inside the body of the StarWarsMovie data class.
def __post_init__(self):
    if isinstance(self.release_date, str):
        self.release_date = dateutil.parser.parse(self.release_date)

    if isinstance(self.created, str):
        self.created = dateutil.parser.parse(self.created)

    if isinstance(self.edited, str):
        self.edited = dateutil.parser.parse(self.edited)

And that’s it! With the data class decorator, we could implement our class in under a third of the lines it took without it.
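Putting the pieces together, here is a minimal sketch of how a response dictionary could be turned into an object. It is deliberately trimmed: the class name Film, the three-field subset, and the hard-coded payload dictionary are assumptions made to keep the example self-contained, and it uses datetime.strptime from the standard library in place of dateutil.parser.parse.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Film:
    # Trimmed, hypothetical version of StarWarsMovie with three fields.
    title: str
    episode_id: int
    release_date: datetime

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the fields.
        if isinstance(self.release_date, str):
            self.release_date = datetime.strptime(self.release_date, "%Y-%m-%d")


# Stand-in for the relevant keys of response.json(); the values are copied
# from the shortened response shown earlier.
payload = {
    "title": "A New Hope",
    "episode_id": 4,
    "release_date": "1977-05-25",
}

# The keys match the field names, so the dictionary unpacks straight
# into the generated __init__.
film = Film(**payload)
print(film.release_date.year)  # 1977
```

Unpacking like this only works when every key in the dictionary is also a field of the class; an unexpected key raises a TypeError.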

More goodies

By using options with the decorator, you can tailor data classes further for your use case. The default options are:

@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
  • init determines whether to generate the __init__ dunder method.
  • repr determines whether to generate the __repr__ dunder method.
  • eq does the same for the __eq__ dunder method, which determines the behavior for equality checks (your_class_instance == another_instance).
  • order actually creates four dunder methods, which determine the behavior of all less-than and greater-than comparisons (<, <=, >, >=). If you set this to True, you can sort a list of your objects.
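As a rough illustration of the order option, the sketch below sorts a list of objects. The Film class is made up for this example; note that instances are compared field by field in the order the fields are defined, so episode_id is listed first here.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Film:
    # Hypothetical class; field order matters because instances are
    # compared as if they were tuples of their fields, in definition order.
    episode_id: int
    title: str


films = [
    Film(5, "The Empire Strikes Back"),
    Film(4, "A New Hope"),
    Film(6, "Return of the Jedi"),
]

print([film.title for film in sorted(films)])
# ['A New Hope', 'The Empire Strikes Back', 'Return of the Jedi']
```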

The last two options determine whether or not your object can be hashed. This is necessary (for example) if you want to use your class’ objects as dictionary keys. An object’s hash should remain constant for the life of the object, otherwise the dictionary will not be able to find it anymore. The default implementation of a data class’ __hash__ function returns a hash over all fields in the data class. Therefore, it’s only generated by default if you also make your objects read-only (by specifying frozen=True).

If you set frozen=True, any write to a field of your object will raise an error (dataclasses.FrozenInstanceError). If you think this is too draconian, but you still know the object will never change, you could specify unsafe_hash=True instead. The authors of the data class decorator recommend you don’t, though.
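The difference between the default and frozen=True can be sketched as follows. The class names MutableFilm and FrozenFilm are made up for this example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class MutableFilm:
    # Default options (eq=True, frozen=False): __hash__ is set to None.
    title: str


@dataclass(frozen=True)
class FrozenFilm:
    # eq=True together with frozen=True: a __hash__ is generated.
    title: str


try:
    hash(MutableFilm("A New Hope"))
except TypeError:
    print("MutableFilm instances are not hashable")

film = FrozenFilm("A New Hope")
ratings = {film: 5}  # works: frozen instances can be dictionary keys
print(ratings[FrozenFilm("A New Hope")])  # 5: equal fields give equal hashes

try:
    film.title = "Something else"
except FrozenInstanceError:
    print("FrozenFilm instances cannot be modified")
```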

If you want to learn more about data classes, you can read the PEP or just get started and play with them yourself! Let us know in the comments what you’re using data classes for!
