Python 3.7: Introducing Data Classes

Python 3.7 is set to be released this summer, so let’s have a sneak peek at some of the new features! If you’d like to play along at home with PyCharm, make sure you get PyCharm 2018.1 (or later if you’re reading this from the future).

There are many new things in Python 3.7: various character set improvements, postponed evaluation of annotations, and more. One of the most exciting new features is support for the dataclass decorator.

What is a Data Class?

Most Python developers will have written many classes which look like:
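A minimal sketch of such a class, with hypothetical names (MyClass, var_a, var_b) chosen purely for illustration:

```python
class MyClass:
    """A plain class whose __init__ only copies its arguments onto self."""

    def __init__(self, var_a, var_b):
        self.var_a = var_a
        self.var_b = var_b


obj = MyClass("x", "y")
print(obj.var_a, obj.var_b)  # prints: x y
```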

Data classes help you by automatically generating dunder methods for simple cases, for example an __init__ that accepts those arguments and assigns each one to self. The small example before could be rewritten like:
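As a rough sketch, assuming a hypothetical two-field class (the names MyClass, var_a and var_b are illustrative only), the data class form might look like this:

```python
from dataclasses import dataclass


@dataclass
class MyClass:
    # Each annotated name becomes a field and an __init__ parameter.
    var_a: str
    var_b: str


obj = MyClass("x", "y")
print(obj)  # MyClass(var_a='x', var_b='y')
```

The generated __init__ takes the fields in the order they are declared, so the call site looks the same as it would for a hand-written constructor.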

A key difference is that type hints are actually required for data classes. If you’ve never used a type hint before: they allow you to mark what type a certain variable _should_ be. At runtime, these types are not checked, but you can use PyCharm or a command-line tool like mypy to check your code statically.
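To illustrate that hints are not enforced at runtime, here is a small sketch with a hypothetical Measurement class. A static checker such as mypy should flag the call, but Python itself runs it without complaint:

```python
from dataclasses import dataclass


@dataclass
class Measurement:  # hypothetical class, for illustration only
    label: str
    value: float


# The annotation says float, but nothing stops a str at runtime.
m = Measurement("temperature", "not a number")
print(type(m.value).__name__)  # str
```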

So let’s have a look at how we can use this!

The Star Wars API

You know a movie’s fanbase is passionate when a fan creates a REST API with the movie’s data in it. One Star Wars fan has done exactly that, and created the Star Wars API. He’s actually gone even further, and created a Python wrapper library for it.

Let’s forget for a second that there’s already a wrapper out there, and see how we could write our own.

We can use the requests library to get a resource from the Star Wars API:

This endpoint (like all swapi endpoints) responds with a JSON message. Requests makes our life easier by offering JSON parsing:
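The two steps might be sketched as follows. The function name fetch_film, its base_url parameter and the optional get hook are assumptions made for this example; the base URL itself should come from the API's documentation. Only requests.get, Response.raise_for_status() and Response.json() are taken from the requests library.

```python
def fetch_film(base_url, film_id, get=None):
    """Fetch one film resource and return the parsed JSON as a dict.

    `get` is a hypothetical hook for swapping out the HTTP call (handy in
    tests); when it is omitted, requests.get is used.
    """
    if get is None:
        import requests  # third-party package: pip install requests

        get = requests.get
    response = get(f"{base_url}/films/{film_id}/")
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()  # requests decodes the JSON body into a dict
```

Calling fetch_film with the API's base URL and a film id would then return the film's data as a regular Python dictionary.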

And at this point we have our data in a dictionary. Let’s have a look at it (shortened):

Wrapping the API

To properly wrap an API, we should create objects that our wrapper’s user can use in their application. So let’s define an object in Python 3.6 to contain the responses of requests to the /films/ endpoint:
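A shortened sketch of what such a class could look like is given here. It keeps only three fields, assumes that release_date arrives as a YYYY-MM-DD string, and uses datetime.strptime from the standard library as a stand-in for a dedicated date parser; the field selection is an assumption made for brevity.

```python
from datetime import datetime
from typing import Union


class StarWarsMovie:
    """Python 3.6-style wrapper: every field name is spelled out repeatedly."""

    def __init__(self, title: str, episode_id: int,
                 release_date: Union[datetime, str]) -> None:
        self.title = title
        self.episode_id = episode_id
        # Assumption: the API sends dates as 'YYYY-MM-DD' strings.
        if isinstance(release_date, str):
            release_date = datetime.strptime(release_date, "%Y-%m-%d")
        self.release_date = release_date

    def __repr__(self) -> str:
        return (f"StarWarsMovie(title={self.title!r}, "
                f"episode_id={self.episode_id!r}, "
                f"release_date={self.release_date!r})")

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, StarWarsMovie):
            return NotImplemented
        return ((self.title, self.episode_id, self.release_date)
                == (other.title, other.episode_id, other.release_date))


movie = StarWarsMovie("A New Hope", 4, "1977-05-25")
print(movie.release_date.year)  # 1977
```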

Careful readers may have noticed a little bit of duplicated code here. Not so careful readers may want to have a look at the complete Python 3.6 implementation: it’s not short.

This is a classic case of where the data class decorator can help you out. We’re creating a class that mostly holds data, and only does a little validation. So let’s have a look at what we need to change.

Firstly, data classes automatically generate several dunder methods. If we don’t specify any options to the dataclass decorator, the generated methods are: __init__, __eq__, and __repr__. Python by default (not just for data classes) will implement __str__ to return the output of __repr__ if you’ve defined __repr__ but not __str__. Therefore, you get four dunder methods implemented just by changing the code to:
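A cut-down sketch of the data class form, assuming only three of the film's fields for brevity:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StarWarsMovie:
    title: str
    episode_id: int
    release_date: datetime


movie = StarWarsMovie("A New Hope", 4, datetime(1977, 5, 25))

# __repr__ is generated, and __str__ falls back to it.
print(str(movie) == repr(movie))  # True

# __eq__ compares the fields of both instances.
print(movie == StarWarsMovie("A New Hope", 4, datetime(1977, 5, 25)))  # True
```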

We removed the __init__ method here to make sure the data class decorator can add the one it generates. Unfortunately, we lost a bit of functionality in the process. Our Python 3.6 constructor didn’t just define all values, but it also attempted to parse dates. How can we do that with a data class?

If we were to override __init__, we’d lose the benefit of the data class. Therefore a new dunder method was defined for any additional processing: __post_init__. Let’s see what a __post_init__ method would look like for our wrapper class:
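One possible shape, sketched with a reduced set of fields and with datetime.strptime standing in for a dedicated date parser (the YYYY-MM-DD format is an assumption about the API's output):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union


@dataclass
class StarWarsMovie:
    title: str
    episode_id: int
    release_date: Union[datetime, str]

    def __post_init__(self):
        # Called by the generated __init__ after all fields are assigned.
        if isinstance(self.release_date, str):
            self.release_date = datetime.strptime(self.release_date, "%Y-%m-%d")


movie = StarWarsMovie("A New Hope", 4, "1977-05-25")
print(movie.release_date.year)  # 1977
```

Annotating the field as Union[datetime, str] keeps a static checker happy when a string is passed to the constructor.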

And that’s it! With the data class decorator, we could implement our class in under a third of the lines it took without it.

More goodies

By using options with the decorator, you can tailor data classes further for your use case. The default options are:

  • init determines whether to generate the __init__ dunder method.
  • repr determines whether to generate the __repr__ dunder method.
  • eq does the same for the __eq__ dunder method, which determines the behavior for equality checks (your_class_instance == another_instance).
  • order actually creates four dunder methods (__lt__, __le__, __gt__, and __ge__), which determine the behavior of all less-than and greater-than comparisons. If you set this to True, you can sort a list of your objects.
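The following sketch spells out the Python 3.7 defaults explicitly and then turns on order to make instances sortable. The Plain and Version classes are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


# Writing out the Python 3.7 defaults is equivalent to a bare @dataclass.
@dataclass(init=True, repr=True, eq=True, order=False,
           unsafe_hash=False, frozen=False)
class Plain:
    value: int


@dataclass(order=True)
class Version:
    major: int
    minor: int


# With order=True, instances compare like tuples of their fields.
versions = [Version(3, 7), Version(2, 7), Version(3, 6)]
print(sorted(versions))
# [Version(major=2, minor=7), Version(major=3, minor=6), Version(major=3, minor=7)]

# Without order=True, ordering comparisons are not defined.
try:
    Plain(1) < Plain(2)
except TypeError:
    print("Plain instances cannot be ordered")
```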

The last two options, unsafe_hash and frozen, determine whether or not your object can be hashed. This is necessary (for example) if you want to use your class’ objects as dictionary keys. An object’s hash should remain constant for its lifetime, otherwise the dictionary will not be able to find your objects anymore. The default implementation of a data class’ __hash__ function returns a hash over all fields of the data class. Therefore it’s only generated by default if you also make your objects read-only (by specifying frozen=True).

If you set frozen=True, any attempt to assign to a field of your object will raise an error (dataclasses.FrozenInstanceError). If you think this is too draconian, but you still know the object will never change, you could specify unsafe_hash=True instead. The authors of the data class decorator recommend you don’t, though.
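A short sketch of the difference, using two hypothetical classes named Mutable and Frozen:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass
class Mutable:
    name: str


@dataclass(frozen=True)
class Frozen:
    name: str


# With eq=True and frozen=False, __hash__ is set to None: not hashable.
try:
    hash(Mutable("a"))
except TypeError:
    print("Mutable instances are unhashable")

# Frozen instances hash over their fields and work as dictionary keys.
lookup = {Frozen("a"): 1}
print(lookup[Frozen("a")])  # 1

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    Frozen("a").name = "b"
except FrozenInstanceError:
    print("cannot assign to a frozen instance")
```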

If you want to learn more about data classes, you can read the PEP or just get started and play with them yourself! Let us know in the comments what you’re using data classes for!

18 Responses to Python 3.7: Introducing Data Classes

  1. Varun Ramesh says:

    It seems to me that the ‘StarWarsMovie’ dataclass will fail a static type check if a string is passed in as an argument for ‘release_date’, ‘created’, or ‘edited’. Since type annotations support unions, I think that ‘Union[datetime, str]’ might be the right annotation.

  2. Peter Norvig says:

    __post_init__ could be

    for attr in ['release_date', 'created', 'edited']:
        if isinstance(getattr(self, attr), str):
            setattr(self, attr, dateutil.parser.parse(getattr(self, attr)))

    • Wiliam says:

      I wonder if this hinders readability or understanding. Does it? When do we start to worry about these small details?

      • victor n. says:

        oh wow.

        Wiliam, i don’t know about readability/understanding but what Peter wrote above is what is more maintainable. it’s preferable (imo) to the original. all you need to do now is add attributes to that list above and everything automagically works.

      • Kevin says:

        Not really, it might look weird to someone with little coding experience in Python (<2 years) but with a few years of proficiency this sort of thing becomes commonplace. Although I probably wouldn’t write it exactly how the guy above did, or I’d at least surround all that getattr/setattr stuff with a comment explaining why this is done in a loop.

        The loop takes about 3 lines so if I have only 3 attributes to do this on I might not turn it into a loop/dynamic thing.

      • Anentropic says:

        it’s better in every way

      • Chris Adams says:

        Kevin: I generally prefer this style because it makes it immediately obvious that all of the listed fields have intentionally identical behaviour. That might be a minor thing now but it tends to avoid bugs later when maintenance work means someone either has to confirm that intention or, worse, misses an instance and the behaviour is subtly no longer consistent.

    • Quaint Alien says:

      Love your code golfing tricks!

  3. tm says:

    Reviewing the non-dataclass class, if your constructor can take a str or datetime argument for the date objects, shouldn’t the __init__ arguments for the date objects be Union[str, datetime]?

    Also, mypy doesn’t like the way that the parse function is called with a typed datetime argument: Argument 1 to "parse" has incompatible type "datetime"; expected "Union[bytes, str, IO[str], IO[Any]]" Not sure how you rectified that.

  4. Darren says:

    That moves python closer to the scala case class
    https://docs.scala-lang.org/tour/case-classes.html

    Given python and scala are commonly used in big-data (Spark), some kind of python/scala convergence is not too surprising. The key difference, however, is that scala will catch type errors at compile-time.

    • Anentropic says:

      mypy will catch Python type errors at “build time” i.e. whenever you choose to run mypy, perhaps as part of your CI tests

  5. Brian Bruggeman says:

    on dataclasses, I’m not sure types are needed anymore: https://twitter.com/raymondh/status/959153776484470784?lang=en

    I think this is super awesome; Python should never require types.

    • Wagner Macedo says:

      And we lose readability. Like it or not, we programmers have to deal with data types all the time. If there is a standard way to document the types (this was the main reason behind type hinting), why not use it?

  6. bc says:

    Better to use a library like Traits (https://pypi.org/project/traits/) or Atom (https://pypi.org/project/atom/).

  7. Eric Frederich says:

    What if the JSON object returns a key which is a reserved word or otherwise not a valid Python variable name?
    I suppose you could define a @classmethod called from_json_response or something which would then return something like cls(a=data['a'], b=data['b'], …etc) where a mapping of JSON names to Python names could be enumerated. Unfortunately this seems to repeat a lot of code.

    I think golang lets you decorate structs saying what the JSON keys should be when serializing/deserializing.
