Join data items that want to go together – code smells series

This post is part of a 10-week series by Dino Esposito (@despos) around a common theme: code smells and code structure.

In the previous post of this series, we presented the value of using data containers instead of plain primitives to provide a representation of data that is closer to the real domain. We used URL and Email custom types to replace strings. Both URL and email addresses, usually identified with strings, are ultimately richer entities which deserve their own business-specific programming model.

This approach can – and should! – be taken further. Which is what we will be looking at in this blog post.

In this series: (we will update the table of contents as we progress)

Now let us dive into another code smell: Data Clump!

Data Clump

Sometimes, you just find out that a small set of data items – whether plain primitives, value types or even complex types – are constantly passed around together across different sections of your code base. Those distinct values are referred to as a “data clump”.

Data Clump is a good example of a code smell. It is not a problem per-se, but it might be the indicator of a deeper problem that in the long run could easily degenerate into some forms of technical debt.

As an example, suppose you are modeling an object that represents a booking.

The constructor needs to take a couple of DateTime parameters to define the validity timeframe. Those two parameters will go together every time you create a new Booking in code, and probably every time bookings in a given timeframe are queried.

While sticking to use two parameters every time is not a mistake in itself, using a new TimeInterval class makes the code easier to read but also to extend. The good news is ReSharper and Rider come with the Extract class refactoring to help us out!

Tip: Extract class can be invoked from the context menu, or using the Refactor This action (Ctrl+Shift+R).

Refactor this - Extract class

The Extract Class refactoring takes a subclass out of the Booking class and adds just the highlighted members to it.

This new class can be further extended with one or more factory methods, or be tailor-made in any way that makes sense to your business domain.

You now have a new entity with its own personality, that actually plays a defined role in the business domain: the interval of time between two dates. Two sparse DateTime objects could have still performed this, but in a much more blurred way.

The Booking constructor may evolve as below:

Addressing the Data Clump code smell by extracting specific classes also makes your code inherently more extensible, because what moved around now is a wrapped-up object and not the mere collection of simpler data items. And a new standalone object can be extended at will with additional properties and methods when needed.

In our next post, let’s look at converting objects and readability. See you next week!

Download ReSharper 2018.1.2 or Rider 2018.1.2 and give them a try. They can help spot and fix common code smells! Check our code analysis series for more tips and tricks on automatic code inspection with ReSharper and Rider.

This entry was posted in How-To's and tagged , , , , . Bookmark the permalink.

12 Responses to Join data items that want to go together – code smells series

  1. J says:

    Do we really need From = from; in the last snippet?

  2. Dick Nagtegaal says:

    Why does the constructor of TimeInterval set From and To to DateTime.MinValue and MaxValue, while the properties From and To are nullable? I would expect these to be null when I invoke the constructor.

  3. Sebastian Andres says:

    Shouldn’t the take away here be that we now use a TimeInterval object in the Booking constructor instead of the two nullable DateTime parameters? Otherwise what is the point of saying:

    While sticking to use two parameters every time is not a mistake in itself, using a new TimeInterval class makes the code easier to read but also to extend.

    if at the end we still have two parameters?

    • It could be, indeed. But doesn’t have to be, right now the constructor that takes DateTime is a convenience for setting up the TimeInterval. Adding a second constructor that takes the TimeInterval is obviously possible as well.

  4. Somebody who cares about efficiency says:

    I’d say adding yet another class, depending on how many Bookings you expect to have in memory, is far smellier than having two fields, which isn’t smelly at all unless you’re looking to complicate things. And now your new Booking call is that much more complex too, because every time you now need new TimeInterval(v1,v2) instead of just v1,v2.

    • Jeff Odell says:

      Until either:

      1) There is some behavior exclusive to TimeInterval, or
      2) Naming the class clarified the relationship between the fields in a much stronger way

      I would tend to agree. A From and To field on a Booking is pretty clear and arguably clearer that TimeInterval.

  5. T says:

    I find it a bit strange that a booking could exist in the domain but without knowing when it starts or finishes. How is it a valid booking model without set what seems to me, it’s fundamental properties?

    Congratulations on booking, your are booked into Room 123 from 1st Jan 0001 to 31st Dec 9999. If you allow that booking into your application, I would suggest your code smell lies here rather than the use of primitives (although I generally agree on this point!)

    • You are absolutely right though. Unless longevity becomes a thing (and even then), bookings from year 0001 to 9999 don’t make much sense and validation should be added. But that would clutter this post a lot: should that validation be part of our domain classes? Should there be a specific class checking what is a valid booking? And what is a valid booking? Is it a week, two weeks, max. a month? And since there is a time component as well, can people check in before 12 PM? Or is it 3 PM?

      As you see, adding all of the validation logic etc. would clutter the main point of this blog post (which is that From and To belong together). Consider the examples in these posts not as “the single point of truth”, rather a second explanation of the text, in code :-)

      Thanks for your feedback though, and thanks for reading the series!

  6. Pingback: Dew Drop - July 3, 2018 (#2758) - Morning Dew

  7. Scott Hannen says:

    Another benefit of isolating a “data clump” into a class is that it keeps things neater if we need to add more properties.

    For example, what if the Booking needs to allow for alternate dates?

    I’d rather have Interval and AlternateInterval than To, From, AlternateTo, and AlternateFrom. You can create a list of preferred intervals.

Leave a Reply

Your email address will not be published. Required fields are marked *