Join data items that want to go together – code smells series

This post is part of a 10-week series by Dino Esposito (@despos) around a common theme: code smells and code structure.

In the previous post of this series, we presented the value of using data containers instead of plain primitives to provide a representation of data that is closer to the real domain. We used URL and Email custom types to replace strings. Both URL and email addresses, usually identified with strings, are ultimately richer entities which deserve their own business-specific programming model.

This approach can – and should! – be taken further. Which is what we will be looking at in this blog post.

In this series:

Now let us dive into another code smell: Data Clump!

Data Clump

Sometimes, you just find out that a small set of data items – whether plain primitives, value types or even complex types – are constantly passed around together across different sections of your code base. Those distinct values are referred to as a “data clump”.

Data Clump is a good example of a code smell. It is not a problem per-se, but it might be the indicator of a deeper problem that in the long run could easily degenerate into some forms of technical debt.

As an example, suppose you are modeling an object that represents a booking.

The constructor needs to take a couple of DateTime parameters to define the validity timeframe. Those two parameters will go together every time you create a new Booking in code, and probably every time bookings in a given timeframe are queried.

While sticking to using two parameters every time is not a mistake in itself, using a new TimeInterval class makes the code easier to read – and to extend. The good news is ReSharper and Rider come with the Extract class refactoring to help us out!

Tip: Extract class can be invoked from the context menu, or by using the Refactor This action (Ctrl+Shift+R).

Refactor this - Extract class

The Extract Class refactoring takes a subclass out of the Booking class and adds just the highlighted members to it.

This new class can be further extended with one or more factory methods, or be tailor-made in any way that makes sense to your business domain.

You now have a new entity with its own personality, which actually plays a defined role in the business domain: the interval of time between two dates. Two sparse DateTime objects could have still performed this, but in a much more blurred way.

The Booking constructor may evolve as below:

Addressing the Data Clump code smell by extracting specific classes also makes your code inherently more extensible, because what moved around now is a wrapped-up object and not the mere collection of simpler data items. And a new standalone object can be extended at will with additional properties and methods when needed.

In our next post, let’s look at converting objects and readability. See you next week!

Download ReSharper 2018.1.2 or Rider 2018.1.2 and give them a try. They can help spot and fix common code smells! Check our code analysis series for more tips and tricks on automatic code inspection with ReSharper and Rider.

This entry was posted in How-To's and tagged , , , , . Bookmark the permalink.

15 Responses to Join data items that want to go together – code smells series

  1. J says:

    Do we really need From = from; in the last snippet?

  2. Dick Nagtegaal says:

    Why does the constructor of TimeInterval set From and To to DateTime.MinValue and MaxValue, while the properties From and To are nullable? I would expect these to be null when I invoke the constructor.

  3. Sebastian Andres says:

    Shouldn’t the take away here be that we now use a TimeInterval object in the Booking constructor instead of the two nullable DateTime parameters? Otherwise what is the point of saying:


    While sticking to use two parameters every time is not a mistake in itself, using a new TimeInterval class makes the code easier to read but also to extend.

    if at the end we still have two parameters?

    • It could be, indeed. But doesn’t have to be, right now the constructor that takes DateTime is a convenience for setting up the TimeInterval. Adding a second constructor that takes the TimeInterval is obviously possible as well.

  4. Somebody who cares about efficiency says:

    I’d say adding yet another class, depending on how many Bookings you expect to have in memory, is far smellier than having two fields, which isn’t smelly at all unless you’re looking to complicate things. And now your new Booking call is that much more complex too, because every time you now need new TimeInterval(v1,v2) instead of just v1,v2.

    • Jeff Odell says:

      Until either:

      1) There is some behavior exclusive to TimeInterval, or
      2) Naming the class clarified the relationship between the fields in a much stronger way

      I would tend to agree. A From and To field on a Booking is pretty clear and arguably clearer that TimeInterval.

  5. T says:

    I find it a bit strange that a booking could exist in the domain but without knowing when it starts or finishes. How is it a valid booking model without set what seems to me, it’s fundamental properties?

    Congratulations on booking, your are booked into Room 123 from 1st Jan 0001 to 31st Dec 9999. If you allow that booking into your application, I would suggest your code smell lies here rather than the use of primitives (although I generally agree on this point!)

    • You are absolutely right though. Unless longevity becomes a thing (and even then), bookings from year 0001 to 9999 don’t make much sense and validation should be added. But that would clutter this post a lot: should that validation be part of our domain classes? Should there be a specific class checking what is a valid booking? And what is a valid booking? Is it a week, two weeks, max. a month? And since there is a time component as well, can people check in before 12 PM? Or is it 3 PM?

      As you see, adding all of the validation logic etc. would clutter the main point of this blog post (which is that From and To belong together). Consider the examples in these posts not as “the single point of truth”, rather a second explanation of the text, in code :-)

      Thanks for your feedback though, and thanks for reading the series!

  6. Pingback: Dew Drop - July 3, 2018 (#2758) - Morning Dew

  7. Scott Hannen says:

    Another benefit of isolating a “data clump” into a class is that it keeps things neater if we need to add more properties.

    For example, what if the Booking needs to allow for alternate dates?

    I’d rather have Interval and AlternateInterval than To, From, AlternateTo, and AlternateFrom. You can create a list of preferred intervals.

  8. Hi Dino, great discussion starter on where to refractor classes. The TimeInterval Class would also be somewhere where you could logically override ToString() to display a friendly date for a view or email. Or a function to work out how many days and hours until your booking starts, so that you can send out notifications.

  9. Ersky says:

    Moreover, Interval class can have many usefull ctors, like:
    Interval(Datetime from, int days)
    Interval(Datetime to, int days)
    which will be very usefull for everyday purposes…

  10. Guilherme says:

    Just a correction:

    “The good news is ReSharper and Rider come with the Extract class refactoring to help us out!”

    While ReSharper does have that refactoring option, Rider does not seem to have it. When can we expect it to be available for Rider?

Leave a Reply

Your email address will not be published. Required fields are marked *