Join data items that want to go together – code smells series

Dino Esposito

This post is part of a 10-week series by Dino Esposito (@despos) around a common theme: code smells and code structure.

In the previous post of this series, we presented the value of using data containers instead of plain primitives to provide a representation of data that is closer to the real domain. We used URL and Email custom types to replace strings. Both URL and email addresses, usually identified with strings, are ultimately richer entities which deserve their own business-specific programming model.

This approach can – and should! – be taken further. Which is what we will be looking at in this blog post.

In this series:

Now let us dive into another code smell: Data Clump!

Data Clump

Sometimes, you just find out that a small set of data items – whether plain primitives, value types or even complex types – are constantly passed around together across different sections of your code base. Those distinct values are referred to as a “data clump”.

Data Clump is a good example of a code smell. It is not a problem per-se, but it might be the indicator of a deeper problem that in the long run could easily degenerate into some forms of technical debt.

As an example, suppose you are modeling an object that represents a booking.

public class Booking
{
    public Booking(int bookingId, int roomId, DateTime? from, DateTime? to)
    {
        BookingId = bookingId;
        RoomId = roomId;
        From = from;
        To = to;
    }

    public int BookingId { get; private set; }
    public int RoomId { get; private set; }
    public DateTime? From { get; private set; }
    public DateTime? To { get; private set; }
}

The constructor needs to take a couple of DateTime parameters to define the validity timeframe. Those two parameters will go together every time you create a new Booking in code, and probably every time bookings in a given timeframe are queried.

While sticking to using two parameters every time is not a mistake in itself, using a new TimeInterval class makes the code easier to read – and to extend. The good news is ReSharper and Rider come with the Extract class refactoring to help us out!

Tip: Extract class can be invoked from the context menu, or by using the Refactor This action (Ctrl+Shift+R).

Refactor this - Extract class

The Extract Class refactoring takes a subclass out of the Booking class and adds just the highlighted members to it.

public class TimeInterval
{
    public DateTime? From { get; set; }
    public DateTime? To { get; set; }
}

This new class can be further extended with one or more factory methods, or be tailor-made in any way that makes sense to your business domain.
public class TimeInterval
{
    public TimeInterval(DateTime? from = null, DateTime? to = null)
    {
        From = from;
        To = to;
    }

    public DateTime? From { get; set; }
    public DateTime? To { get; set; }
}

You now have a new entity with its own personality, which actually plays a defined role in the business domain: the interval of time between two dates. Two sparse DateTime objects could have still performed this, but in a much more blurred way.

The Booking constructor may evolve as below:

public Booking(int bookingId, int roomId, DateTime? from, DateTime? to)
{
    BookingId = bookingId;
    RoomId = roomId;
    Interval = new TimeInterval(from, to);
}

Addressing the Data Clump code smell by extracting specific classes also makes your code inherently more extensible, because what moved around now is a wrapped-up object and not the mere collection of simpler data items. And a new standalone object can be extended at will with additional properties and methods when needed.

In our next post, let’s look at converting objects and readability. See you next week!

Download ReSharper 2018.1.2 or Rider 2018.1.2 and give them a try. They can help spot and fix common code smells! Check our code analysis series for more tips and tricks on automatic code inspection with ReSharper and Rider.

Comments below can no longer be edited.

15 Responses to Join data items that want to go together – code smells series

  1. J says:

    July 2, 2018

    Do we really need `From = from;` in the last snippet?

  2. Dick Nagtegaal says:

    July 2, 2018

    Why does the constructor of TimeInterval set From and To to DateTime.MinValue and MaxValue, while the properties From and To are nullable? I would expect these to be null when I invoke the constructor.

  3. Sebastian Andres says:

    July 3, 2018

    Shouldn’t the take away here be that we now use a `TimeInterval` object in the `Booking` constructor instead of the two nullable DateTime parameters? Otherwise what is the point of saying:

    “`
    While sticking to use two parameters every time is not a mistake in itself, using a new TimeInterval class makes the code easier to read but also to extend.
    “`

    if at the end we still have two parameters?

    • Maarten Balliauw says:

      July 3, 2018

      It could be, indeed. But doesn’t have to be, right now the constructor that takes DateTime is a convenience for setting up the TimeInterval. Adding a second constructor that takes the TimeInterval is obviously possible as well.

  4. Somebody who cares about efficiency says:

    July 3, 2018

    I’d say adding yet another class, depending on how many `Booking`s you expect to have in memory, is far smellier than having two fields, which isn’t smelly at all unless you’re looking to complicate things. And now your `new Booking` call is that much more complex too, because every time you now need `new TimeInterval(v1,v2)` instead of just `v1,v2`.

    • Jeff Odell says:

      July 11, 2018

      Until either:

      1) There is some behavior exclusive to TimeInterval, or
      2) Naming the class clarified the relationship between the fields in a much stronger way

      I would tend to agree. A From and To field on a Booking is pretty clear and arguably clearer that TimeInterval.

  5. T says:

    July 3, 2018

    I find it a bit strange that a booking could exist in the domain but without knowing when it starts or finishes. How is it a valid booking model without set what seems to me, it’s fundamental properties?

    Congratulations on booking, your are booked into Room 123 from 1st Jan 0001 to 31st Dec 9999. If you allow that booking into your application, I would suggest your code smell lies here rather than the use of primitives (although I generally agree on this point!)

    • Maarten Balliauw says:

      July 3, 2018

      You are absolutely right though. Unless longevity becomes a thing (and even then), bookings from year 0001 to 9999 don’t make much sense and validation should be added. But that would clutter this post a lot: should that validation be part of our domain classes? Should there be a specific class checking what is a valid booking? And what is a valid booking? Is it a week, two weeks, max. a month? And since there is a time component as well, can people check in before 12 PM? Or is it 3 PM?

      As you see, adding all of the validation logic etc. would clutter the main point of this blog post (which is that From and To belong together). Consider the examples in these posts not as “the single point of truth”, rather a second explanation of the text, in code 🙂

      Thanks for your feedback though, and thanks for reading the series!

  6. Dew Drop - July 3, 2018 (#2758) - Morning Dew says:

    July 3, 2018

    […] Join data items that want to go together – code smells series (Dino Esposito) […]

  7. Scott Hannen says:

    July 3, 2018

    Another benefit of isolating a “data clump” into a class is that it keeps things neater if we need to add more properties.

    For example, what if the `Booking` needs to allow for alternate dates?

    I’d rather have `Interval` and `AlternateInterval` than `To`, `From`, `AlternateTo`, and `AlternateFrom`. You can create a list of preferred intervals.

  8. Peter Campbell says:

    July 20, 2018

    Hi Dino, great discussion starter on where to refractor classes. The TimeInterval Class would also be somewhere where you could logically override ToString() to display a friendly date for a view or email. Or a function to work out how many days and hours until your booking starts, so that you can send out notifications.

  9. Ersky says:

    July 26, 2018

    Moreover, Interval class can have many usefull ctors, like:
    Interval(Datetime from, int days)
    Interval(Datetime to, int days)
    which will be very usefull for everyday purposes…

  10. Guilherme says:

    August 20, 2018

    Just a correction:

    “The good news is ReSharper and Rider come with the Extract class refactoring to help us out!”

    While ReSharper does have that refactoring option, Rider does not seem to have it. When can we expect it to be available for Rider?

Subscribe

Subscribe to .NET Tools updates