The simple case of special string types – code smells series

This post is part of a 10-week series by Dino Esposito (@despos) around a common theme: code smells and code structure.

Good to see you back! In our series about code smells, so far we have seen various theoretical cases. Today, we will look at an example that can be found in many code bases: special string types and the Primitive Obsession code smell.

Modern programming languages, including C# and Kotlin, make every type available as an object, regardless of its internal representation. Primitive types such as strings, numbers and booleans are often implemented internally as plain values and are boxed into objects for use. Once boxed into an object, those primitive types gain an array of methods that let developers perform a wide range of operations. For example, the String type is given methods for text operations such as trimming, counting, slicing, parsing and the like.

All these operations are specific to the type String and act on the collection of characters that actually make up the string object. A string object is only a generic container of characters, but sometimes those characters hold some semantics and require ad-hoc methods. Let’s look at an example of strings with a special meaning: URLs!

URLs

A URL is a string that identifies a network resource. Trimming the URL, or determining its length, are not key operations when compared to escaping the URL or making it relative. To easily manipulate a URL, properties like Port, Scheme and Segments are also quite useful.

Not coincidentally, in the .NET Framework, you deal with URLs through a dedicated class: System.Uri. While a URL is in essence only a string, the System.Uri class treats it as a special type, recognizing it has the status of a business concern.
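As a short illustration of the URL-specific operations mentioned above, here is a minimal sketch using System.Uri. The example.com address and the UriDemo class name are made up for the example.

```csharp
using System;

public static class UriDemo
{
    public static void Main()
    {
        // Hypothetical URL, used only for illustration.
        var url = new Uri("https://www.example.com:8080/blog/posts/code-smells");

        Console.WriteLine(url.Scheme);                        // https
        Console.WriteLine(url.Host);                          // www.example.com
        Console.WriteLine(url.Port);                          // 8080
        Console.WriteLine(string.Join(" | ", url.Segments));  // / | blog/ | posts/ | code-smells

        // Making the URL relative to a base address.
        var baseUrl = new Uri("https://www.example.com:8080/blog/");
        Console.WriteLine(baseUrl.MakeRelativeUri(url));      // posts/code-smells

        // Escaping a piece of text so it can be used inside a URL.
        Console.WriteLine(Uri.EscapeDataString("code smells & more")); // code%20smells%20%26%20more
    }
}
```

None of these operations would make sense on a general-purpose string, which is why they live on a dedicated type.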

Using dedicated classes to encapsulate specific amounts of data is a good way to vaporize the Primitive Obsession code smell.

Your code becomes more readable when small value objects are used instead of raw primitives.

Are e-mail addresses just strings?

Another good example of primitive obsession is working with e-mail addresses in your code. While the .NET Framework does have a type for working with e-mail addresses (System.Net.Mail.MailAddress), many of us will be using a string, right?

Let’s see what it takes to extract the server name from an email address.
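As a minimal sketch, assuming a hypothetical address such as user@server.com, the string-based approach might look like this. The class and method names are invented for the example.

```csharp
using System;

public static class StringBasedEmailDemo
{
    // Hypothetical helper: works directly on the raw string.
    public static string ExtractServer(string email)
    {
        // Find the separator between user name and server.
        var index = email.IndexOf("@", StringComparison.Ordinal);
        if (index < 0)
            throw new ArgumentException("Not an e-mail address.", nameof(email));

        // Everything after '@' is the server name.
        var server = email.Substring(index + 1);
        return server;
    }
}
```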

At the end of the code, the variable server contains the desired substring. It works, but can you really guess what the code does at first glance? Using primitive types instead of clearly definable business-entity classes is a smell! In addition, using similar chunks of primitive-based code leads straight to duplicated code.

Here’s how to solve it:
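One possible shape for such a class is sketched below. The class name Email, the Address property and the GetServer method come from the text; the validation in the constructor is an assumption added for illustration.

```csharp
using System;

public sealed class Email
{
    public Email(string address)
    {
        if (address == null)
            throw new ArgumentNullException(nameof(address));

        // Assumed, deliberately simple check: one '@' that is neither first nor last.
        var index = address.IndexOf('@');
        if (index <= 0 || index == address.Length - 1)
            throw new ArgumentException("Not an e-mail address.", nameof(address));

        Address = address;
    }

    // Getter-only property: the address cannot change after construction.
    public string Address { get; }

    // Returns the part of the address after '@'.
    public string GetServer()
    {
        var index = Address.IndexOf('@');
        return Address.Substring(index + 1);
    }
}
```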

The same code we’ve seen above now takes the following two lines:
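Assuming an Email class with a constructor that takes the raw string and a GetServer method, as described in the text, the calling code could be sketched as follows. The compact stand-in class at the bottom is there only so the snippet compiles on its own.

```csharp
// Top-level statements (C# 9 or later).
var email = new Email("user@server.com");
var server = email.GetServer();
System.Console.WriteLine(server); // server.com

// Minimal stand-in for the Email class described in the text.
public sealed class Email
{
    public Email(string address) => Address = address;
    public string Address { get; }
    public string GetServer() => Address.Substring(Address.IndexOf('@') + 1);
}
```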

Note that the GetServer method could easily be a property with the logic encapsulated in the getter. And for good measure, we may have to introduce a class Server which has a Name property, instead of returning a string here.

An immediate benefit is that all the operations that apply to a particular piece of data are no longer scattered around the code base. They live side by side in the same logical container.

The “Value Object” pattern

The Email class in the sample code is a regular .NET class written to achieve a functional goal. Simple classes built around some semantical data are good candidates for the Value Object pattern. Domain-driven Design (DDD) pushes the use of this pattern as a way to more closely model a business domain.

A value object is a class that has no identity and is fully represented by the data it holds. Furthermore, for the same reason, the class is immutable. This means that the class will not offer any public member to alter the state. For example, an email address stored in an instance of the Email class can’t be changed. To work with a different email address, you need another instance of the Email class.

The Email class above is immutable, but not yet fully compliant with the Value Object pattern. To fully identify an instance of the Email class with the data it holds we need to alter the way the class is checked for equality. This requires overriding Equals and subsequently GetHashCode.

The new implementation of Equals compares two instances by simply comparing the value of the internal Address properties. As a result, two distinct instances of Email are considered just the same if they contain the same email address.
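A minimal sketch of such an equality implementation follows. It assumes, as a design choice, that addresses are compared case-insensitively; whichever comparison is chosen, GetHashCode has to use the same one so that equal instances always produce equal hash codes. GetServer is omitted for brevity.

```csharp
using System;

public sealed class Email : IEquatable<Email>
{
    public Email(string address)
    {
        Address = address ?? throw new ArgumentNullException(nameof(address));
    }

    public string Address { get; }

    // Two instances are equal when they hold the same address (assumed: ignoring case).
    public bool Equals(Email other) =>
        other != null &&
        string.Equals(Address, other.Address, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => Equals(obj as Email);

    // Uses the same case-insensitive comparison as Equals.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(Address);
}
```

With this in place, new Email("foo@server.com") and new Email("FOO@server.com") compare as equal and also land in the same bucket of a dictionary or hash set.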

Tip: Equals and GetHashCode can be generated in ReSharper and Rider using the Generate action (Alt+Insert).

No, it’s not that obvious!

The most significant benefit we gain out of this refactoring is keeping the Email class much closer to the real-world idea of an email. The class has behavior-driven methods and has equality adapted to the context.

The approach of using small, simple and immutable classes to render small-but-significant pieces of data can be extended to a variety of other common cases – currency, address, temperature, quantity, and dates – even though a DateTime type already exists in the .NET Framework.

Primitive obsession is an obsession you should get rid of. For the sake of your code.

Next week, we’ll take this approach a little bit further and tackle the Data Clump code smell. Stay tuned!

Download ReSharper 2018.1.2 or Rider 2018.1.2 and give them a try. They can help spot and fix common code smells! Check our code analysis series for more tips and tricks on automatic code inspection with ReSharper and Rider.

10 Responses to The simple case of special string types – code smells series

  1. Lucas Trzesniewski says:

    “In the .NET Framework, there is a type for URLs but not for emails.”

    System.Net.Mail.MailAddress

    :-)

  2. Dino Esposito says:

    :) Correct! Just didn’t think of it. No excuses :)

  3. Jakub Januszkiewicz says:

    Hi Dino,

    just wanted to notice that your Equals and GetHashCode implementations have a bug – GetHashCode is case sensitive while Equals is not, so new Email(“foo”).Equals(new Email(“FOO”)) == true, but they have different hash codes. This breaks the GetHashCode contract.

    Other than that, a nice article, thanks :-)

  4. Jeevan says:

    Do you think that adding implicit cast operators to the Email class will make it more intuitive to work with?

    public static implicit operator Email(string email) => new Email(email);
    public static implicit operator string(Email email) => email.Address;

    This allows you to do things like:

    Email email = “user@server.com”;
    string emailStr = email;

    • Aaron says:

      Good question Jeevan. I’ve seen this done in other libraries and while I like that it “just works” with a minimal amount of code, I have spent some time confused by primitive types that are magically turned into objects. I finally cracked open the source code to figure out why it “just worked”.
      Sometimes the clarity of Email.Parse or new Email(string email) provides readability benefits.
      I am certainly not against the idea and would love to hear more opinions.
      #weCanButShouldWe

    • Dick Nagtegaal says:

      This is tricky. Suppose you have a unit test with: var email = new Email(“a@b.c”); Assert.AreEqual(“a@b.c”, email); What would you expect?
      The compiler won’t know if it should use AreEqual(Email, Email) or AreEqual(string, string), and will fall back to AreEqual(object, object).

  5. Pingback: Dew Drop - June 26, 2018 (#2753) - Morning Dew

  6. Nicholas Petersen says:

    In summary: don’t forget to be an object oriented programmer!

  7. Ralph says:

    Nice and well structured article.

    But I think there is a mistake, because Address.Substring(0, index) will return the username and not the server.

    And to be consequent, GetServer() should return a server object with a Name property or something like this.

    Looking forward to your next posts of this series!
