Tips & Tricks

Code Smells: Mutation

This is part of a series investigating code that looks suspicious (“code smells”), and exploring possible alternatives.

In the article about deeply nested code I wanted to change getMappedField to return an Optional value. The change to the method was fairly trivial, the change to the code that calls that method… well, that turned out to be another matter altogether.

One of the places this method is used is in validateQuery, a complex method that does rather a lot more than you might expect. When I changed getMappedField to return an Optional, I ended up replacing null checks with a surprisingly large number of isPresent checks:

Large number of isPresent checks

That’s not all of them; this method is nearly a hundred lines long, and it doesn’t fit in a single screenshot (this will be a topic for another article). My first reaction was irritation that Optional not only didn’t solve my null check problem, but made my code uglier with the addition of these checks and the associated gets. But of course this was just a sign of other underlying problems.

The Smell: Mutable Values

This method is long enough to contain more than one smell. For this article, I want to focus on the first issue I noticed: the mf variable is exactly that – variable. It’s a mutable value, and is reassigned under some circumstances. We can see this if we make mf final.

Value is resassigned

If the getMappedField method doesn’t return a value (and the fieldIsArrayOperator flag is false, whatever that means) we go off and try to assign mf to the value from getMappedFieldByJavaField. I assume readers don’t have a clear idea of the context of this specific code, but to be honest I’ve been working with this project on and off for years and I don’t really understand what it’s doing either. This is what being a developer is all about: it’s hard to write code, but it’s often harder to read it.

Anyway.

The point is this mf value is being changed, and this makes it even harder to reason about.  For example, when we check if it’s present later in the code, if the value is empty we don’t know if it’s empty because getMappedField returned no value or if it’s because getMappedFieldByJavaField returned no value. We just don’t know what it means if it is empty.

We could try and address this specific problem by moving the logic that can reassign the value into its own method and giving that a useful name. Then we should have an idea of what our variable is and why it may, or may not, be present. If we ask IntelliJ IDEA to extract this code into a method…

Extract method fails

…it can’t be done, because of at least one other mutable value.

We’re modifying the values inside a pathElements array. This doesn’t stop us from doing the refactoring, as we can pass this array in as a parameter and alter it in the method, but there is a problem with altering this value. More on this later.

The modification that stops us from being able to extract the method is when we’re setting a flag hasTranslations to true under some circumstances. Turns out that this flag is an optimization – if it has been set to true, then we need to modify another value:

Parameter origProp is changed

If we look more closely at this code, we might wonder about two things. First, why are we changing a value with the name origProp? It’s not going to be the original property after the change, is it? Second, why are we reusing and resetting a StringBuilder? Seems like an odd thing to do, rather than using a new String value, or setting a String value in some domain object.

Turns out we’re changing a StringBuilder because this code intentionally alters an input parameter value. This code not only makes heavy use of mutability to set the state within a long and complicated method, but it actually alters the values of the input parameters, presumably because Java doesn’t support multiple output values and this seemed like an OK compromise (note: it’s not).

So to summarize the problems introduced here by mutating values:

  1. Mutating the value of mf makes the complex logic that follows very difficult to reason about.
  2. Setting a boolean flag to be used later in the method makes it very difficult to extract sections of the code into more readable small methods.
  3. Changing an input parameter adds extra complexity into this method and, most importantly, is potentially extremely surprising to the caller of this method – we generally don’t expect our parameters to be changed when we pass them into a method.

We have a long, complex, difficult to understand method that is extremely hard to refactor under these circumstances. And that’s just looking at the mutability smells in this code.

Unlike previous articles which showed similar but unrelated examples, the refactoring suggestions in this article takes the form of a series of steps that need to be performed in order. These steps may not be the first that occur to you (they weren’t the first ideas I tried), but they do try to address one small problem at a time and impact the code as little as possible.

Step 1: Stop mutating the array we’re iterating over

The first problem I’m going to tackle is the mutation of the pathElements array. By removing the code that alters pathElements, I hope to a) get rid of the rather nasty smell of altering an array/collection I am iterating over and b) remove the need for the hasTranslations flag. Part b) should help us extract some of this enormous code into a simpler method which will limit the mutation of the mf variable to just one, easier-to-reason-about method.

First I introduce another object to store the new values (rather than changing the original object to contain the new values). I’ve chosen to use a List<String> for this, because I like working with collections and because Streams gives us a nice feature which I’ll use later in the method. I could just have easily have chosen an array of Strings; in this case it’s totally personal choice (you might be trading readability for performance, so this will depend upon your own code).

Introduce databasePathElements

I create and initialize this collection in the same place the original pathElements array was created, and I’m going to initialize it with the same values.

Next, where I changed the values in pathElements, I now want to change the values in databasePathElements. For now, I’m going to carry on doing both changes to try and reduce the chances of the code breaking.

Set new values on databasePathElements

Finally, I’m going to use databasePathElements instead of pathElements to set the new values of the StringBuilder origProp.

Use databasePathElements to set the StringBuffer

Note that I’ve used one of my favorite features from Java 8, Collectors.joining. This has replaced 5 lines of code with a single line – it’s really useful for producing CSV content or, in this case, turning a collection into a String separated by dots. The performance of this method may be different from the original code (plus we’re using a Collection instead of an array), so if nano-second performance matters to you, you should performance-test this. I expect for most applications the impact is negligible, and the new syntax reduces boilerplate.

When I run all the tests, everything still passes. I go back and remove the code that alters pathElement (it was on line 76, here I’ve clicked on the blue change indicator in the gutter in IntelliJ IDEA to show the old code vs the new code).

Delete line that changes array

Running the tests again shows this all works as before.

This is a small change and I’ve limited it to only impact the internals of this method to minimize the chances of errors. It may seem a bit pointless but it is step one to reducing the complexity in this method.

Step 2: Remove hasTranslations flag

This flag is causing me all sorts of trouble when I try to extract a smaller method. One approach is to perform a check between the original path and our new path, to see if it has changed instead of making a note of when it has changed.

Check if the path has changed

This is clearly a more expensive check (performance-wise) than a simple boolean check – not only do we do a String comparison, but we also have to build our new databasePath String regardless of whether we’re going to use it or not. I assume the original boolean was there to save us having to build the new String if we didn’t need it (there is no test or documentation so I’m guessing), so this latest version of the code doesn’t seem to have any real value.

Given that Morphia a) uses java.lang.reflect extensively and b) is designed to talk over a network to a database, nano-second CPU-instruction-level performance is not our biggest concern. Also, given there are no performance tests/perf criteria for this part of the code, I’m going to make an executive decision and simply update the value with databasePath every time, regardless of whether it’s different or not. I expect the performance impact to be negligible, and the readability impact will be very positive.

So I remove the check:

Remove the check completely

…and now I can remove the flag:

Remove the flag

Running all my tests again shows everything still works.

Step 3: Extract Method

Now I’ve removed this flag and introduced a new Collection for storing the updated database path, I should be able to extract a method that limits the scope of the mutation of MappedField mf to a small, easy-to-reason-about method.

Extract method for getting the MappedField

It’s not a particularly pretty method, as it takes a lot of parameters for just a few lines of code, but these can be simplified later. Our focus right now is limiting mutability and making the very long method easier to reason about.

Step 4: Introduce a new type

Next I want to tackle the smell of altering the value of one of the input parameters. If we needed to return more than one value, we either need to change the structure of the object that’s being returned, or return a new type that encapsulates all the returned values.

In order to decide which approach to take, we need to think about the values we’re talking about: what are they, where do they live, what other data are they associated with? The two return values from this method are the MappedField that is the original return type, and a String value that represents this field’s path in the MongoDB database. I can’t add a new value to MappedField to store its database name for the simple reason that the returned MappedField is sometimes null, but the database path is always known (it’s part of the query so we always have it).

The simplest solution to our problem, therefore, is to change this method to return a new type, ValidatedField, which wraps the original MappedField that was returned, and the String representing the database path. It’s not the prettiest solution at this stage, but given the current state of the method, a) anything to simplify the method is an improvement, and b) we may find ourselves using this for later refactoring, or we may find that this is a missing domain concept (we may rename it later when we discover what its true purpose is).

The first part of this is to introduce a new type which wraps the MappedField. I’m going to use an inner class right now; I may refactor this later if/when it becomes clear where it lives. I’m also going to use an Optional field, even though this is generally not recommended. It’ll make the refactoring a bit more straightforward, and I can always address this later.

Introduce new ValidatedField

I can remove retVal because it’s not being used anymore.

Remove retVal

I’ve changed the method to return this new ValidatedField type, so I’m going to have to fix up any compilation errors from this – anywhere that calls the original method is expecting a MappedField not a ValidatedField.

Change callers to use new type

For now I’ve done the simplest thing that works and getting the MappedField from the new ValidatedField. Rerunning all the tests shows this works as expected.

Step 5: Return the database path value instead of mutating a parameter

The next step is to add the database path to this ValidatedField object, so that we can stop the travesty that is the mutation of the StringBuilder parameter.

Firstly, we add a database path field to ValidatedField.

Add database path to ValidatedField

I’m not going to remove the old code just yet, but I’m going to move over to using the new value first. I locate all the callers of this method using Find Usages (Alt+F7), and any that cared about the changed value of the origProp parameter need to be changed to use the new databasePath field instead.

Change callers to use the new database path

When I re-run the tests, I find that I’ve missed a step – the code that sets the database path in ValidatedField isn’t always reached in all cases, so I need to make sure I initialize the value in all cases.

Initialise database path correctly

Running the tests shows everything works now – thank goodness! – so we can do some of the cleanup that should make our code easier to understand. We can delete the code that altered the origProp parameter:

Delete code that changes origProp

Now that we’re not doing that nasty hack of using a StringBuilder to let us change what is effectively a String parameter, we can change this into a straightforward String.

Change StringBuilder parameter to a String

Finally, we can do some tidying up and replace all references to origProp and delete the parameter.

Remove StringBuilder from the callers

We should also check all the calling code, we can probably delete a lot of StringBuilder creation.

Remove StringBuilder from the callers

In Summary

At the end of all this, our validateQuery method is a tiny bit smaller. It was 93 lines of code, and now it’s 74 lines. We achieved this simplification by trying to focus exclusively on where mutating values added complexity to the code. We:

  1. Stopped mutating the pathElements array that we were iterating over and introduced a databasePathElements collection. This meant we were no longer changing an array we were also trying to read. A nice bonus effect here is that we can use Java 8 streams functionality to produce our final database path rather than doing our own iteration (note: the performance of this is potentially not as fast as the original code. In the wider context of this application this doesn’t matter).
  2. Stopped setting a hasTranslations flag, which meant we could now move the mutation of the MappedField mf into a separate method. Moving this logic into its own method is one of the main contributors to the simplification of the larger method, but it’s still just a small step and may benefit from further refactoring.
  3. Stopped changing a parameter value, which was a hack around not being able to return multiple values from a method. We achieved this by introducing a new type to wrap both return values (the mapped field and the database path). This change slightly reduced the complexity of the method and let us remove code in places that call this method, but is also likely to be an area for further refactoring.

The original code exhibits more than one smell. By focusing on the problems caused by mutable values, we have taken a step towards making the code easier to understand. You’ll notice that we still make changes to values and to data structures, but what we’ve tried to do is a) limit mutation to smaller and easier to reason about sections, and b) make sure that the values that are changed or updated are not surprising ones.

Symptoms

  • Reassignment of variables
  • Parameters altered to hold new values
  • Passing around primitive wrapper classes (Boolean, Integer etc) or other wrappers (e.g. StringBuilder) in order to be able to alter the contents
  • Difficult or impossible to extract smaller methods because multiple primitive values are being updated

Possible solutions

  • If the code is altering an array or collection that it is also reading from, consider having a second collection or array to track changes and leave the initial values unaltered. Reading and writing from the same collection, while safe in our specific example, can lead to unexpected results and concurrency issues.
  • If it’s hard to tell which code branch a particular value came from, you may either want to store several values with readable names instead of reusing a variable, or move the code that changes the value into a separate method so that the changes are restricted to a small section of code.
  • If input parameters are being altered, consider adding a new value to the object that is returned so it contains all the data that needs to be returned. If this isn’t possible, consider introducing a new type to wrap all the return values.
  • Counters or boolean primitives may not be the best way to keep track of state within a method. Removing them or moving them into an appropriate domain class may make it easier to refactor code into something simpler.
image description