IntelliJ IDEA
IntelliJ IDEA – the Leading Java and Kotlin IDE, by JetBrains
Code Smells: Mutation
This is part of a series investigating code that looks suspicious (“code smells”), and exploring possible alternatives.
- Code Smells: Null
- Code Smells: Deeply Nested Code
- Code Smells: Iteration
- Code Smells: Mutation
- Code Smells: Multi-Responsibility Methods
- Code Smells: If Statements
- Code Smells: Too Many Problems
In the article about deeply nested code I wanted to change getMappedField
to return an Optional
value. The change to the method was fairly trivial, the change to the code that calls that method… well, that turned out to be another matter altogether.
One of the places this method is used is in validateQuery
, a complex method that does rather a lot more than you might expect. When I changed getMappedField
to return an Optional, I ended up replacing null checks with a surprisingly large number of isPresent
checks:
That’s not all of them; this method is nearly a hundred lines long, and it doesn’t fit in a single screenshot (this will be a topic for another article). My first reaction was irritation that Optional
not only didn’t solve my null check problem, but made my code uglier with the addition of these checks and the associated gets. But of course this was just a sign of other underlying problems.
The Smell: Mutable Values
This method is long enough to contain more than one smell. For this article, I want to focus on the first issue I noticed: the mf
variable is exactly that – variable. It’s a mutable value, and is reassigned under some circumstances. We can see this if we make mf
final.
If the getMappedField
method doesn’t return a value (and the fieldIsArrayOperator
flag is false, whatever that means) we go off and try to assign mf
to the value from getMappedFieldByJavaField
. I assume readers don’t have a clear idea of the context of this specific code, but to be honest I’ve been working with this project on and off for years and I don’t really understand what it’s doing either. This is what being a developer is all about: it’s hard to write code, but it’s often harder to read it.
Anyway.
The point is this mf
value is being changed, and this makes it even harder to reason about. For example, when we check if it’s present later in the code, if the value is empty we don’t know if it’s empty because getMappedField
returned no value or if it’s because getMappedFieldByJavaField
returned no value. We just don’t know what it means if it is empty.
We could try and address this specific problem by moving the logic that can reassign the value into its own method and giving that a useful name. Then we should have an idea of what our variable is and why it may, or may not, be present. If we ask IntelliJ IDEA to extract this code into a method…
…it can’t be done, because of at least one other mutable value.
We’re modifying the values inside a pathElements
array. This doesn’t stop us from doing the refactoring, as we can pass this array in as a parameter and alter it in the method, but there is a problem with altering this value. More on this later.
The modification that stops us from being able to extract the method is when we’re setting a flag hasTranslations
to true under some circumstances. Turns out that this flag is an optimization – if it has been set to true, then we need to modify another value:
If we look more closely at this code, we might wonder about two things. First, why are we changing a value with the name origProp
? It’s not going to be the original property after the change, is it? Second, why are we reusing and resetting a StringBuilder
? Seems like an odd thing to do, rather than using a new String value, or setting a String value in some domain object.
Turns out we’re changing a StringBuilder
because this code intentionally alters an input parameter value. This code not only makes heavy use of mutability to set the state within a long and complicated method, but it actually alters the values of the input parameters, presumably because Java doesn’t support multiple output values and this seemed like an OK compromise (note: it’s not).
So to summarize the problems introduced here by mutating values:
- Mutating the value of
mf
makes the complex logic that follows very difficult to reason about. - Setting a boolean flag to be used later in the method makes it very difficult to extract sections of the code into more readable small methods.
- Changing an input parameter adds extra complexity into this method and, most importantly, is potentially extremely surprising to the caller of this method – we generally don’t expect our parameters to be changed when we pass them into a method.
We have a long, complex, difficult to understand method that is extremely hard to refactor under these circumstances. And that’s just looking at the mutability smells in this code.
Unlike previous articles which showed similar but unrelated examples, the refactoring suggestions in this article takes the form of a series of steps that need to be performed in order. These steps may not be the first that occur to you (they weren’t the first ideas I tried), but they do try to address one small problem at a time and impact the code as little as possible.
Step 1: Stop mutating the array we’re iterating over
The first problem I’m going to tackle is the mutation of the pathElements
array. By removing the code that alters pathElements
, I hope to a) get rid of the rather nasty smell of altering an array/collection I am iterating over and b) remove the need for the hasTranslations
flag. Part b) should help us extract some of this enormous code into a simpler method which will limit the mutation of the mf
variable to just one, easier-to-reason-about method.
First I introduce another object to store the new values (rather than changing the original object to contain the new values). I’ve chosen to use a List<String>
for this, because I like working with collections and because Streams gives us a nice feature which I’ll use later in the method. I could just have easily have chosen an array of Strings; in this case it’s totally personal choice (you might be trading readability for performance, so this will depend upon your own code).
I create and initialize this collection in the same place the original pathElements
array was created, and I’m going to initialize it with the same values.
Next, where I changed the values in pathElements
, I now want to change the values in databasePathElements
. For now, I’m going to carry on doing both changes to try and reduce the chances of the code breaking.
Finally, I’m going to use databasePathElements
instead of pathElements
to set the new values of the StringBuilder
origProp
.
Note that I’ve used one of my favorite features from Java 8, Collectors.joining
. This has replaced 5 lines of code with a single line – it’s really useful for producing CSV content or, in this case, turning a collection into a String separated by dots. The performance of this method may be different from the original code (plus we’re using a Collection instead of an array), so if nano-second performance matters to you, you should performance-test this. I expect for most applications the impact is negligible, and the new syntax reduces boilerplate.
When I run all the tests, everything still passes. I go back and remove the code that alters pathElement
(it was on line 76, here I’ve clicked on the blue change indicator in the gutter in IntelliJ IDEA to show the old code vs the new code).
Running the tests again shows this all works as before.
This is a small change and I’ve limited it to only impact the internals of this method to minimize the chances of errors. It may seem a bit pointless but it is step one to reducing the complexity in this method.
Step 2: Remove hasTranslations flag
This flag is causing me all sorts of trouble when I try to extract a smaller method. One approach is to perform a check between the original path and our new path, to see if it has changed instead of making a note of when it has changed.
This is clearly a more expensive check (performance-wise) than a simple boolean check – not only do we do a String comparison, but we also have to build our new databasePath
String regardless of whether we’re going to use it or not. I assume the original boolean was there to save us having to build the new String if we didn’t need it (there is no test or documentation so I’m guessing), so this latest version of the code doesn’t seem to have any real value.
Given that Morphia a) uses java.lang.reflect
extensively and b) is designed to talk over a network to a database, nano-second CPU-instruction-level performance is not our biggest concern. Also, given there are no performance tests/perf criteria for this part of the code, I’m going to make an executive decision and simply update the value with databasePath
every time, regardless of whether it’s different or not. I expect the performance impact to be negligible, and the readability impact will be very positive.
So I remove the check:
…and now I can remove the flag:
Running all my tests again shows everything still works.
Step 3: Extract Method
Now I’ve removed this flag and introduced a new Collection for storing the updated database path, I should be able to extract a method that limits the scope of the mutation of MappedField
mf
to a small, easy-to-reason-about method.
It’s not a particularly pretty method, as it takes a lot of parameters for just a few lines of code, but these can be simplified later. Our focus right now is limiting mutability and making the very long method easier to reason about.
Step 4: Introduce a new type
Next I want to tackle the smell of altering the value of one of the input parameters. If we needed to return more than one value, we either need to change the structure of the object that’s being returned, or return a new type that encapsulates all the returned values.
In order to decide which approach to take, we need to think about the values we’re talking about: what are they, where do they live, what other data are they associated with? The two return values from this method are the MappedField
that is the original return type, and a String value that represents this field’s path in the MongoDB database. I can’t add a new value to MappedField
to store its database name for the simple reason that the returned MappedField
is sometimes null, but the database path is always known (it’s part of the query so we always have it).
The simplest solution to our problem, therefore, is to change this method to return a new type, ValidatedField
, which wraps the original MappedField
that was returned, and the String representing the database path. It’s not the prettiest solution at this stage, but given the current state of the method, a) anything to simplify the method is an improvement, and b) we may find ourselves using this for later refactoring, or we may find that this is a missing domain concept (we may rename it later when we discover what its true purpose is).
The first part of this is to introduce a new type which wraps the MappedField
. I’m going to use an inner class right now; I may refactor this later if/when it becomes clear where it lives. I’m also going to use an Optional
field, even though this is generally not recommended. It’ll make the refactoring a bit more straightforward, and I can always address this later.
I can remove retVal
because it’s not being used anymore.
I’ve changed the method to return this new ValidatedField
type, so I’m going to have to fix up any compilation errors from this – anywhere that calls the original method is expecting a MappedField
not a ValidatedField
.
For now I’ve done the simplest thing that works and getting the MappedField
from the new ValidatedField
. Rerunning all the tests shows this works as expected.
Step 5: Return the database path value instead of mutating a parameter
The next step is to add the database path to this ValidatedField
object, so that we can stop the travesty that is the mutation of the StringBuilder
parameter.
Firstly, we add a database path field to ValidatedField
.
I’m not going to remove the old code just yet, but I’m going to move over to using the new value first. I locate all the callers of this method using Find Usages (Alt+F7), and any that cared about the changed value of the origProp
parameter need to be changed to use the new databasePath
field instead.
When I re-run the tests, I find that I’ve missed a step – the code that sets the database path in ValidatedField
isn’t always reached in all cases, so I need to make sure I initialize the value in all cases.
Running the tests shows everything works now – thank goodness! – so we can do some of the cleanup that should make our code easier to understand. We can delete the code that altered the origProp
parameter:
Now that we’re not doing that nasty hack of using a StringBuilder
to let us change what is effectively a String parameter, we can change this into a straightforward String.
Finally, we can do some tidying up and replace all references to origProp
and delete the parameter.
We should also check all the calling code, we can probably delete a lot of StringBuilder
creation.
In Summary
At the end of all this, our validateQuery
method is a tiny bit smaller. It was 93 lines of code, and now it’s 74 lines. We achieved this simplification by trying to focus exclusively on where mutating values added complexity to the code. We:
- Stopped mutating the
pathElements
array that we were iterating over and introduced adatabasePathElements
collection. This meant we were no longer changing an array we were also trying to read. A nice bonus effect here is that we can use Java 8 streams functionality to produce our final database path rather than doing our own iteration (note: the performance of this is potentially not as fast as the original code. In the wider context of this application this doesn’t matter). - Stopped setting a
hasTranslations
flag, which meant we could now move the mutation of theMappedField mf
into a separate method. Moving this logic into its own method is one of the main contributors to the simplification of the larger method, but it’s still just a small step and may benefit from further refactoring. - Stopped changing a parameter value, which was a hack around not being able to return multiple values from a method. We achieved this by introducing a new type to wrap both return values (the mapped field and the database path). This change slightly reduced the complexity of the method and let us remove code in places that call this method, but is also likely to be an area for further refactoring.
The original code exhibits more than one smell. By focusing on the problems caused by mutable values, we have taken a step towards making the code easier to understand. You’ll notice that we still make changes to values and to data structures, but what we’ve tried to do is a) limit mutation to smaller and easier to reason about sections, and b) make sure that the values that are changed or updated are not surprising ones.
Symptoms
- Reassignment of variables
- Parameters altered to hold new values
- Passing around primitive wrapper classes (
Boolean
,Integer
etc) or other wrappers (e.g.StringBuilder
) in order to be able to alter the contents - Difficult or impossible to extract smaller methods because multiple primitive values are being updated
Possible solutions
- If the code is altering an array or collection that it is also reading from, consider having a second collection or array to track changes and leave the initial values unaltered. Reading and writing from the same collection, while safe in our specific example, can lead to unexpected results and concurrency issues.
- If it’s hard to tell which code branch a particular value came from, you may either want to store several values with readable names instead of reusing a variable, or move the code that changes the value into a separate method so that the changes are restricted to a small section of code.
- If input parameters are being altered, consider adding a new value to the object that is returned so it contains all the data that needs to be returned. If this isn’t possible, consider introducing a new type to wrap all the return values.
- Counters or boolean primitives may not be the best way to keep track of state within a method. Removing them or moving them into an appropriate domain class may make it easier to refactor code into something simpler.