Code Smells: Mutation

This is part of a series investigating code that looks suspicious (“code smells”), and exploring possible alternatives.

In the article about deeply nested code I wanted to change getMappedField to return an Optional value.  The change to the method was fairly trivial, the change to the code that calls that method… well, that turned out to be another matter altogether.

One of the places this method is used is in validateQuery, a complex method that does rather a lot more than you might expect.  When I changed getMappedField to return an Optional, I ended up replacing null checks with a surprisingly large number of isPresent checks:

Large number of isPresent checks

That’s not all of them, this method is nearly a hundred lines long, it doesn’t fit in a single screenshot (this will be a topic for another article). My first reaction was irritation that Optional not only didn’t solve my null check problem, but made my code uglier with the addition of these checks and the associated gets.  But of course this was just a sign of other underlying problems.

The Smell: Mutable Values

This method is long enough to contain more than one smell.  For this article I want to focus on the first issue I noticed: the mf variable is exactly that – variable.  It’s a mutable value, and is reassigned under some circumstances.  We can see this if we make mf final.

Value is resassigned

If the getMappedField method doesn’t return a value (and the fieldIsArrayOperator flag is false, whatever that means) we go off and try to assign mf to the value from getMappedFieldByJavaField. I assume readers don’t have a clear idea of the context of this specific code, but to be honest I’ve been working with this project on and off for years and I don’t really understand what it’s doing either.  This is what being a developer is all about: it’s hard to write code, it’s often harder to read it.

Anyway.

The point is this mf value is being changed, and this makes it even harder to reason about.  For example, when we check if it’s present later in the code, if the value is empty we don’t know if it’s empty because getMappedField returned no value or if it’s because getMappedFieldByJavaField returned no value. We just don’t know what it means if it is empty.

We could try and address this specific problem by moving the logic that can reassign the value into its own method and giving that a useful name, then we should have an idea of what our variable is and why it may, or may not, be present.  If we ask IntelliJ IDEA to extract this code into a method…

Extract method fails

…it can’t be done, because of at least one other mutable value.

We’re modifying the values inside a pathElements array. This doesn’t stop us doing the refactoring, as we can pass this array in as a parameter and alter it in the method, but there is a problem with altering this value. More on this later.

The modification that stops us being able to extract the method is when we’re setting a flag hasTranslations to true under some circumstances.  Turns out that this flag is an optimisation – if it has been set to true, then we need to modify another value:

Parameter origProp is changed

If we look more closely at this code, we might wonder 1) why are we changing a value with the name origProp? It’s not going to be the original property after the change, is it? 2) why are we reusing and resetting a StringBuilder?  Seems like an odd thing to do, rather than using a new String value, or setting a String value in some domain object.

Turns out we’re changing a StringBuilder because this code intentionally alters an input parameter value.  This code not only makes heavy use of mutability to set the state within a long and complicated method, but it actually alters the values of the input parameters, presumably because Java doesn’t support multiple output values and this seemed like an OK compromise (note: it’s not).

So to summarise the problems introduced here by mutating values:

  1. Mutating the value of mf makes the complex logic that follows very difficult to reason about.
  2. Setting a boolean flag to be used later in the method makes it very difficult to extract sections of the code into more readable small methods.
  3. Changing an input parameter adds extra complexity into this method and, most importantly, is potentially extremely surprising to the caller of this method – we generally don’t expect our parameters to be changed when we pass them into a method.

We have a long, complex, difficult to understand method that is extremely hard to refactor under these circumstances. And that’s just looking at the mutability smells in this code.

Unlike previous articles which showed similar but unrelated examples, the refactoring suggestions in this article takes the form of a series of steps that need to be performed in order. These steps may not be the first that occur to you (they weren’t the first ideas I tried), but they do try to address one small problem at a time and impact the code as little as possible.

Step 1: Stop mutating the array we’re iterating over

The first problem I’m going to tackle is the mutation of the pathElements array.  By removing the code that alters pathElements, I hope to a) get rid of the rather nasty smell of altering an array/collection I am iterating over and b) remove the need for the hasTranslations flag.  Part b) should help us to extract some of this enormous code into a simpler method which will limit the mutation of the mf variable to just one, easier-to-reason-about method.

First I introduce another object to store the new values (rather than changing the original object to contain the new values).  I’ve chosen to use a List<String> for this, because I like working with collections, and because Streams gives us a nice feature which I’ll use later in the method.  I could just have easily have chosen an array of Strings, in this case it’s totally personal choice (you might be trading readability for performance, this will depend upon your own code).

Introduce databasePathElements

I create and initialise this collection in the same place the original pathElements array was created, and I’m going to initialise it with the same values.

Next, where I changed the values in pathElements, I now want to change the values in databasePathElements.  For now, I’m going to carry on doing both changes to try and reduce the chances of the code breaking.

Set new values on databasePathElements

Finally, I’m going to use databasePathElements instead of pathElements to set the new values of the StringBuilder origProp.

Use databasePathElements to set the StringBuffer

Note that I’ve used one of my favourite features from Java 8, Collectors.joining.  This has replaced 5 lines of code with a single line, it’s really useful for producing CSV content or, in this case, turning a collection into a String separated by dots. The performance of this method may be different from the original code (plus we’re using a Collection instead of an array), so if nano-second performance matters to you, you should performance test this. I expect for most applications the impact is negligible, and the new syntax reduces boilerplate.

When I run all the tests, everything still passes. I go back and remove the code that alters pathElement (it was on line 76, here I’ve clicked on the blue change indicator in the gutter in IntelliJ IDEA to show the old code vs the new code).

Delete line that changes array

Running the tests again shows this all works as before.

This is a small change and I’ve limited it to only impact the internals of this method to minimise the chances of errors.  It may seem a bit pointless but it is step one to reducing the complexity in this method.

Step 2: Remove hasTranslations flag

This flag is causing me all sorts of trouble when I try to extract a smaller method.  One approach is to perform a check between the original path and our new path, to see if it has changed instead of making a note of when it has changed.

Check if the path has changed

This is clearly a more expensive check (performance-wise) than a simple boolean check – not only do we do a String comparison, but we also have to build our new databasePath String regardless of whether we’re going to use it or not.  I assume the original boolean was there to save us having to build the new String if we didn’t need it (there is no test or documentation so I’m guessing), so this latest version of the code doesn’t seem to have any real value.

Given that Morphia a) uses java.lang.reflect extensively and b) is designed to talk over a network to a database, nano-second CPU-instruction-level performance is not our biggest concern. Also, given there are no performance tests/perf criteria for this part of the code, I’m going to make an executive decision and simply update the value with databasePath every time, regardless of whether it’s different or not.  I expect the performance impact to be negligible, and the readability impact will be very positive.

So I remove the check:

Remove the check completely

…and now I can remove the flag

Remove the flag

Running all my tests again shows everything still works.

Step 3: Extract Method

Now I’ve removed this flag and introduced a new Collection for storing the updated database path, I should be able to extract a method that limits the scope of the mutation of MappedField mf to a small, easy to reason about method.

Extract method for getting the MappedField

It’s not a particularly pretty method, it takes a lot of parameters for just a few lines of code, but these can be simplified later.  Our focus right now is limiting mutability and making the very long method easier to reason about.

Step 4: Introduce a new type

Next I want to tackle the smell of altering the value of one of the input parameters. If we needed to return more than one value, we either need to change the structure of the object that’s being returned, or return a new type that encapsulates all the returned values.

In order to decide which approach to take, we need to think about the values we’re talking about: what are they, where do they live, what other data are they associated with?  The two return values from this method are the MappedField that is the original return type, and a String value that represents this field’s path in the MongoDB database.  I can’t add a new value to MappedField to store its database name for the simple reason that the returned MappedField is sometimes null, but the database path is always known (it’s part of the query so we always have it).

The simplest solution to our problem, therefore, is to change this method to return a new type, ValidatedField, which wraps the original MappedField that was returned, and the String representing the database path.  It’s not the prettiest solution at this stage, but given the current state of the method a) anything to simplify the method is an improvement and b) we may find ourselves using this for later refactoring, or we may find that this is a missing domain concept (we may rename it later when we discover what its true purpose is).

The first part of this is to introduce a new type which wraps the MappedField.  I’m going to use an inner class right now, I may refactor this later if/when it becomes clear where it lives. I’m also going to use an Optional field, even though this is generally not recommended.  It’ll make the refactoring a bit more straightforward, and I can always address this later.

Introduce new ValidatedField

I can remove retVal now it’s not being used.

Remove retVal

I’ve changed the method to return this new ValidatedField type, so I’m going to have to fix up any compilation errors from this – anywhere that calls the original method is expecting a MappedField not a ValidatedField.

Change callers to use new type

For now I’ve done the simplest thing that works and getting the MappedField from the new ValidatedField. Rerunning all the tests shows this works as expected.

Step 5: Return the database path value instead of mutating a parameter

The next step is to add the database path to this ValidatedField object, so that we can stop the travesty that is the mutation of the StringBuilder parameter.

Firstly, we add a database path field to ValidatedField.

Add database path to ValidatedField

I’m not going to remove the old code just yet, I’m going to move over to using the new value first.  I locate all the callers of this method using Find Usages (Alt+F7), and any that cared about the changed value of the origProp parameter need to be changed to use the new databasePath field instead.

Change callers to use the new database path

When I re-run the tests I find that I’ve missed a step – the code that sets the database path in ValidatedField isn’t always reached in all cases, so I need to make sure I initialise the value in all cases.

Initialise database path correctly

Running the tests shows everything works now, thank goodness, so we can do some of the cleanup that should make our code easier to understand. We can delete the code that altered the origProp parameter

Delete code that changes origProp

Now we’re not doing that nasty hack of using a StringBuilder to let us change what is effectively a String parameter, we can change this into a straightforward String.

Change StringBuilder parameter to a String

Finally we can do some tidy up and replace all references to origProp and delete the parameter.

Remove StringBuilder from the callers

We should also check all the calling code, we can probably delete a lot of StringBuilder creation.

Remove StringBuilder from the callers

In Summary

At the end of all this, our validateQuery method is a tiny bit smaller.  It was 93 lines of code, now it’s 74 lines. We achieved this simplification by trying to focus exclusively on where mutating values added complexity to the code.  We:

  1. Stopped mutating the pathElements array that we were iterating over and introduced a databasePathElements collection.  This meant we were no longer changing an array we were also trying to read. A nice bonus effect here is that we can use Java 8 streams functionality to produce our final database path rather than doing our own iteration (note: the performance of this is potentially not as fast as the original code. In the wider context of this application this doesn’t matter).
  2. Stopped setting a hasTranslations flag, which meant we could now move the mutation of the MappedField mf into a separate method.  Moving this logic into its own method is one of the main contributors to the simplification of the larger method, but it’s still just a small step and may benefit from further refactoring.
  3. Stopped changing a parameter value, which was a hack around not being able to return multiple values from a method. We achieved this by introducing a new type to wrap both return values (the mapped field and the database path). This change slightly reduced the complexity of the method and let us remove code in places that call this method, but is also likely to be an area for further refactoring.

The original code exhibits more than one smell. By focusing on the problems cause by mutable values we have taken a step towards making the code easier to understand.  You’ll notice that we still make changes to values and to data structures, but what we’ve tried to do is a) limit mutation to smaller and easier to reason about sections, and b) make sure that the values that are changed or updated are not surprising ones.

Symptoms

  • Reassignment of variables
  • Parameters altered to hold new values
  • Passing around primitive wrapper classes (Boolean, Integer etc) or other wrappers (e.g. StringBuilder) in order to be able to alter the contents.
  • Difficult or impossible to extract smaller methods because multiple primitive values are being updated.

Possible Solutions

  • If the code is altering an array or collection that it is also reading from, consider having a second collection or array to track changes and leave the initial values unaltered. Reading and writing from the same collection, while safe in our specific example, can lead to unexpected results and concurrency issues.
  • If it’s hard to tell which code branch a particular value came from, you may either want to store several values with readable names instead of reusing a variable, or move the code that changes the value into a separate method so that the changes are restricted to a small section of code.
  • If input parameters are being altered, consider adding a new value to the object that is returned so it contains all the data that needs to be returned.  If this isn’t possible, consider introducing a new type to wrap all the return values.
  • Counters or boolean primitives may not be the best way to keep track of state within a method. Removing them or moving them into an appropriate domain class may make it easier to refactor code into something simpler.

This entry was posted in Tips & Tricks and tagged , . Bookmark the permalink.

19 Responses to Code Smells: Mutation

  1. Ben Hall says:

    Facscinating post. One question: did this refactoring alter performance at all? A common justification for using mutable is speed

    • Trisha Gee says:

      I haven’t benchmarked it yet, although I do want to do a post on optimisation, so I hope to do so soon.

      There are at least two places I expect my refactored code to perform worse in a microbenchmark: 1) where I remove the hasTranslations check which optimises the cases where the dataabase path is the same as the query path and 2) using Java 8 Streams to create the database path String. However, in a system level performance test, I expect these areas to have little to no impact on performance. I really hope to get around to doing these tests to prove my hypothesis.

      Also it’s worth noting that there are different types of mutation, which can have different effects on performance. For example, using and mutating an array is probably better performance than using an immutable data structure, because a) arrays are super-optimisable by CPUs and languages and b) the possible impact on the garbage collector of using immutable collections. But mutating objects by reassigning them at random times doesn’t seem something that’s performance-friendly, particularly for the garbage collector. You want either very short-lived or very long-lived objects for most modern GCs to be effective, and mutation can lead to patterns that could cause stop-the-world pauses. Also, the JVM is really good at optimising our less-than-optimal code, but as you can see from this example mutation makes the code hard to reason about and hard to refactor, which means it’s also likely to be hard for the JVM to optimise. Immutable objects can help improve performance because they can lead to code that is easier for the JVM to optimise.

  2. Marcelo says:

    By the usage of java 8 Optional it looks like you could have used the ifpresent method avoiding multiple “gets” and nested ifs.
    Given that why have you choose the isPresent method?

    • Trisha Gee says:

      With the current state of the method, it’s not possible to use ifPresent: some of the checks have more than one condition to check; some have else statements which don’t translate to the ifPresent syntax; and some mutate a value, which can’t be done inside a lambda expression.

      Later blog posts may explore solutions that can use ifPresent.

  3. Tamoghna Chowdhury says:

    Couldn’t you use String.join on the pathElements array?

    • Trisha Gee says:

      Yes, you absolutely could. Obviously this doesn’t address the mutation problem if you use it in the original code, but it does highlight that the iteration part of that original code is also a smell, since String.join is a perfectly usable language features that will do this for you.

  4. Stefan Rother-Stübs says:

    It would be nice, if I could stop the animated GIFs and go forward and back in an animation.
    Once I stop the GIF it starts again if I push the play button.

    • Henning says:

      In fact, it would be much better to just use proper videos instead of GIFs. Filesizes go way down (that’s why Twitter for example actually converts uploaded animated GIFs to videos), and a pause button is trivial to implement (browsers provide it).

      • Trisha Gee says:

        Thank you both for the feedback. I also have concerns about the GIFs, particularly about being able to start/stop/skip, but unfortunately have not been able to come up with a better solution (due to extremely boring reasons, video recording and upload has not been possible for this series of posts).

        I’m considering changing the format to just images, but having one long video at the end that shows all the steps. What do you think?

  5. Mark says:

    add a code smell blog post about good variable naming. 😉

    • Trisha Gee says:

      I was very tempted! But I think variable/method naming is something we (should) all know needs to be done “properly”, and all teams have different ideas on what’s the best approach.

  6. Yury says:

    Why not mention Immutable collections? There should be plenty of them even in Java

    • Trisha Gee says:

      I didn’t mention them here because they don’t help this specific problem. Immutable collections are an entirely different blog post!

      • Yury says:

        Well, they do help by making sure this problem (or at least part of it) simply can’t happen. You can still re-assign to a variable tho, but I think they at least encourage a way of programming that reduces mutations.

Leave a Reply

Your email address will not be published. Required fields are marked *