Tips & Tricks

Code Smells: Iteration

This is part of a series investigating code that looks suspicious (“code smells”), and exploring possible alternatives.

Last time we looked at some suspicious nested code, and I suggested that the best way to deal with this was to move some of it into a different class to stop the original class having to understand the internals of how the data was stored.

The Smell: Iteration

But that’s not the end of the story.  In this article I want to explore a different problem that code iteration might be pointing to.  The nested iteration in the last example suggested the logic was in the wrong place.  The existence of iteration at all in our new hasName method (whether implemented as a for loop or using the Streams API) suggests another problem: this might not be the correct data structure to store the names in.

This may even be hidden (or at least less obvious) when using the Streams API because this gives us a readable way to query our collection.  But it doesn’t necessarily give us a fast way to query our collection.

Example 1: Set instead of List

Let’s take a closer look at the code in question. In the Deeply Nested Code article we created hasName() in MappedField

    boolean hasName(String storedName) {
        return getLoadNames().stream()
                             .anyMatch(storedName::equals);
    }

I’m going to expand it back out to a for loop so it’s more obvious what’s happening

Turn Streams API call into a for loop

Turn Streams API call into a for loop

We end up with

    boolean hasName(String storedName) {
        for (String s : getLoadNames()) {
            if (storedName.equals(s)) {
                return true;
            }
        }
        return false;
    }

The getLoadNames method returns a List of Strings, which we iterate over in order to see if a particular String value is in there.  If we look at the other places getLoadNames is used, the usage is pretty similar – we iterate over the list to find something we want.

Find usages of getLoadNames()

Find usages of getLoadNames()

This seems pretty wasteful – why iterate over the same list again and again, when you can store it in a data structure which lets you ask immediately if it contains what you want? Given that in this particular case, the names are either unique, or duplicates are effectively ignored, it seems that Set might be the data structure we want instead.

The getLoadNames method is not a simple getter, as you might expect from the name (but misleading naming is not the topic of this blog post, nor is caching the results of an operation that only needs to be done once, so let’s gloss over those smells for now).  It processes some data in order to calculate the list of names. The implementation details aren’t important for the purpose of this article, we’re just going to do the simplest thing that works and change the method to return a Set.

Return a Set from getLoadNames()

Return a Set from getLoadNames()

Because Set is also a Collection (like the original return type List), all of the code that calls this method still passes – the three callers only used the method in a for loop. Going back to our original method hasName, we can simplify it to:

    boolean hasName(String storedName) {
        return getLoadNames().contains(storedName);
    }

That’s it.  No more looping, just a simple check.

We should look at the other callers to see if they can be simplified as well, but if they need to iterate over the Set (for any reason) we can leave them as they are.

Running all the tests for the project shows everything still works as expected so we can commit these changes.

Example 2: Map instead of List

Here’s a very similar example, with a slightly different solution:

    public MappedField getMappedFieldByJavaField(final String name) {
        for (final MappedField mf : persistenceFields) {
            if (mf.getJavaFieldName().equals(name)) {
                return mf;
            }
        }
        return null;
    }

In this method we’re iterating over all the fields, and looking for one with the same javaFieldName as some given value.  This sounds like something which would be better handled by a different data structure. In this case, a Map of String name to MappedField would let us immediately find the right field with this name.

The persistenceFields field is used in a number of places in this class, so it’s best to take a safe approach to migrating its data structure.  I’m going to replace it with a new field, fields, which is a Map of String to MappedField.

Create a new Map field

Create a new Map field

I use highlight usages to find everywhere in this class that persistenceFields was used. The first step is to initialise the Map in the same place the List was initialised

Initialise the new Map

Initialise the new Map

Then I can find all the places that queried persistenceFields, and the simplest way to migrate them to use the new field is to use the values() Collection from the Map, so all the existing code works the same way it used to

Use Map.values() instead of original list

Use Map.values() instead of original list

Most of these changes only impact this class and not other code.  The exception is the getter, which we have to change to return a Collection rather than a List. A quick look at the usages of this method show that the List is only used for iteration, or other cases where a Collection will work just as well.  So we change this to return the Collection

Change the return type of the getter

Change the return type of the getter

When we’ve replaced the usages of the original List with the new Map, we can safely remove the old List and rename the Map (if we like) to the name of the original field.

Remove original list

Remove original list

Rebuilding the project and re-running the tests shows everything still compiles and runs as expected.

Now we can go back to the original method with the smell, and make the use of the map.  The updated method is:

    public MappedField getMappedFieldByJavaField(final String name) {
        return persistenceFields.get(name);
    }

Much simpler!

If there are other methods that were iterating over the list to find the name, we can make the same change (or simply delegate to getMappedFieldByJavaField).  All callers that were iterating for any other reason still work exactly the same as before, just iterating over the Map’s values(). We may want to look at these other methods later to see if there’s some way to simplify them.  Remember, it’s OK to have more than one data structure to store the same information, if the purpose is to provide fast access to the data – this is particularly useful if reading happens more frequently than updating.  In our case, we might want to store our MappedFields in several Maps keyed by different values.

In Summary

If you see iteration code (for loops or Streams API calls) you should question whether this is really needed.  If the data structure is always being iterated over in order to find some value, you may be better off using a Set or a Map. This will give you not only simpler, cleaner code, but usually better performance as well.

Remember when choosing a data structure that there are two main access patterns to consider – adding/updating data, and reading the data.  How frequently you read vs write and what sorts of update operations you do will determine which is the most efficient data structure for you.

Symptoms:

  • For loops iterating over a collection looking for some value
  • Streams API calls with findFirst/findAny/anyMatch.

Possible Solutions:

  • If the code is checking if a value exists in a List, consider using a Set instead, if the values are unique or duplicates are ignored.
  • If the code is always looking for an Object with a particular property, a Map of the Objects keyed by that property will be much more efficient.
  • The Collections API gives you a lot of useful methods too, so even if you can’t change the data structure directly, you should check if there’s a Collection method that does what you want.  Our first example could have continued being a List, if duplicates were expected, but we could have used List.contains() instead of writing out own iteration code.  This is more concise, and the Collections implementation is usually the most efficient version for the data structure.
  • If, for some reason, you really do need to write your own iteration code, it may be best to encapsulate the implementation details of how data is written/read, and provide nice helpfully named methods for your domain objects to use this. This way at least the iteration code is kept out of your business logic.
image description