Code Smells: Iteration

This is part of a series investigating code that looks suspicious (“code smells”), and exploring possible alternatives.

Last time we looked at some suspicious nested code, and I suggested that the best way to deal with this was to move some of it into a different class to stop the original class having to understand the internals of how the data was stored.

The Smell: Iteration

But that’s not the end of the story.  In this article I want to explore a different problem that code iteration might be pointing to.  The nested iteration in the last example suggested the logic was in the wrong place.  The existence of iteration at all in our new hasName method (whether implemented as a for loop or using the Streams API) suggests another problem: this might not be the correct data structure to store the names in.

This may even be hidden (or at least less obvious) when using the Streams API because this gives us a readable way to query our collection.  But it doesn’t necessarily give us a fast way to query our collection.

Example 1: Set instead of List

Let’s take a closer look at the code in question. In the Deeply Nested Code article we created hasName() in MappedField

I’m going to expand it back out to a for loop so it’s more obvious what’s happening

Turn Streams API call into a for loop

Turn Streams API call into a for loop

We end up with

The getLoadNames method returns a List of Strings, which we iterate over in order to see if a particular String value is in there.  If we look at the other places getLoadNames is used, the usage is pretty similar – we iterate over the list to find something we want.

Find usages of getLoadNames()

Find usages of getLoadNames()

This seems pretty wasteful – why iterate over the same list again and again, when you can store it in a data structure which lets you ask immediately if it contains what you want? Given that in this particular case, the names are either unique, or duplicates are effectively ignored, it seems that Set might be the data structure we want instead.

The getLoadNames method is not a simple getter, as you might expect from the name (but misleading naming is not the topic of this blog post, nor is caching the results of an operation that only needs to be done once, so let’s gloss over those smells for now).  It processes some data in order to calculate the list of names. The implementation details aren’t important for the purpose of this article, we’re just going to do the simplest thing that works and change the method to return a Set.

Return a Set from getLoadNames()

Return a Set from getLoadNames()

Because Set is also a Collection (like the original return type List), all of the code that calls this method still passes – the three callers only used the method in a for loop. Going back to our original method hasName, we can simplify it to:

That’s it.  No more looping, just a simple check.

We should look at the other callers to see if they can be simplified as well, but if they need to iterate over the Set (for any reason) we can leave them as they are.

Running all the tests for the project shows everything still works as expected so we can commit these changes.

Example 2: Map instead of List

Here’s a very similar example, with a slightly different solution:

In this method we’re iterating over all the fields, and looking for one with the same javaFieldName as some given value.  This sounds like something which would be better handled by a different data structure. In this case, a Map of String name to MappedField would let us immediately find the right field with this name.

The persistenceFields field is used in a number of places in this class, so it’s best to take a safe approach to migrating its data structure.  I’m going to replace it with a new field, fields, which is a Map of String to MappedField.

Create a new Map field

Create a new Map field

I use highlight usages to find everywhere in this class that persistenceFields was used. The first step is to initialise the Map in the same place the List was initialised

Initialise the new Map

Initialise the new Map

Then I can find all the places that queried persistenceFields, and the simplest way to migrate them to use the new field is to use the values() Collection from the Map, so all the existing code works the same way it used to

Use Map.values() instead of original list

Use Map.values() instead of original list

Most of these changes only impact this class and not other code.  The exception is the getter, which we have to change to return a Collection rather than a List. A quick look at the usages of this method show that the List is only used for iteration, or other cases where a Collection will work just as well.  So we change this to return the Collection

Change the return type of the getter

Change the return type of the getter

When we’ve replaced the usages of the original List with the new Map, we can safely remove the old List and rename the Map (if we like) to the name of the original field.

Remove original list

Remove original list

Rebuilding the project and re-running the tests shows everything still compiles and runs as expected.

Now we can go back to the original method with the smell, and make the use of the map.  The updated method is:

Much simpler!

If there are other methods that were iterating over the list to find the name, we can make the same change (or simply delegate to getMappedFieldByJavaField).  All callers that were iterating for any other reason still work exactly the same as before, just iterating over the Map’s values(). We may want to look at these other methods later to see if there’s some way to simplify them.  Remember, it’s OK to have more than one data structure to store the same information, if the purpose is to provide fast access to the data – this is particularly useful if reading happens more frequently than updating.  In our case, we might want to store our MappedFields in several Maps keyed by different values.

In Summary

If you see iteration code (for loops or Streams API calls) you should question whether this is really needed.  If the data structure is always being iterated over in order to find some value, you may be better off using a Set or a Map. This will give you not only simpler, cleaner code, but usually better performance as well.

Remember when choosing a data structure that there are two main access patterns to consider – adding/updating data, and reading the data.  How frequently you read vs write and what sorts of update operations you do will determine which is the most efficient data structure for you.

Symptoms:

  • For loops iterating over a collection looking for some value
  • Streams API calls with findFirst/findAny/anyMatch.

Possible Solutions:

  • If the code is checking if a value exists in a List, consider using a Set instead, if the values are unique or duplicates are ignored.
  • If the code is always looking for an Object with a particular property, a Map of the Objects keyed by that property will be much more efficient.
  • The Collections API gives you a lot of useful methods too, so even if you can’t change the data structure directly, you should check if there’s a Collection method that does what you want.  Our first example could have continued being a List, if duplicates were expected, but we could have used List.contains() instead of writing out own iteration code.  This is more concise, and the Collections implementation is usually the most efficient version for the data structure.
  • If, for some reason, you really do need to write your own iteration code, it may be best to encapsulate the implementation details of how data is written/read, and provide nice helpfully named methods for your domain objects to use this. This way at least the iteration code is kept out of your business logic.

This entry was posted in Tips & Tricks and tagged , . Bookmark the permalink.

23 Responses to Code Smells: Iteration

  1. James says:

    I’m not sure how good this advice is. From a code simplicity perspective, by all means, use the correct functions for the job. If the work you’re doing in a loop is just “contains” or “get,” then it’s normally best to call those methods.

    What I’m not so sure about is the performance aspect of this advice. Linear searches over small array lists will often be faster than building a HashSet or HashMap from that list, then searching it. If this list is very large, persists, is searched much more often than is built, or otherwise makes sense to store in a set, by all means, do so, but this advice has to be applied judiciously.

    • Trisha Gee says:

      Linear searches over an array can be very efficient, yes, but iterating over Lists (particularly LinkedLists) can be a bit painful. But you are correct, that’s why in the summary I explicitly mention that the choice of data structure really needs to consider its full use, e.g cost of read vs write.

  2. What to do when you really have to iterate? I know that a map will be really good by taking advantage of O(1) complexity in case of a hashmap, but when you really need iterations what can you do to optimize the process?

    • Trisha Gee says:

      I’m not suggesting you remove iteration from your code completely. It’s a very useful tool and often the only way to get what you want. Even in these examples I left the iteration intact in a number of places.

      The point is to consider if you really need to iterate, or if a different data structure will give you what you need for free.

      • Jagannath says:

        “but when you really need iterations what can you do to optimize the process?”

        That was a question rather than contesting your suggestion :-)

        • Trisha Gee says:

          Ah good point, sorry I missed it :)

          If you need to iterate, arrays of primitives are the fastest things to iterate over. This is the kind of data structure that we use in low latency applications, because the language and CPUs are really good at optimising those kinds of operations.

          Having said that, in my experience many applications try to over-optimise things without needing to. The right data structure can improve performance but iterating over a collection of a dozen or so elements usually isn’t going to slow down performance in the bigger picture. Where you need to be really careful is with very large collections, particularly collections that grow either over time or with the number of users/products etc.

    • Jagannath says:

      May be sort the ArrayList once and then try loop or search for the item you want ? This saves space as well runtime efficient.

    • Nova Terata says:

      Just use LinkedHashMap or LInkedHashSet if both searching and iterating are a priority.

  3. Med Medin says:

    I’m Java beginner and you have really good talents and great knowledge, but I feel your articles focus on advertising IntelliJ IDEA IDE features instead of giving more smart and detailed solutions to common Java problems. This article contains many gifs than show the power of the IDE but they are disturbing to follow inside static text, and can simply replaced by the final code.
    Thanks.

    • Trisha Gee says:

      Thanks for the feedback. I’m glad the content is useful for Java developers in general (indeed, that is one of the goals), but this is the IntelliJ IDEA blog, and most of our readers expect the advice to be accompanied with details on how to tackle these problems using the IDE.

      I also know gifs aren’t super user friendly in a blog, but unfortunately in this format it’s the best way to show a small snippet of how to do something in a way that’s easier to follow than a long series of images.

      • Marcos says:

        I kindly disagree. A Series of screen shots would do a better job and in general you do not need more than 3 of them. Gifs make harder to see where things begin and to follow.

        • Trisha Gee says:

          Screenshots work fine for simple step-by-step instructions, for more complex refactorings videos are probably best. GIFs are both the best and the worst of all worlds.

          I will try screenshots in some later blog posts, but getting the right ones in the right size at the right time is very time-consuming to produce.

          • Evgeny Antaev says:

            Trisha, thank you for this series. It is interesting and insightful.
            As to GIFs, they usually convey the idea very well. However, sometimes I need to go through several cycles to figure out the initial state. It would be nice to have some kind of a in the initial state. Or 1, 2, 3… markers on each state.

    • Mark Cerqueira says:

      Developers leverage tools like IDEs to write better, correct code faster. Knowing the tools you’re using is second only to knowing the language itself in my book.

  4. Flavio says:

    Hi Trisha,

    To say that iteration should be “considered a code smell” is not a good thing to do for the programming community.

    Present-day CPUs operate on structures by iterating. On a high-level programming language, they are at the end done by either (a) iteration or (b) recursion. If the compiler does not do automatic tail call optimization; then (b) recursion can’t be made as optimal (in terms of speed and resources) than (a). The latest Java compiler does not perform tail call optimization and probably never will be able to do it.

    Thus, iteration IS a very important tool for enhancing performance, and real-world (production) systems might have very important performance requirements.

    Iteration, in a code, should just be considered… iteration. No kind of smells.

    In the past, there was a very important computer scientist called Edsger Dijkstra wrote a paper called “Go To Statement Considered Harmful”, GOTO being considered a “code smell”: Dijkstra is a very important guy; an unit of measure, the nanoDijkstra, was named in his honor. However, even afterwards, new programming languages did include a GOTO statement (or some form), because there are special cases when they are needed for higher performance or (believe it or not!) producing more readable code.

    Greetings,
    F.

    • Trisha Gee says:

      Thanks for the feedback. You’re right, of course, iteration is an important part of programming, which is why all basic computer science courses teach it.

      The point of this series of Code Smells is to be able to identify code that *may* be a candidate for refactoring. I’m the case of iteration, this quite a low level way of interacting with the computer, and although it can be optimised by compilers and CPUs, it doesn’t necessarily belong scattered around the code. Iteration can suggest you’re using the wrong data structure (as I talked about in this post) or that you’re exposing internal storage details at a higher level than you should. Iteration should be limited to the places in the code that know how the data is stored.

      Perhaps it would be better to describe these as Design Smells rather than code smells, since if you discover iteration scattered throughout the code, your design can be cleaner and abstract these details away from you.

  5. Stefan Rother-Stübs says:

    “Iteration Over a Container” may be longer but provides neccessary context.

    • Trisha Gee says:

      Not sure I understand the difference? Iteration over any type (array, Collection, or custom collection) can indicate either the use of the wrong data structure or poor choice of location for the iteration/search code.

  6. Hi Trisha,

    Suppose we have the following Category class:


    public class Category {

    private final String name;
    private final String rootCategoryName;

    private Category(final String name,
    final String rootCategoryName) {
    this.name = name;
    this.rootCategoryName = rootCategoryName;
    }

    public String getName() {
    return name;
    }

    public String getRootCategoryName() {
    return rootCategoryName;
    }

    public boolean isRoot() {
    return rootCategoryName == null || rootCategoryName.equals("");
    }
    }

    And we are provided a list of categories, which some of them are root and other ones are children categories. Suppose we want to build a map from the root categories to its children categories, like this:


    public Map<String, List> categoriesByRootCategory(final List categories) {
    Map<String, List> categoriesByRootCategory = new HashMap();

    for (Category category : categories) {
    if (category.isRoot()) {
    categoriesByRootCategory.computeIfAbsent(category.getName(), key -> new ArrayList());
    } else {
    categoriesByRootCategory.get(category.getRootCategoryName()).add(category);
    }
    }

    return categoriesByRootCategory;
    }

    The above iteration is not a code smell, but in your opinion, should it be possible to convert to Java 8?

    Thanks,

  7. netikras says:

    Correct me if I’m wrong, but.. HashSets are HashMaps, and HashMap.get() will make AT LEAST (HashMap.size() * 3) conditionals to find an object in worst case scenario.

    Unles one is using a different Map implementation, that is more efficient in lookups. Which map would that be?

    The only advantage I see here is that HashMap uses a primitive (the hash itself) in one of the comparisons which should be faster than object comparison using equals(). Apart from that I honestly do not see how HM.contains() could be faster than ARRL.contains() or for_item_in_array (not to mention that Maps and Sets (AFAIK) are bigger in size than simple arrays..). Unless maybe JVM is doing some optimizations, or author of this article had some particular implementation of Map/Set in mind..
    Or maybe I am completely wrong :)

    • Trisha Gee says:

      You’re talking about the worst case, which is very useful to understand (especially if it’s more likely than not that the thing you’re searching for is not there). But in this code in particular, and in a lot of business cases, we’re searching for something that we know is in the list, and *hopefully* we’ve also provided a useful hashing function. In which case either a Map lookup (or HashSet contains) is generally likely to be faster.

      In most code, this kind of consideration is probably an optimisation anyway (performance-wise), but the right choice of data structure isn’t just about performance, it’s about using the right tool for the job. This can create cleaner code, but even if it doesn’t (in the case of ArrayList.contains vs HashSet.contains) the correct data structure conveys to other developers the design intentions. In the case of Set, for example, it’s that duplicates are not allowed (or don’t need to be stored, just one instance is OK).

      • netikras says:

        Well if *we know* the item is in the collection then there’s pretty much no point in calling contains() :)

        Anyways I see now what you meant. Lists, Sets and Maps are fundamentally different concepts and one should choose the right weapon for the battle, otherwise bugs will start knocking on the doors right away.

        However if one is not writing for_item_in_… loop himself that does not mean that iterations will be avoided. There are plenty iterations hidden under those pretty contains() and get() methods. But you are right, the code with those methods is much cleaner and readable. Although performance-wise….. I’d really like to see some tests if there are any :)

Leave a Reply

Your email address will not be published. Required fields are marked *