Code Smells: Too Many Problems

Or: argh! Where do I start?!

This is part of a series investigating code that looks suspicious (“code smells”), and exploring possible alternatives.

For the middle articles of this series, we’ve been focusing on a single method, and demonstrating how to improve this method, one smell at a time.  While it has been… useful… that a single method has demonstrated so many smells, if you came across that method in real life it wouldn’t be obvious what to do with it.

As a reminder, let’s look at the original method.  This is actually before we added the Optional return from getMappedField.

I know there are developers out there who can either parse this code and understand it, or who have the ability to only focus on the areas of the code that concern the thing they are working on right now and can easily ignore less important problems.  But I get easily distracted by complexity, and every time I’ve stumbled over this method I’ve wanted to do something about it.  Actually I’ve tried to do something about it more than once.

The problem is that the method is so long, and so complex, that it’s overwhelming.  It’s difficult to see what is the best/right/simplest step to take first.  While I was working on this series of posts I spent a lot of time going in circles and trying to weigh up the pros and cons of various approaches. So for this article, instead of highlighting a specific smell and what to do about it, I want to take a step back and talk about how to get started.

The Smell: So Many Problems

Code has issues

I’ve covered three of the problems in individual articles so far (mutation,  multi-responsibility methods, and if statements), and we still have more.  There’s also poor naming; a poor domain model; (possibly) premature optimisation; and procedural programming, amongst others. In fact, there’s only a couple of smells in the series that aren’t in this method.  Where to start?

Approach 1: Break the Method into Smaller Pieces

My first instinct was to break it up into smaller, nicely named methods.  Then even in a procedural piece of code like this, at least the steps are “documented” in methods.  Although we can do this for some of the code, we saw that some of the smells make this impossible for other pieces.

This approach doesn’t work as well as I hoped for this particular method, but if you’re facing a similar problem you may find simply extracting methods for logical chunks of functionality increases readability and lets you refactor each smaller section as and when it makes sense to do so.  You may even be able to move some of these methods into different classes to improve the domain model.

Approach 2: Work on One Smell at a Time

This is the approach we ended up taking for the articles.

Firstly, we “fixed” the null problem by using Optional to document that the MappedField mf may not have a value.

Secondly, we identified the issues caused by mutability and addressed many of those.

Thirdly, we managed to split out parts of this method into their own methods, the first was enabled by reducing the amount of mutable state and the second by identifying the two different functions the method was performing and replacing a parameter with explicit methods.

Finally we could take a look at the meat of the method, which was the long series of if statements.  We worked out that some of them were being used for flow control, which we replaced with a traditional for loop (incidentally removing the “smell” of having to turn off checkstyle) and simplified the if statements to something easier to reason about.

At the end of all of this, we had simplified this method to something that fits on a single screen, and it is easier to reason about.  This version is even shorter than that in the last article, thanks to several readers who pointed out further simplifications.

This code still demonstrates several smells, and can benefit from further refactoring, but it’s a definite improvement on the original.

What we haven’t really done is address some of the larger problems, particularly around the domain model being difficult to work with. It may be the case that it’s hard to make further improvements to the method without addressing underlying design issues.  We did introduce a new class, ValidatedQuery, which may help us to evolve our model in the right direction, but ultimately we didn’t address some of the root problems that caused this in the first place. For example, the need to iterate over a String to identify a particular MappedField (instead of having a model that allows us to navigate directly to a field from a path); and the fact that it’s very hard to get a MappedClass for a MappedField, even though in theory every field lives in some class.  Our process of working on smaller, lines-of-code-level smells may have helped us get closer to identify these design smells, but they haven’t given us much of an opportunity to do anything about them.

Approach 3: Step Back and Try to Model the Problem

I’m not going to cover this in any real detail, but if you’re interested my first serious attempt to refactor this method can be found in this branch here.  With this approach, instead of looking at individual smells (e.g. mutation) or individual lines of code (e.g. how can I simplify this if statement), I looked at what the code was doing and tried to model it more accurately.  For example, I introduced an enumeration to represent the for loop, and moved any code that was responsible for iteration logic into that class. I introduced a FieldPathElement to encapsulate all the information about each thing we were iterating over.

I was happy with this for a while, particularly as I could imagine some of the new types (which I introduced as inner classes but intended to set free as “real” classes) could be valuable for refactoring other areas of the code.  But as I went further and further,  I effectively moved all the complexity that was in validateQuery into my new classes – more or less the same complexity existed, just in different places.

This approach can be a good way of encouraging a bit of redesign, and may lead to new classes that can help you to refactor other areas of your code. But if you’re not careful you may find you have all the same problems as before, just in different places.

In Summary

The hardest thing about tackling complex code (other than working out what the code is actually supposed to do) is figuring out where to start.  It can be really overwhelming to see blocks of code with multiple problems, and it’s hard to know what’s the best thing to do.

I have listed three different approaches here, and there are no doubt many more (please feel free to suggest them in the comments).  Each approach has its pros and cons, and some developers will be more comfortable with one than another.  Despite my preference for approach 3, as I think it is a) fun to model problems “correctly” and b) good to attempt to tackle design-level issues, I actually found approach 2 much easier and gave me simpler code.  It was also less risky and easier to reason about the impact of each step.

There’s nothing to stop you combining approaches either.  As a result of tackling specific code smells, I was then able to apply step 1 to break the problem into smaller pieces.  Now the core of the method is simpler and easier to reason about, I may have more success at applying approach 3 (which we already made a start on with the introduction of ValidatedQuery) to help improve the domain model.

Symptoms

  • Very long and/or complicated methods or classes
  • Seeing more than one smell when you first glance at the code
  • That feeling of “Argh! What’s this???”

Possible Solutions

  • Break the code into smaller sections
  • Attempt to fix just one smell at a time
  • Introduce new domain objects to model what’s happening in the code
  • Rename variables/fields, and extract very small well-named methods, to “document” what the code’s doing as you work with it. Even writing comments can help.
  • As you discover what features the code is providing, and even more importantly as you identify edge cases it may be trying to cater for, write tests to document these.
  • Combining approaches may lead to the best version of the code

This entry was posted in Tips & Tricks and tagged , . Bookmark the permalink.

8 Responses to Code Smells: Too Many Problems

  1. Ted Vinke says:

    Great read, Trisha!

    Maybe another tip: if you find yourself refactored into a dead alley, go back a few steps and try again. You can undo any refactoring and/or restore to a previous version from version control.

    • Trisha Gee says:

      Yes that’s a great tip! One of the things that’s REALLY important with refactoring is to keep the commits tiny, very frequent, and to be sure that they pass all the tests. That way you can easily step back, as you say, if you end up somewhere you don’t want to be.

      I also used Git branching A LOT for refactoring this method, I needed to try out a number of different things. I think I used something like 15 or more branches at different stages to experiment.

  2. PKua says:

    There seems to be a problem with this post. The link from Facebook is leading to If Statements and the there’s no link to this post in table of contents (just plain text). I managed to get here only after googling it.

  3. Stefan Rother-Stübs says:

    Great Article. This is the first article I can remember reading about practical problems during refactoring apart from “how to refactor this piece of code” articles.

  4. Joseph M. Rice says:

    Some things I’ve found helpful over the years to avoid this exact problem, and write cleaner and more structure code.

    use the rule of 3 – Don’t go past 3 levels of indention in a method/function. if you have to go past 3 levels past your method/function level, time for another method/function to break down the problem into smaller testable chunks.

    fail early fail often – try to structure your code to fail early and fail often. for example instead of If (success = true) { … //do stuff } else { … //failure logging} , go with if (!success) { … //failure logging; //return error state;} // success state; … // do stuff. this will help with the rule of 3, plus keep your code more linear and easier to read. It will also keep your head on track with any manual memory allocations (if you roll with a no GC language).

    80/120 – Don’t write a bunch of code on a single line, try to limit to your code to 80 or 120 characters per line. This will help you read the code later in a terminal, or split screen in your editor of choice. this helps so that you don’t have to continually scroll to the right to see what is going on in the code.

    Comments – Yes write comments, it helps you keep you state of mind so you can get right back into the “flow” you don’t have to write comments per line, but the important bits. Like describing what a “prop” is for example. Document your method/functions purpose, parameters and returns. This is important as that if this is a social contract to your self and other developers on the exact intended purpose of your method/function. It’s also a license to break fingers for anyone who changes the contract without a very good reason for changing behavior.

    the GOTO – this applies to C programming. Use the goto statement in your failure states. if (!success) { // log error; ret=failure_code; goto failure_state;} // do stuff; failure_state: // free memory here; return ret; The goto statement is not a bad idea, no-no, or a bad smell; as long as you reserve it to keeping memory safe state. think of it like a catch in a language that does not support try.

    • Trisha Gee says:

      All good advice. It would be much better if everyone wrote clean code to begin with rather than have to refactor this sort of thing.

  5. Hi Trisha! Great post!

    Just one point, on the last code block of the for loop, it is possible to invert the first “if” to not nest it to the other ones, like this:

    BTW, when facing large and complicated methods, I like the approach of extracting code blocks into smaller methods and improve each one of these in baby steps, always running unit tests to ensure the refactoring is safe. Sometimes I inline the new version of the code back whether it is simple enough or move it to the domain model to become more object-oriented.

Leave a Reply

Your email address will not be published. Required fields are marked *