Analyzing Dataflow with IntelliJ IDEA
Here I describe how the dataflow analysis features work and how they can help you, fellow Code Archeologists, better understand your code.
I use the Apache Tomcat source code as an example.
Let’s have a look at the SingleSignOnMessage class and its obscure String authType field.
This is the first obvious question that comes to my mind, and one that IntelliJ IDEA can help answer:
Where do the values assigned to this field come from?
To reveal the mystery, click the field and invoke the Dataflow to this action from the Analyze menu.
You can refine the search scope, e.g. to ignore all values coming from the test code.
The traces of values assigned to this field are then shown in the following tool window, organized as a hierarchical tree.
The first lines of the tree read as follows:
The value of the String authType field in SingleSignOnMessage comes from
the this.authType = authType assignment statement in the SingleSignOnMessage.setAuthType() method,
called from the ClusterSingleSignOn.register() method with the authType parameter,
passed as an argument of the ClusterSingleSignOn.registerLocal() method,
obtained from the msg.getAuthType() expression in the ClusterSingleSignOnListener.messageReceived() method
… and so on.
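To make the shape of such a chain concrete, here is a minimal sketch with hypothetical, heavily simplified classes (SsoMessage, SsoCluster, SsoListener are made-up names, not Tomcat's real code or signatures). Reading the call chain backwards from the field gives exactly the kind of path the tree shows: field, setter, register(), registerLocal(), and finally msg.getAuthType() in messageReceived().

```java
// Hypothetical, simplified classes that only mimic the shape of the chain above.
public class DataflowChain {
    static class SsoMessage {
        private String authType; // the field under analysis; starts out as null

        void setAuthType(String authType) {
            this.authType = authType; // node 1: the assignment to the field
        }

        String getAuthType() {
            return authType;
        }
    }

    static class SsoCluster {
        SsoMessage lastSent;

        void register(String authType) {
            SsoMessage msg = new SsoMessage();
            msg.setAuthType(authType); // node 2: parameter flows into the setter
            lastSent = msg;
        }

        void registerLocal(String authType) {
            register(authType); // node 3: argument forwarded unchanged
        }
    }

    static class SsoListener {
        final SsoCluster cluster = new SsoCluster();

        void messageReceived(SsoMessage msg) {
            cluster.registerLocal(msg.getAuthType()); // node 4: value read from an incoming message
        }
    }

    public static void main(String[] args) {
        SsoMessage incoming = new SsoMessage();
        incoming.setAuthType("BASIC");
        SsoListener listener = new SsoListener();
        listener.messageReceived(incoming);
        System.out.println(listener.cluster.lastSent.getAuthType()); // prints BASIC
    }
}
```

Note that node 4 reads the same field from another message, so in a real tree this path can lead back to the field itself.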
The more you expand the tree, the deeper you dig into the chain of assignments and method calls that lead to our field in question.
Notice the nodes with a gray background: they denote duplicates, that is, usages that are already present elsewhere in the tree. This can happen, for example, when a value comes from a single source (e.g. returned by a function) but reaches the field in several different ways, such as initialization in a constructor and a setter method call.
Duplicate nodes are highlighted to help differentiate paths we already inspected from those we did not.
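Here is a minimal sketch of the duplicate scenario, using hypothetical names (defaultAuthType(), Message). One source, defaultAuthType(), reaches the field both through the constructor and through the setter, so in a Dataflow to this tree for authType we would expect it to appear under both branches, with the second occurrence marked as a duplicate.

```java
public class DuplicateSources {
    // Hypothetical single source of the value.
    static String defaultAuthType() {
        return "FORM";
    }

    static class Message {
        private String authType;

        Message(String authType) {
            this.authType = authType; // path 1: initialization in the constructor
        }

        void setAuthType(String authType) {
            this.authType = authType; // path 2: setter call
        }

        String getAuthType() {
            return authType;
        }
    }

    public static void main(String[] args) {
        Message a = new Message(defaultAuthType()); // reaches the field via the constructor
        Message b = new Message(null);
        b.setAuthType(defaultAuthType());           // reaches the field via the setter
        System.out.println(a.getAuthType() + " " + b.getAuthType()); // prints FORM FORM
    }
}
```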
And here is where we get another question:
What are all possible values our field can have?
To help you with that, IntelliJ IDEA can analyze all the paths in the dataflow hierarchy and group them by value. Just click the Group by leaf value button:
This is what we have found in our example:
Now it’s obvious that the values of the authType field come from final String fields declared in the Constants class. In addition, the field can be null, the value it is initialized with.
Of course, IntelliJ IDEA cannot always determine the actual values a field can have. When we try to find all originating values for the password field in the same SingleSignOnMessage class, we get the following:
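Conceptually, grouping by leaf value means walking every root-to-leaf path of the dataflow tree and bucketing the paths by their final (originating) expression. The sketch below is a toy model of that idea, not IntelliJ IDEA's implementation; the tree in sampleTree() and labels such as Constants.METHOD_A are hypothetical and are not taken from the real analysis result.

```java
import java.util.*;

public class GroupByLeaf {
    // Toy model of a dataflow tree node: one expression plus the nodes it gets its value from.
    record Node(String expr, List<Node> children) {
        Node(String expr, Node... kids) {
            this(expr, List.of(kids));
        }
    }

    // Collects every root-to-leaf path, keyed by the leaf expression (insertion order kept).
    static Map<String, List<List<String>>> groupByLeaf(Node root) {
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        collect(root, new ArrayDeque<>(), groups);
        return groups;
    }

    private static void collect(Node node, Deque<String> path, Map<String, List<List<String>>> groups) {
        path.addLast(node.expr());
        if (node.children().isEmpty()) {
            // A leaf is an originating value: record a copy of the path that leads to it.
            groups.computeIfAbsent(node.expr(), k -> new ArrayList<>()).add(new ArrayList<>(path));
        } else {
            for (Node child : node.children()) {
                collect(child, path, groups);
            }
        }
        path.removeLast();
    }

    // Hypothetical tree shaped loosely like the authType example.
    static Node sampleTree() {
        return new Node("authType field",
                new Node("this.authType = authType (setter)",
                        new Node("register(authType)", new Node("Constants.METHOD_A")),
                        new Node("registerLocal(authType)", new Node("Constants.METHOD_B"))),
                new Node("field initializer", new Node("null")),
                new Node("constructor argument", new Node("Constants.METHOD_A")));
    }

    public static void main(String[] args) {
        groupByLeaf(sampleTree()).forEach((leaf, paths) ->
                System.out.println(leaf + " <- " + paths.size() + " path(s)"));
        // Constants.METHOD_A <- 2 path(s)
        // Constants.METHOD_B <- 1 path(s)
        // null <- 1 path(s)
    }
}
```

Keying on the leaf rather than on the whole path is what collapses many different routes into a short list of possible values.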
As you see, the possible values are:
- null: the initial value.
- session.getNote(Constants.SESS_PASSWORD_NOTE): apparently a password retrieved from the session; IntelliJ IDEA has no knowledge of what it actually is.
- new String(buf, colon + 1, authorizationCC.getEnd() - colon - 1): a fairly complex computation on a buffer holding the credentials from an authorization request. Again, this is all we can know statically.
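The third value depends entirely on what arrives at run time, which is why the analysis can only report the expression. As an illustration, here is a minimal sketch of the same kind of computation, assuming an HTTP Basic credentials header of the form "Basic base64(user:password)" decoded as ISO-8859-1; password() is a hypothetical helper written with java.util.Base64, not Tomcat's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicCredentials {
    // Hypothetical helper: returns everything after the first colon of the decoded credentials.
    static String password(String headerValue) {
        if (!headerValue.startsWith("Basic ")) {
            return null;
        }
        String encoded = headerValue.substring("Basic ".length()).trim();
        byte[] decoded = Base64.getDecoder().decode(encoded);
        char[] buf = new String(decoded, StandardCharsets.ISO_8859_1).toCharArray();
        int end = buf.length;
        int colon = -1;
        for (int i = 0; i < end; i++) {
            if (buf[i] == ':') {
                colon = i;
                break;
            }
        }
        if (colon < 0) {
            return null; // no user:password separator
        }
        // Same shape as the leaf in the tree: the characters between the colon and the end.
        return new String(buf, colon + 1, end - colon - 1);
    }

    public static void main(String[] args) {
        String header = "Basic " + Base64.getEncoder()
                .encodeToString("alice:s3cret".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(password(header)); // prints s3cret
    }
}
```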
And the final question we might be interested in:
What are the places this expression can flow into?
For example, we might want to know whether this variable, which holds a huge amount of data, will ever be stored in a field anywhere.
Or, if I pass null here, will I eventually get a NullPointerException anywhere in my code?
This is where the Dataflow from this action from the Analyze menu helps us. Let’s try it on a method call:
We get a hierarchical view similar to the one we saw before, but the values flow in the opposite direction, starting from the method parameter.
As we can see, a null argument can potentially end up as the context parameter of the ELResolver.message() method, where it is immediately dereferenced. This means we have just found a path that leads to a potential NullPointerException. That said, I am pretty sure it will never occur in the real world 🙂
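Here is a minimal sketch of the kind of path Dataflow from this reports, using hypothetical methods describe() and message() (not the real ELResolver code). A null passed to describe() is forwarded unchanged and dereferenced in message(), which is exactly the chain you would follow in the tree.

```java
public class NullFlow {
    // Hypothetical sink: dereferences its first parameter right away.
    static String message(Object context, String name) {
        return context.getClass().getSimpleName() + ": " + name; // throws NPE if context is null
    }

    // Hypothetical intermediate hop that forwards the argument unchanged.
    static String describe(Object context) {
        return message(context, "value");
    }

    public static void main(String[] args) {
        System.out.println(describe("ctx")); // prints String: value
        try {
            describe(null); // null travels describe() -> message() and is dereferenced there
        } catch (NullPointerException e) {
            System.out.println("NullPointerException reached along the reported path");
        }
    }
}
```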