Analyzing Dataflow with IntelliJ IDEA

The upcoming IntelliJ IDEA version, Maia, brings you an improved Dataflow to this feature and a completely new Dataflow from this feature.

Here I describe how these features work and how they can help you, the Code Archeologists, better understand your code.

I am using the Apache Tomcat source code as an example.
Let’s have a look at the SingleSignOnMessage class and its obscure String authType field.

This is the first obvious question that comes to my mind, and one that IntelliJ IDEA can help answer:

Where do the values assigned to this field come from?

To reveal the mystery, click the field and invoke the Dataflow to this action from the Analyze menu.

You can refine the search scope, e.g. to ignore all values coming from the test code.

Now, the traces of values assigned to this field are shown in the following tool window, organized in a tree-like hierarchical fashion.

The first lines of the tree mean the following:

The value for the field String authType in SingleSignOnMessage comes from

this.authType = authType assignment statement in SingleSignOnMessage.setAuthType() method

called from ClusterSingleSignOn.register() method with authType parameter

passed as the ClusterSingleSignOn.registerLocal() method argument

obtained from the msg.getAuthType() expression in ClusterSingleSignOnListener.messageReceived() method

… and so on.

The more you expand the tree, the deeper you dig in the chain of assignments and method calls that all lead to our field in question.
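To make the chain above easier to picture, here is a minimal sketch of code with the same shape. The classes are simplified, hypothetical stand-ins that only borrow the names from the tree; the real Tomcat signatures and call relationships are richer and are not reproduced here, and the lastSent field and the "sso-1" id exist only so the example can be run.

```java
// Hypothetical stand-ins for the Tomcat classes named in the tree.
// Only the shape of the data flow is modeled, not the real code.
class DataflowToThisSketch {

    static class SingleSignOnMessage {
        private String authType = null;            // the field under analysis

        String getAuthType() { return authType; }

        void setAuthType(String authType) {
            this.authType = authType;              // (1) the assignment statement
        }
    }

    static class ClusterSingleSignOn {
        SingleSignOnMessage lastSent;              // assumption: kept only for the demo

        void register(String ssoId, String authType) {
            SingleSignOnMessage msg = new SingleSignOnMessage();
            msg.setAuthType(authType);             // (2) parameter handed to the setter
            lastSent = msg;
        }

        void registerLocal(String ssoId, String authType) {
            register(ssoId, authType);             // (3) argument forwarded to register()
        }
    }

    static class ClusterSingleSignOnListener {
        final ClusterSingleSignOn sso = new ClusterSingleSignOn();

        void messageReceived(SingleSignOnMessage msg) {
            sso.registerLocal("sso-1", msg.getAuthType()); // (4) value read from a message
        }
    }

    public static void main(String[] args) {
        SingleSignOnMessage incoming = new SingleSignOnMessage();
        incoming.setAuthType("BASIC");

        ClusterSingleSignOnListener listener = new ClusterSingleSignOnListener();
        listener.messageReceived(incoming);

        System.out.println(listener.sso.lastSent.getAuthType()); // prints BASIC
    }
}
```

Reading the numbered comments from (1) to (4) follows the same order as expanding the tree from the field outwards.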

Notice the nodes with gray background — they denote duplicates, that is, the usages that are already present in the tree in another location. This can occur, for example, when a value comes from a single source (e.g. returned by a function), but is assigned to a field in several different ways, e.g. initialization in a constructor and a setter method call.

Duplicate nodes are highlighted to help differentiate paths we already inspected from those we did not.
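As a rough illustration of how such duplicates can arise, consider the sketch below. Session, owner and currentUser() are made-up names, not Tomcat code. The single source currentUser() reaches the field both through the constructor and through the setter, so one would expect it to show up under both branches of the tree, with one of the occurrences marked as a duplicate.

```java
// Hypothetical example: one source value, two ways into the same field.
class DuplicateSourceSketch {

    static class Session {
        private String owner;

        Session(String owner) {
            this.owner = owner;          // path A: initialization in the constructor
        }

        void setOwner(String owner) {
            this.owner = owner;          // path B: assignment in the setter
        }

        String getOwner() { return owner; }
    }

    // The single source of the value.
    static String currentUser() {
        return "alice";
    }

    public static void main(String[] args) {
        Session s = new Session(currentUser());   // source reaches the field via path A
        s.setOwner(currentUser());                // same source reaches it via path B
        System.out.println(s.getOwner());
    }
}
```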

And here is where we get another question:

What are all possible values our field can have?

To help you with that, IntelliJ IDEA can analyze all the paths in the data flow hierarchy and group them by value. Just click the Group by leaf value button:

This is what we have found in our example:

Now it’s obvious that the values of the authType field come from final String fields declared in the Constants class. In addition, the field can hold null, the value it is initialized with.
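In code terms, a result like this corresponds to a situation along the lines of the sketch below. It is only an assumed shape: the Message class, the two register methods, and the constant names and values are placeholders for illustration, not the actual Tomcat declarations. However many paths lead to the field, they all end in one of three leaves: the null initializer or one of the two constants.

```java
// Hypothetical sketch: every path into authType ends in a constant or in null.
class LeafValueSketch {

    static class Constants {
        static final String BASIC_METHOD = "BASIC";   // leaf value: a constant
        static final String FORM_METHOD = "FORM";     // leaf value: another constant
    }

    static class Message {
        private String authType = null;               // leaf value: the initializer

        void setAuthType(String authType) { this.authType = authType; }

        String getAuthType() { return authType; }
    }

    // Two different call paths that end in the same field.
    static void registerBasic(Message m) { m.setAuthType(Constants.BASIC_METHOD); }

    static void registerForm(Message m) { m.setAuthType(Constants.FORM_METHOD); }

    public static void main(String[] args) {
        Message m = new Message();
        System.out.println(m.getAuthType());   // null until something is assigned
        registerBasic(m);
        System.out.println(m.getAuthType());
        registerForm(m);
        System.out.println(m.getAuthType());
    }
}
```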

Of course, IntelliJ IDEA cannot always determine the actual values a field can have. When we try to find all originating values for the password field in the same SingleSignOnMessage class, we get the following:

As you see, the possible values are:

  • null: the initial value.
  • session.getNote(Constants.SESS_PASSWORD_NOTE): apparently a password retrieved from the session; IntelliJ IDEA has no way of knowing what it actually is.
  • new String(buf, colon + 1, authorizationCC.getEnd() - colon - 1): a fairly complex computation involving an encoded byte buffer retrieved from an authorization request. Again, this is as much as we can know.

And the final question we might be interested in:

What are the places this expression can flow into?

For example, we might want to know whether this variable, which holds a huge amount of data, will ever be stored in a field anywhere.

Or, if I pass null here, will I eventually get a NullPointerException anywhere in my code?

This is where the Dataflow from this action from the Analyze menu helps us. Let’s try it on a method call:

We get a hierarchical view similar to what we saw before, but the values flow in the opposite direction, starting from the method parameter.

As we can see, a null argument can potentially wind up as the context parameter of the ELResolver.message() method, where it is immediately dereferenced. In other words, we have just found a path that leads to a potential NullPointerException. I am pretty sure it will never occur in the real world, though :)
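The kind of path that Dataflow from this uncovers can be sketched as follows. Resolver and Validator are invented for this illustration and do not mirror the real ELResolver code; the point is only that a value passed in at the top is stored in a field along the way and finally dereferenced a call deeper.

```java
// Hypothetical sketch of a forward flow: an argument travels down the call
// chain, is stored in a field, and is finally dereferenced.
class DataflowFromThisSketch {

    static class Resolver {
        // Dereferences its context parameter right away.
        static String message(Object context, String key) {
            return context.toString() + ":" + key;   // throws NullPointerException if context is null
        }
    }

    static class Validator {
        private Object storedContext;                // the value also ends up in a field

        String validate(Object context) {
            this.storedContext = context;            // flow into a field
            return Resolver.message(context, "error.key"); // flow into another method
        }
    }

    public static void main(String[] args) {
        Validator validator = new Validator();
        System.out.println(validator.validate("ctx"));
        try {
            validator.validate(null);                // the null follows the same path
        } catch (NullPointerException e) {
            System.out.println("null reached the dereference");
        }
    }
}
```

Starting the analysis at the argument of validate() would be expected to list both destinations: the storedContext field and the context parameter of message().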


15 Responses to Analyzing Dataflow with IntelliJ IDEA

  1. Marcus Brito says:

    Heh, looks like I’m not the only one still fond of the old theme.

    This is one of the best features I’ve seen so far. This alone would justify IDEA’s price tag for any serious developer. Tracing dataflow is a tedious task that you need to perform often, particularly when working with third party code — and when you’re working with a large team, all code is potentially third party (as in, not yours).

    Great work, guys. I hope to see this feature trickle down to other languages (*cough* Ruby *cough*) as well.

  2. Alexander Alekhin says:

    What is the number of the build that has this feature?

  3. Great upgrade, folks. Here’s a wish: although I’m already a heavy user of the old-style ‘dataflow to this’, you lost me on ‘dataflow from this’. I understand what it does, but the naming is making it harder than it is. Knowing how hard it can be at times, still, mind producing a nicer and ‘simpler’ name for the ‘from this’ dataflow?

  4. Alexey Kudravtsev says:

    The feature is available since build #10626

  5. Alexey Kudravtsev says:

    Andrew, what do you suggest?
    I’ve come across the ‘backward slicing’ and ‘forward slicing’ names for the dataflow to/from actions respectively, see http://people.brunel.ac.uk/~csstmmh2/exe1.html
    But I am not sure whether they are any more understandable.

  6. Taras Tielkes says:

    Alexey, great work.

    Can’t wait to get my hands on #10626 and try it out.

  7. Hi Alexey,

    ‘Slicing’ is great, but still in the domain of code forensics :) Not trying to win you over with this, but maybe ‘dataflow to usages’ is more helpful? That is, if dataflow is still on the table at all.

    Cheers!
    Andrew

  8. Tero says:

    This is just great! Keep up the good work guys!

    Looks like Maia is going to be the version where even I’d find something useful for my daily work for a long, long time. I’ve purchased all versions from v6 to v9 already, but only with Maia am I having a hard time waiting for the release ;)

    Cheers,
    -Tero

  9. dsha says:

    How about adding a verb to the feature name? Something like ‘Trace dataflow from’ and ‘Trace dataflow to’. This would actually be consistent with the menu where it appears, for everything else there starts with a verb.

  10. AlexL says:

    Alexey,

    This looks very impressive! I love to see IDEA developing more analysis and refactoring tools for java, compared to all the effort on non-java languages and integrations in the past releases.

    One question: Will the analysis use multiple threads? On a large code base, this kind of analysis can take time, so it would be great if it can take advantage of multi-core processors.
    Thanks,

  11. Dirk Dittert says:

    Please keep up the good work! Blog posts like this one are very important to learn the full power of IDEA!

    How about creating a screencast for this feature?

  12. AlexL says:

    Re: the naming of the commands, I immediately understood what “Dataflow to this” and “Dataflow from this” meant. By explicitly mentioning the direction to the point of reference (this), it was very clear.

    In contrast, I’m always confused between “Analyze Dependencies” and “Analyze Backward Dependencies”, and I think the issue there is it doesn’t explicitly indicate the direction to/from the object in question, e.g. “Dependencies on this” or “Dependencies of this”.

    So, please whatever you do, don’t name it:

    “Analyze Forward Dataflow”
    “Analyze Backward Dataflow”

    You can talk yourself into doing that, reasoning that the “to this” and “from this” are implicit, e.g.
    “Analyze Forward Dataflow” [to this]
    “Analyze Backward Dataflow” [from this]

    But after awhile you forget about the “to this” and “from this” and then it starts getting confusing. I would much rather make the direction to/from the reference point explicit.

    If you want to make it consistent with the other menu items, how about this:
    “Analyze Dataflow to this”
    “Analyze Dataflow from this”.

    My 2cent. Thanks,

  13. Eugene Kirpichov says:

    Guys, IDEA is rapidly rising to the first positions in my personal list of most respected programs; so is JetBrains. This is simply terrific, please keep up the recent spree of inspiration that seems to go into MAIA.

  14. David Castañeda says:

    This functionality is great, ….

    I have a question… how can I see the flow to a method because this is only for data.

    Say for example something like if I search for mx() it shows…

    ma() -> mb() -> mx()
    mc() -> md() -> me() -> mx()

    So I can determine all the places in application code that can be affected if a change is made to mx().

    If there is no such an option at this time, do you think it can be done with a plugin?
