Go logo

GoLand

A cross-platform Go IDE with extended support for JavaScript, TypeScript, and databases

Data Flow Analysis for Go 

GoLand 2023.3 comes with support for data flow analysis (DFA). In this post, we’ll introduce the feature, explain how it works, and show some real-world examples of how DFA can detect bugs on the fly!

Thanks to the CLion team for helping us by porting their powerful DFA engine. For now, the GoLand engine only implements a limited number of DFA features, but more will be added in subsequent releases. The CLion team has also covered a variety of other topics, including a deeper dive into implementation specifics, on their blog.

What is data flow analysis?

DFA is a type of static code analysis that analyzes how data flows through a program. In basic terms, it calculates the possible values of variables at different points in the program’s execution. With this information, you can find various potential bugs, such as nil dereferences, endless loops, constant conditions, and other incorrect or atypical program behavior.

Let’s look at a straightforward example of DFA in action in the form of a simple function:

func dummy(initializeResource bool) {
1    var resource *Resource = nil
2    init := false
3   
4    if initializeResource {
5       resource = new(Resource)
6       init = true
7    }
8 
9    r := resource
10   _ = r.Name
   }

A common way to perform DFA starts with building a control flow graph (CFG) of the analyzed function. The CFG of this dummy function is provided below. For clarity, all statements (lines of code) are numbered according to their corresponding code snippet.

You can think about CFG as a simple graph that reflects the function’s execution. The graph nodes correspond to code blocks, and the edges reflect conditional and unconditional jumps between them. You don’t need to know the exact formal definition of a CFG or how to build them for this article, but if you’d like to learn about CFGs, you can visit this link.

Once the CFG has been built, the main stage of the analysis can begin. During this stage, the DFA computes all of the possible values of variables that can follow each function statement. Roughly speaking, this can be done by propagating values through statements (such as assignments) and the edges of the CFG, taking into account the reachability of the CFG’s nodes.

For example, the possible value set of the variable r after statement 9 is {nil, new(Resource)}. The nil value is obtained from statement 1 by propagating it through the path 1 -> 2 -> 4 -> 9. The new(Resource) value is obtained by propagating it through the path 1 -> 2 -> 4 -> 5 -> 6 -> 9, taking into account assignments on lines 5 and 9.

We can use the information obtained about variable values to find potential bugs in the corresponding programs. For example, since we know that, after statement 9, the variable r can be nil, that means there can be a nil dereference on line 10.

Challenges of data flow analysis

As an introduction to the potential pitfalls and difficulties associated with implementing data flow analysis, let’s take a look at a slightly modified version of our previous example function:

func dummy(initializeResource bool) {
1    var resource *Resource = nil
2    init := false
3   
4    if initializeResource {
5       resource = new(Resource)
6       init = true
7    }
8 
9    r := resource
10   if init {
11      _ = r.Name
12   }
13 }

The corresponding CFG looks like this:

DFA, like other static analysis tools, over-approximates the behavior of programs. In our case, this means that DFA computes the upper bound of possible variable values. In this example, data flow analysis infers that the possible value set of the variable init in statement 10 is {false, true}. Hence, both the branches of the condition on line 10 are reachable, which means an execution can reach statement 11. In statement 11, the value set of variable r is {nil, new(Resource)}. Thus, we can infer that there is a potential nil dereference on line 11. 

But that’s not really true. In fact, the variable init can take both true and false values at statement 10. However, the reachability of the condition init == true also depends on the condition initializeResource == true. If the latter is met, then init can only take the value true, and if it isn’t met, then init can only be false. To identify such cases, we must use contexts. Let’s assume that we’re analyzing a function in two different contexts. The first context corresponds to a case where initializeResource is true, the second one corresponds to a case where initializeResource is false. These contexts are most easily described as a clone of the CFG:

Control flow graph 2

As you can see, there are two different contexts (surrounded by a dashed border). In each context, we can analyze statements 9, 10, and 11 in different ways. For example, in statement 10 of the context initializeResource == true, the variable init can only take a true value, and variable r can only take a new(Resource) value. Therefore, in this context, statement 11 is reachable, but a nil dereference isn’t possible. In statement 10 of the context initializeResource == false, variable r can take a nil value. Since the value of init can only be false, statement 11 is not reachable in this context and therefore a nil dereference isn’t possible. As such, a nil dereference cannot occur in either context.

Using contexts in static analysis allows us to improve the quality of the analysis and weed out false-positives. To support this, exit statements of the if statement are split into two different contexts, duplicating the subsequent nodes of the control flow graph and analyzing them independently to identify all the possible data paths.

The capabilities of data flow analysis in GoLand

Constant conditions detection. Constant conditions represent a crucial type of data flow inspection. The constant condition inspection uses the DFA execution data to determine if certain conditions are constant. Here are two examples:

Example 1: In this example, DFA has deduced that the condition err != nil is always false. To show that this is indeed the case, let’s consider what values the err variable can take in the condition on line 191. There are two main cases – when allF is true and when allF is false after the for loop is executed. If allF is true after the for loop is executed, then err will be nil on line 191, otherwise there will be a return from the function on line 186. The remaining eventuality, when allF is false after the for loop is executed, can only happen if line 177 is reachable. After line 177 is executed, we can be sure of two things: firstly, that err is nil (otherwise the execution of the loop would have continued on line 175), and secondly, that the execution of the loop has been interrupted, and therefore the variable err won’t be assigned any other value. Hence, in cases where allF is false, the variable err can only ever be nil. Thus, the condition err != nil is always false


Example 2: Here is a simpler example in which DFA deduces that the condition r0k != nil on line 193 is always true. This happens because the implicit dereference r0k.License is present on line 191, after which the variable r0k cannot be nil. Although the derived constant condition inspection does not accurately show a real problem in the code, it reveals strange behavior in the program. In fact, issues could become apparent at runtime after the potential nil dereference of the variable r0k on line 191, as the author of the code implies that r0k can be nil.

These examples show how the constant condition inspection can allow you to identify peculiar points or strange behavior in the program’s code.

Potential nil dereference. DFA can detect a nil dereference for a variable even in code that seems absolutely normal to the naked eye. Let’s see how this is possible:

Example 3: In this example, DFA infers that there can be a nil dereference of the variable conf on line 273. This may seem strange because the code takes into account cases where the variable conf is nil. In such a case, there should be a return from the function on line 269, and so dereference of the conf variable shouldn’t be reachable. However, there are actually two different conf variables. The first one is declared on line 249 and the second on line 261. Thus, the last declaration shadows the previous one.

This can lead to a nil dereference. Let’s assume that the first conf variable (line 249) is nil but the second conf variable (line 261) is not nil, and let’s also assume that the corresponding error variable is nil. In this case, there is no return from the function on lines 262 and 269, which means we reach the conf dereference on line 273. This dereference corresponds to the first conf variable and causes issues at runtime. For this reason, we should be careful with nested short variable declarations!


Example 4: DFA deduces a potential nil dereference of the variable pod on line 260. This function takes into account the nilablility of the variable pod (there is a condition pod != nil on line 254), so the author of the code implies that the variable pod can be nil within the if statement. But in this case, there will be a nil dereference on line 260.

Error may be not nil. This inspection reports cases in which variables might have nil or an unexpected value because of the associated error that is not checked for being non-nil.

The analysis is currently intra-procedural and does not consider user-imposed contracts on the function. Therefore, in specific situations, there may be false positives. For cases such as these, you can use a quick-fix to ask the DFA not to analyze or report these errors.

func _() {
    file, err := os.Open("file.txt")
    // Error check is omitted here
    name := file.Name()
    print(name, err)
}

In the example provided, the variable file could either have the value nil or an unexpected value if err is not nil.

Try DFA for yourself!

You can try out these improvements now in the GoLand 2024.1 Release Candidate or wait for the 2024.1 version. It’s also available for 2023.3 users in early access, but is disabled by default. To enable it, go to Settings | Editor | Inspections | Go | Data Flow Analysis (experimental)

image description