Striving For Better C++ Code, Part I: Data Flow Analysis Basics
CLion comes with a built-in data flow analyzer, which runs constantly when you are writing your code and helps improve your code’s quality. It can reveal various code problems that might later lead to runtime issues, security breaches, and other vulnerabilities. Examples of these useful checks are checks for constant conditions, dead code, null pointer dereferences, memory leaks, and array index issues. We’re starting a series of blog posts to explain how some of these inspections work in CLion.
- Striving For Better C++ Code, Part I: Data Flow Analysis Basics (this one)
- Striving For Better C++ Code, Part II: Function Summaries to Speed Up the Data Flow Analysis
Today, we’ll look at the basics of data flow analysis, including how it works in general, while presenting several real-world examples where it can help you write better code.
Control Flow Graph
All data flow inspections rely on the control flow graph. This is a graph in which the vertices are the statements in the program and the edges are the control flow jumps between these statements (direct code execution, conditional jumps, loops, breaks, gotos, etc.).
For example, the control flow graph on the right represents the function foo on the left:
CLion builds the corresponding graphs for each function. Each graph has one start node and one exit node, which correspond to the function’s entry and exit. By visiting the nodes of this graph from the start node towards the exit node, CLion can collect some valuable information.
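To make the idea of visiting nodes from the start node concrete, here is a minimal sketch of a control flow graph stored as an adjacency list, together with a breadth-first walk that reports which nodes can be reached. The names ControlFlowGraph and reachable_from are hypothetical and chosen only for this illustration; this is not how CLion implements its analysis.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical minimal control flow graph: node i has edges to succ[i].
struct ControlFlowGraph {
    std::vector<std::vector<std::size_t>> succ;
};

// Walks the graph breadth-first from `start` and returns, for every node,
// whether the walk visited it.
std::vector<bool> reachable_from(const ControlFlowGraph& cfg, std::size_t start) {
    std::vector<bool> seen(cfg.succ.size(), false);
    std::queue<std::size_t> work;
    seen[start] = true;
    work.push(start);
    while (!work.empty()) {
        const std::size_t node = work.front();
        work.pop();
        for (const std::size_t next : cfg.succ[node]) {
            if (!seen[next]) {
                seen[next] = true;
                work.push(next);
            }
        }
    }
    return seen;
}
```

If an analysis has already dropped an edge that can never be taken (for example, the false edge of a condition that is always true), any node left unvisited by this walk corresponds to dead code.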
For instance, CLion remembers which values may be stored in each variable for each statement. In the example above, CLion knows that at nodes 0 and 1, the parameter x always equals 1. This is because there’s only one call site for the function foo, which passes the value 1 in the argument. This being the case, CLion concludes that the condition x == 1 at node 1 will always be true, and so the control flow never goes to node 3. In node 4, the variable y may only hold the value 2, since the control flow may come only from node 2 and never from node 3. Thus, CLion concludes that:
- foo always returns the value 2
- x == 1 is always true
- y = 3 is never reachable
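The original listing for foo is not reproduced here, so the following is only a plausible reconstruction based on the description above. The node numbers in the comments follow the text; the helper call_foo is a hypothetical stand-in for the single call site.

```cpp
// Hypothetical reconstruction of foo; the exact original code is an assumption.
static int foo(int x) {   // node 0: function entry, x is known to be 1
    int y;
    if (x == 1) {         // node 1: always true given the single call site
        y = 2;            // node 2
    } else {
        y = 3;            // node 3: never reached
    }
    return y;             // node 4: y can only hold 2 here
}

// The only call site of foo, passing the value 1 (hypothetical wrapper).
int call_foo() {
    return foo(1);
}
```

Because foo has internal linkage and exactly one caller, an analyzer can safely treat x as the constant 1 throughout the function body.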
Now, let’s look at a more complex example:
Here we have two if blocks, and the way the first block is executed influences the behavior of the second block. To support this kind of evaluation, CLion splits the exit statements of the if statement into two different contexts:
The subsequent nodes of the control flow graph are duplicated. They appear twice: once for the Then branch of the if statement and once for the Else branch. In the first “clone”, the variable x holds the value 1 (since it corresponds to the positive branch of the if statement) and y holds the value 2 (which was stored in node 2). In the second “clone”, x != 1 and y is 3.
The second condition, x == 1, corresponds to the two cloned nodes 4 and 5. In node 4, the condition always holds true, since x == 1. Meanwhile, in node 5, it is always false. Hence, nodes 8 and 10 are never reachable, and the condition y == 2 has only one reachable clone, node 9. In this node y != 2, and hence this condition is always false.
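The same kind of correlation between two conditions can be shown with a small function. This is a hypothetical example written for illustration, not the function from the original figure, so the node numbers from the text do not map onto it.

```cpp
// Hypothetical function in which the first `if` fixes the outcome of later checks.
int classify(int x) {
    int y;
    if (x == 1) {
        y = 2;
    } else {
        y = 3;
    }
    // Two contexts leave the first `if`: {x == 1, y == 2} and {x != 1, y == 3}.
    if (x == 1) {
        return 10;   // taken only in the first context
    }
    // Only the context {x != 1, y == 3} reaches this point.
    if (y == 2) {    // always false here
        return 20;   // never reached
    }
    return 30;
}
```

Tracking x and y separately would not be enough to prove that y == 2 is always false; the analysis has to remember which pairs of values occur together, which is what the duplicated contexts provide.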
Data flow analysis in action
Let’s see how these techniques help CLion find subtle bugs in C++ programs! We decided to analyze the Z3 theorem prover, and here are the findings from our data flow analysis in CLion.
Here, the variable u is initialized to null_lpvar and then possibly reassigned to the same value (because j == null_lpvar in the if condition). Hence the condition u == null_lpvar is always true. Since there is a return in the true branch of this if clause, all subsequent code is marked as unreachable (reported as #6951):
Another case can be found below. Here, the unsigned variable i is always equal to or greater than zero and, in the else branch, it is nonzero. Hence, the i > 0 condition is always true (reported as #6952):
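A reduced version of this pattern might look like the sketch below. The function sign_of is hypothetical and exists only to show the shape of the check; it is not the code from Z3.

```cpp
// Hypothetical function: an unsigned value that is not zero must be positive.
int sign_of(unsigned i) {
    if (i == 0) {
        return 0;
    } else {
        if (i > 0) {   // always true: i is unsigned and nonzero in this branch
            return 1;
        }
        return -1;     // never reached
    }
}
```

The redundant check is harmless by itself, but it often signals that the author intended a signed type or a different comparison.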
In this blog post, we have covered one of our data flow inspections: Constant conditions. CLion provides many other data flow inspections, and we will cover some of them in upcoming blog posts. In which cases do you find code analysis useful? Share your examples with us in the comments below!