
Striving For Better C++ Code, Part II: Function Summaries to Speed Up the Data Flow Analysis

This is the second blog post in the series dedicated to Data Flow Analysis (DFA) and its implementation in CLion. Read the first part here:

Introducing function summaries in data flow analysis

In CLion 2023.3, which is currently in Early Preview, we’ve completely reworked CLion’s DFA to make the analysis faster and more accurate. We’ve implemented what is known as a function summaries approach, which allows us to distinguish different function contexts for arbitrarily nested call chains, resulting in more accurate analysis. Let’s take a look at how this was done.

In the previous CLion release, our DFA was already interprocedural: it handled function calls and the flow of data through functions’ arguments, parameters, and return values. For each function, it formed several “external” contexts defined by the function’s call sites, meaning that each call site for a given function was treated separately. Although that approach had greater precision than a context-insensitive analysis (which doesn’t take the call site into account), it had several disadvantages.

First, only contexts of depth 1 were handled. If function A was called from another function B, and function B was in turn called from several call sites, those contexts were merged, leading to a loss of accuracy:

void callee() {}

void caller1() {
    callee(); // context 1: callee called from caller1
}

void caller2() {
    callee(); // context 2: callee called from caller2
}

// If caller1 itself was called from several call sites, those deeper
// contexts were all merged when analyzing callee.

Second, there were too many contexts. Any non-trivial program has a great many call sites, which led to slow analysis and high memory usage.

The current function summaries approach uses “internal” contexts rather than “external” ones. These “internal” contexts are defined by the function itself rather than its callers. If a function depends on certain input values (parameters, global variables, fields, or conditions on these input values), then it has several contexts – one for each relevant case of those inputs. For example, the following function has two contexts, one for c.x > 10 and another one for c.x <= 10:

struct C {
   int x;
};

int foo(C c) {
   if (c.x > 10)
       return 1;
   return 2;
}

These contexts don’t depend on the function’s call sites and completely describe the function’s semantics. The behavior of the function in different contexts is usually called the function summary.
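To see how a summary gets reused, consider a hypothetical caller of the foo function from the example above (the caller bar and its argument value are ours, added for illustration; foo is repeated so the snippet is self-contained). The analyzer only needs to match the argument against the summary’s contexts instead of re-analyzing foo’s body:

```cpp
struct C {
    int x;
};

int foo(C c) {
    if (c.x > 10)
        return 1;
    return 2;
}

int bar() {
    C c{42};
    // The argument satisfies c.x > 10, so the first context of foo's
    // summary applies, and the analyzer knows foo returns 1 here.
    return foo(c);
}
```

Another caller passing, say, C{5} would match the c.x <= 10 context instead, without the two call sites ever polluting each other.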

The introduction of function summaries helped us make analysis faster and more precise.

Adopting this approach also allowed us to support fields in our analysis. Now, each function’s summary keeps track of which fields are inputs and which are outputs, so the program’s heap is modeled more precisely.
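As an illustrative sketch of what “input” and “output” fields mean here (the types and names below are ours, not CLion’s internal representation), consider a function that reads one field and writes another:

```cpp
struct Point {
    int x;
    int y;
};

// A summary for this function would record Point::x as an input field
// and Point::y as an output field: after any call to mirror, the
// analyzer knows p.y holds whatever value p.x had.
void mirror(Point& p) {
    p.y = p.x;
}
```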

All the described changes made our analysis truly interprocedural while also making it more performant. To demonstrate the power of the new analysis in CLion, let’s look at the following example:

class Tree {
   Tree *left;
   Tree *right;
   int value;

public:
   Tree(Tree* l, Tree* r, int val): left(l), right(r), value(val) {}
   explicit Tree(int val): Tree(nullptr, nullptr, val) {}
   int getValue() const { return value; }
   Tree* getLeft() { return left; }
   Tree* getRight() { return right; }
};

void test() {
   auto *root = new Tree(new Tree(new Tree(1), new Tree(2), 3), new Tree(4), 5);
   /*    5
    *  3   4
    * 1 2
    */

   if (root->getLeft()->getRight()->getValue() == 2) {} // Condition is always true
   if (root->getRight()->getLeft()->getValue() == 2) {} // Pointer may be null
}

All existing analyses also benefit from these changes. The Array index is out of bounds inspection now handles arrays passed through fields and parameters, and the same is true for Null dereference, Constant conditions, and other inspections. There is also a new check called Field was not initialized, which keeps track of fields whose values are not initialized before they are used:

Init fields
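The kind of code the new check targets can be sketched as follows (a hypothetical snippet of ours; the exact warning placement and wording may differ in the IDE):

```cpp
struct Settings {
    int timeout;
    bool verbose;

    Settings() : timeout(30) {}  // 'verbose' is left uninitialized
};

bool isVerbose(const Settings& s) {
    return s.verbose;  // the "Field was not initialized" check would flag this read
}
```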

Reworking low-level data structures

Under the hood, the DFA in CLion uses a graph-like data structure, which represents the program at a rather low level. Even for an innocent-looking function, CLion ends up building a pretty large graph, and it gets larger for bigger functions. To give you an example: One of the code samples that we benchmark the DFA against is the P_DamageMobj function from DOOM/p_inter.c. The execution can take many branches, so it takes 1.66 million intermediate graph nodes to analyze the function and obtain a summary. Note that the total number of nodes needed to represent the function summary is much smaller – for this function, it is only 11,000.

One of the significant improvements we’ve introduced is to the way these nodes are stored in memory. For a long time, each node was represented by a regular Java object allocated on the JVM heap. When the space used for bookkeeping was factored in, the estimated memory footprint of each node ended up close to 56 bytes.

As a result of our internal rework, we were able to pack the entire node object into a single 64-bit primitive and, consequently, allocate the nodes within a single array. The memory footprint per node was reduced to 32 bytes (again with bookkeeping included), a 43% decrease. More importantly, this change greatly reduces the pressure on the JVM garbage collector and improves CPU cache locality, resulting in improved performance, too. Our benchmarks showed an average analysis time reduction of 65% to 70%, thanks to less indirection and a more cache-friendly way of storing the data.
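A minimal C++ sketch of the bit-packing idea (the field layout and widths below are invented for illustration; CLion’s actual encoding differs and lives in JVM code):

```cpp
#include <cstdint>
#include <vector>

// One graph node packed into a single 64-bit word instead of a
// heap-allocated object: 16 bits of opcode plus two 24-bit operand
// indices (illustrative widths only).
using PackedNode = std::uint64_t;

constexpr PackedNode pack(std::uint64_t op, std::uint64_t lhs, std::uint64_t rhs) {
    return (op << 48) | ((lhs & 0xFFFFFF) << 24) | (rhs & 0xFFFFFF);
}

constexpr std::uint64_t opOf(PackedNode n)  { return n >> 48; }
constexpr std::uint64_t lhsOf(PackedNode n) { return (n >> 24) & 0xFFFFFF; }
constexpr std::uint64_t rhsOf(PackedNode n) { return n & 0xFFFFFF; }

// All nodes live in one contiguous array, which keeps the data
// cache-friendly and avoids one heap object (and GC work) per node.
std::vector<PackedNode> nodes;
```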

We achieved an additional speed boost by changing the bookkeeping structures mentioned above. Without going into too much detail, we were able to avoid storing the majority of graph nodes in a hash map, bringing the average node footprint from 32 bytes down to 11 bytes and reducing the average analysis time by an additional third. This led to an overall average time reduction of up to 80%, equivalent to a 5x boost in performance.
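One way to picture this change (our sketch, not CLion’s actual code): when a node’s identity is simply its index in the backing array, looking it up becomes an array access, and fresh nodes can be appended without any hash-map entry at all:

```cpp
#include <cstdint>
#include <vector>

using PackedNode = std::uint64_t;
using NodeId = std::uint32_t;  // a node's identity is its array index

struct Graph {
    std::vector<PackedNode> nodes;

    // Appending returns the index; nodes known to be fresh need no
    // hash-map bookkeeping, only the word itself.
    NodeId add(PackedNode n) {
        nodes.push_back(n);
        return NodeId(nodes.size() - 1);
    }

    PackedNode get(NodeId id) const { return nodes[id]; }
};
```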

Long Benchmarks

These two low-level changes also resulted in an 82% reduction in memory consumption, all while using the same high-level algorithm.

Final optimization results

The transition to summaries, together with the internal low-level optimizations, brings great improvements both to analysis accuracy and to memory consumption and performance. Here is the final comparison of the analysis time on some subset of files from several open-source projects showing the combined speed-up:

Project     Before   After   Speed-up
LLVM        120.2 s  80.7 s  x1.5
OpenJpeg    155 s    32.5 s  x4.8
Z3          132 s    31.5 s  x4.2
SoundTouch  4.2 s    1.1 s   x3.8
Zlib        48.6 s   4.9 s   x9.9
Doom        61 s     9.1 s   x6.7

Try it out

You can try out these improvements in CLion 2023.3 Release Candidate or in CLion Nova.

