{"id":459890,"date":"2024-03-26T12:31:42","date_gmt":"2024-03-26T11:31:42","guid":{"rendered":"https:\/\/blog.jetbrains.com\/?post_type=go&#038;p=459890"},"modified":"2024-03-26T13:16:06","modified_gmt":"2024-03-26T12:16:06","slug":"data-flow-analysis-for-go","status":"publish","type":"go","link":"https:\/\/blog.jetbrains.com\/zh-hans\/go\/2024\/03\/26\/data-flow-analysis-for-go","title":{"rendered":"Data Flow Analysis for Go\u00a0"},"content":{"rendered":"\n<p>GoLand 2023.3 comes with support for data flow analysis (DFA). In this post, we\u2019ll introduce the feature, explain how it works, and show some real-world examples of how DFA can detect bugs on the fly!<\/p>\n\n\n\n<p>Thanks to the CLion team for helping us by porting their powerful DFA engine. For now, the GoLand engine only implements a limited number of DFA features, but more will be added in subsequent releases. The CLion team has also covered a variety of other topics, including a deeper dive into implementation specifics, <a href=\"https:\/\/blog.jetbrains.com\/zh-hans\/clion\/2023\/11\/striving-for-better-cpp-code-part-i-data-flow-analysis-basics\">on their blog<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is data flow analysis?<\/h2>\n\n\n\n<p>DFA is a type of static code analysis that analyzes how data flows through a program. In basic terms, it calculates the possible values of variables at different points in the program\u2019s execution. 
With this information, you can find various potential bugs, such as <code>nil<\/code> dereferences, endless loops, constant conditions, and other incorrect or atypical program behavior.<\/p>\n\n\n\n<p>Let\u2019s look at a straightforward example of DFA in action in the form of a simple function:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\">func dummy(initializeResource bool) {\n1    var resource *Resource = nil\n2    init := false\n3   \n4    if initializeResource {\n5       resource = new(Resource)\n6       init = true\n7    }\n8 \n9    r := resource\n10   _ = r.Name\n11 }<\/pre>\n\n\n\n<p>A common way to perform DFA starts with building a control flow graph (CFG) of the analyzed function. The CFG of this dummy function is provided below. For clarity, each statement in the graph is labeled with its line number from the code snippet.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"730\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2024\/03\/tApzCFHtxLztP0Qc7TUhClAX7OR8pvZgtl9EJfKY1iEKI8qglOMkwkR8nrcQi7crWjdqmig2oAp6UywTZfSdfy9ireAdxANLVUfHf0a_Ikq8KVvUGLSQGSTKD8d6.png\" alt=\"\" class=\"wp-image-460531\"\/><\/figure>\n\n\n\n<p>You can think of a CFG as a simple graph that reflects the function\u2019s execution. The graph nodes correspond to code blocks, and the edges reflect conditional and unconditional jumps between them. You don\u2019t need to know the exact formal definition of a CFG or how to build one for this article, but if you\u2019d like to learn more about CFGs, you can visit this <a href=\"https:\/\/ics.uci.edu\/~lopes\/teaching\/inf212W12\/readings\/rep-analysis-soft.pdf\" target=\"_blank\" rel=\"noopener\">link<\/a>.<\/p>\n\n\n\n<p>Once the CFG has been built, the main stage of the analysis can begin. During this stage, the DFA computes all of the possible values that each variable can hold after each statement of the function. 
Roughly speaking, this can be done by propagating values through statements (such as assignments) and along the edges of the CFG, taking into account the reachability of the CFG\u2019s nodes.<\/p>\n\n\n\n<p>For example, the possible value set of the variable <code>r<\/code> after statement 9 is <code>{nil, new(Resource)}<\/code>. The <code>nil<\/code> value is obtained from statement 1 by propagating it through the path 1 -&gt; 2 -&gt; 4 -&gt; 9. The <code>new(Resource)<\/code> value is obtained by propagating it through the path 1 -&gt; 2 -&gt; 4 -&gt; 5 -&gt; 6 -&gt; 9, taking into account the assignments on lines 5 and 9.<\/p>\n\n\n\n<p>We can use the information obtained about variable values to find potential bugs in the corresponding programs. For example, since we know that the variable <code>r<\/code> can be <code>nil<\/code> after statement 9, there can be a <code>nil<\/code> dereference on line 10.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Challenges of data flow analysis<\/h2>\n\n\n\n<p>As an introduction to the potential pitfalls and difficulties associated with implementing data flow analysis, let\u2019s take a look at a slightly modified version of our previous example function:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\">func dummy(initializeResource bool) {\n1    var resource *Resource = nil\n2    init := false\n3   \n4    if initializeResource {\n5       resource = new(Resource)\n6       init = true\n7    }\n8 \n9    r := resource\n10   if init {\n11      _ = r.Name\n12   }\n13 }<\/pre>\n\n\n\n<p>The corresponding CFG looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"1165\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2024\/03\/image-63.png\" alt=\"\" class=\"wp-image-460402\"\/><\/figure>\n\n\n\n<p>DFA, like other static analysis tools, over-approximates the behavior of programs. 
In our case, this means that DFA computes an upper bound on the possible variable values, that is, a superset of the values that can actually occur. In this example, data flow analysis infers that the possible value set of the variable <code>init<\/code> in statement 10 is <code>{false, true}<\/code>. Hence, both branches of the condition on line 10 are reachable, which means an execution can reach statement 11. In statement 11, the value set of variable <code>r<\/code> is <code>{nil, new(Resource)}<\/code>. Thus, we can infer that there is a potential <code>nil<\/code> dereference on line 11.<\/p>\n\n\n\n<p>But this report would be a false positive. It is true that the variable <code>init<\/code> can take both <code>true<\/code> and <code>false<\/code> values at statement 10. However, the reachability of the condition <code>init == true<\/code> also depends on the condition <code>initializeResource == true<\/code>. If the latter is met, then <code>init<\/code> can only be <code>true<\/code>, and if it isn\u2019t met, then <code>init<\/code> can only be <code>false<\/code>. To identify such cases, we must use contexts. Let&#8217;s assume that we\u2019re analyzing the function in two different contexts. The first context corresponds to the case where <code>initializeResource<\/code> is <code>true<\/code>, and the second corresponds to the case where <code>initializeResource<\/code> is <code>false<\/code>. These contexts are most easily pictured as clones of the CFG:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"1225\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2024\/03\/bDv8apZgdOALPpChNGJ524wHHeQwPz83UyM65KSI64jBe9xumOdwewiPkjSEhauZg6Xgzu0yROh8OObBzxQnDfsTUFSwELQvUo8w60lhCd_ZvhidPKauvJNTWhWv.png\" alt=\"Control flow graph 2\" class=\"wp-image-460469\"\/><\/figure>\n\n\n\n<p>As you can see, there are two different contexts (surrounded by a dashed border). In each context, we can analyze statements 9, 10, and 11 in different ways. 
For example, in statement 10 of the context <code>initializeResource == true<\/code>, the variable <code>init<\/code> can only be <code>true<\/code>, and the variable <code>r<\/code> can only hold a <code>new(Resource)<\/code> value. Therefore, in this context, statement 11 is reachable, but a <code>nil<\/code> dereference isn\u2019t possible. In statement 10 of the context <code>initializeResource == false<\/code>, the variable <code>r<\/code> can only be <code>nil<\/code>. Since the value of <code>init<\/code> can only be <code>false<\/code>, statement 11 is not reachable in this context and therefore a <code>nil<\/code> dereference isn\u2019t possible. As such, a <code>nil<\/code> dereference cannot occur in either context.<\/p>\n\n\n\n<p>Using contexts in static analysis allows us to improve the quality of the analysis and weed out false positives. To support this, the two exits of the <code>if<\/code> statement are assigned to different contexts, and the subsequent nodes of the control flow graph are duplicated and analyzed independently in each context to identify all the possible data paths.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The capabilities of data flow analysis in GoLand<\/h2>\n\n\n\n<p><strong>Constant condition detection. <\/strong>Detecting constant conditions is one of the key data flow inspections. It uses the values computed by DFA to determine whether a condition always evaluates to the same result. Here are two examples:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1987\" height=\"1099\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2024\/03\/1-example.png\" alt=\"\" class=\"wp-image-460482\"\/><\/figure>\n\n\n\n<p><strong>Example 1: <\/strong>In this example, DFA has deduced that the condition <code>err != nil<\/code> is always <code>false<\/code>. To show that this is indeed the case, let&#8217;s consider what values the <code>err<\/code> variable can take in the condition on line 191. 
There are two main cases \u2013 when <code>allF<\/code> is true and when <code>allF<\/code> is false after the <code>for<\/code> loop is executed. If <code>allF<\/code> is <code>true<\/code> after the <code>for<\/code> loop is executed, then <code>err<\/code> will be <code>nil<\/code> on line 191, otherwise there will be a <code>return<\/code> from the function on line 186. The remaining eventuality, when <code>allF<\/code> is <code>false<\/code> after the <code>for<\/code> loop is executed, can only happen if line 177 is reachable. After line 177 is executed, we can be sure of two things: firstly, that <code>err<\/code> is <code>nil<\/code> (otherwise the execution of the loop would have continued on line 175), and secondly, that the execution of the loop has been interrupted, and therefore the variable <code>err<\/code> won\u2019t be assigned any other value. Hence, in cases where <code>allF<\/code> is <code>false<\/code>, the variable <code>err<\/code> can only ever be <code>nil<\/code>. Thus, the condition <code>err != nil<\/code> is always <code>false<\/code>.&nbsp;<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"1600\" height=\"252\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2024\/03\/Wi6SohvYgU1jtcAY9i9BXshCEdLExqxdR2MtTbI0mh3diJ9rHYhjyji5-MfWky0CUUdwo4ujmR_32dXL3BRELhtcFj0rvB6G3YkHGwuuyd5yMCDEXQ381-vEqiU_.png\" alt=\"\" class=\"wp-image-460515\"\/><\/figure>\n\n\n\n<p><strong>Example 2:<\/strong> Here is a simpler example in which DFA deduces that the condition <code>r0k != nil<\/code> on line 193 is always <code>true<\/code>. This happens because the implicit dereference <code>r0k.License<\/code> is present on line 191, after which the variable <code>r0k<\/code> cannot be <code>nil<\/code>. 
Although this constant condition warning does not by itself pinpoint a bug, it reveals suspicious code. The check on line 193 implies that the author expects that <code>r0k<\/code> can be <code>nil<\/code>, and in that case the dereference of <code>r0k<\/code> on line 191 would already have failed at runtime.<\/p>\n\n\n\n<p>These examples show how the constant condition inspection can help you identify peculiar points or strange behavior in the program\u2019s code.<\/p>\n\n\n\n<p><strong>Potential <code>nil<\/code> dereference. <\/strong>DFA can detect a <code>nil<\/code> dereference for a variable even in code that seems absolutely normal to the naked eye. Let\u2019s see how this is possible:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"2432\" height=\"1324\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2024\/03\/vault-3.png\" alt=\"\" class=\"wp-image-460504\"\/><\/figure>\n\n\n\n<p><strong>Example 3:<\/strong> In this example, DFA infers that there can be a <code>nil<\/code> dereference of the variable <code>conf<\/code> on line 273. This may seem strange because the code takes into account cases where the variable <code>conf<\/code> is <code>nil<\/code>. In such a case, there should be a return from the function on line 269, and so the dereference of the <code>conf<\/code> variable shouldn\u2019t be reachable. However, there are actually two different <code>conf<\/code> variables. The first one is declared on line 249 and the second on line 261. Thus, the second declaration shadows the first one.<\/p>\n\n\n\n<p>This can lead to a <code>nil<\/code> dereference. Let&#8217;s assume that the first <code>conf<\/code> variable (line 249) is <code>nil<\/code> but the second <code>conf<\/code> variable (line 261) is not <code>nil<\/code>, and let\u2019s also assume that the corresponding <code>error<\/code> variable is <code>nil<\/code>. 
In this case, there is no <code>return<\/code> from the function on either line 262 or line 269, which means we reach the <code>conf<\/code> dereference on line 273. This dereference refers to the first <code>conf<\/code> variable, which is <code>nil<\/code>, and so it causes issues at runtime. For this reason, we should be careful with nested short variable declarations!<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"3096\" height=\"946\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2024\/03\/k8s-2.png\" alt=\"\" class=\"wp-image-460493\"\/><\/figure>\n\n\n\n<p><strong>Example 4<\/strong>: DFA deduces a potential <code>nil<\/code> dereference of the variable <code>pod<\/code> on line 260. This function takes into account the nilability of the variable <code>pod<\/code> (there is a condition <code>pod != nil<\/code> on line 254), so the author of the code implies that the variable <code>pod<\/code> can be <code>nil<\/code> within the <code>if<\/code> statement. But in this case, there will be a <code>nil<\/code> dereference on line 260.<\/p>\n\n\n\n<p><strong>Error may be not <code>nil<\/code>. <\/strong>This inspection reports cases in which a variable might be <code>nil<\/code> or hold an unexpected value because the associated error is not checked for being non-<code>nil<\/code>.<\/p>\n\n\n\n<p>The analysis is currently intra-procedural, meaning that it examines one function at a time, and it does not consider user-imposed contracts on the function. Therefore, in specific situations, there may be false positives. 
In such cases, you can use a quick-fix to ask the DFA not to analyze or report these errors.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\">func _() {\n    file, err := os.Open(&quot;file.txt&quot;)\n    \/\/ Error check is omitted here, so file may be nil\n    name := file.Name()\n    print(name, err)\n}<\/pre>\n\n\n\n<p>In the example provided, the variable <code>file<\/code> could either be <code>nil<\/code> or hold an unexpected value if <code>err<\/code> is not <code>nil<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Try DFA for yourself!<\/h2>\n\n\n\n<p>You can try out these improvements now in the GoLand <a href=\"https:\/\/www.jetbrains.com\/go\/nextversion\/\" target=\"_blank\" rel=\"noopener\">2024.1 Release Candidate<\/a> or wait for the <a href=\"https:\/\/www.jetbrains.com\/go\/download\/\" target=\"_blank\" rel=\"noopener\">2024.1 version<\/a>. It\u2019s also available for 2023.3 users in early access, but is disabled by default. To enable it, go to <em>Settings | Editor | Inspections | Go | Data Flow Analysis 
(experimental)<\/em>.&nbsp;<\/p>\n","protected":false},"author":1455,"featured_media":459902,"comment_status":"closed","ping_status":"closed","template":"","categories":[],"tags":[1817,5699],"cross-post-tag":[],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/go\/459890"}],"collection":[{"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/go"}],"about":[{"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/types\/go"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/users\/1455"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/comments?post=459890"}],"version-history":[{"count":10,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/go\/459890\/revisions"}],"predecessor-version":[{"id":460542,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/go\/459890\/revisions\/460542"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/media\/459902"}],"wp:attachment":[{"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/media?parent=459890"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/categories?post=459890"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/tags?post=459890"},{"taxonomy":"cross-post-tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/zh-hans\/wp-json\/wp\/v2\/cross-post-tag?post=459890"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}