{"id":196208,"date":"2021-10-28T17:24:02","date_gmt":"2021-10-28T16:24:02","guid":{"rendered":"https:\/\/blog.jetbrains.com\/?post_type=scala&#038;p=196208"},"modified":"2023-06-13T10:42:29","modified_gmt":"2023-06-13T09:42:29","slug":"dataflow-analysis-for-scala","status":"publish","type":"scala","link":"https:\/\/blog.jetbrains.com\/ko\/scala\/2021\/10\/28\/dataflow-analysis-for-scala","title":{"rendered":"Dataflow Analysis for Scala"},"content":{"rendered":"\n<p>One of the things IntelliJ IDEA is most famous for is its huge variety of helpful inspections and warnings. They make the lives of programmers significantly easier, frequently showing them errors in logic or style that they wouldn\u2019t have noticed otherwise.<\/p>\n\n\n\n<p>Those inspections come in various shapes and sizes, ranging from simple pattern searches to highly sophisticated inspections that require tons of complex static code analysis behind the scenes. Among the most advanced examples of the latter are those offered with the help of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data-flow_analysis\" target=\"_blank\" rel=\"noopener\"><strong>dataflow analysis<\/strong><\/a>.<\/p>\n\n\n\n<p>These inspections have been available for Java for a long time and have proved invaluable in finding numerous obscure bugs. However, no such inspections have ever been implemented for Scala. This became the focus of my internship with the <a href=\"https:\/\/plugins.jetbrains.com\/plugin\/1347-scala\" target=\"_blank\" rel=\"noopener\">IntelliJ Scala<\/a> team this summer, and now we\u2019re happy to announce that we\u2019re bringing dataflow-based inspections to Scala developers.<\/p>\n\n\n\n<p>They come enriched with special support for many important Scala-specific structures, like case classes or sequences. You can already try them out in our regular <a href=\"https:\/\/confluence.jetbrains.com\/display\/SCA\/Scala+Plugin+Nightly\" target=\"_blank\" rel=\"noopener\">Nightly builds<\/a> or wait until the next major IntelliJ IDEA release at the end of November 2021.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Dataflow analysis in a nutshell<\/h2>\n\n\n\n<p>Dataflow analysis consists of two major phases: <strong>control flow building<\/strong> and <strong>proper dataflow analysis<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Control flow building<\/h3>\n\n\n\n<p>First, we transform the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Abstract_syntax_tree\" target=\"_blank\" rel=\"noopener\"><strong>abstract syntax tree<\/strong><\/a> of the program into something that will give us more information about its actual execution flow \u2013 the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Control_flow\" target=\"_blank\" rel=\"noopener\"><strong>control flow graph<\/strong><\/a>. In IntelliJ IDEA, we represent control flow using our own stack-based <a href=\"https:\/\/en.wikipedia.org\/wiki\/Intermediate_representation\" target=\"_blank\" rel=\"noopener\"><strong>intermediate representation<\/strong><\/a>, which is similar to Java bytecode. In short, we compile the user code into this bytecode-like representation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Proper dataflow analysis<\/h3>\n\n\n\n<p>Each method is then analyzed separately. We partially execute its body, using the body&#8217;s control flow representation and keeping track of useful semantic information, especially <strong>possible variable values<\/strong>. However, most of those are unknown and will only become known at runtime (e.g. values of method parameters). That is why often we can only obtain and process abstract information like \u201c<em>x<\/em> is even and greater than <em>10\u201d <\/em>or <em>\u201cy <\/em>is an immutable list with five elements<em>\u201d<\/em>. For this reason, this procedure is called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Abstract_interpretation\" target=\"_blank\" rel=\"noopener\"><strong>abstract interpretation<\/strong><\/a>.<\/p>\n\n\n\n<p>Once we collect all of that <strong>dataflow information<\/strong>, we can <strong>analyze it<\/strong> to identify <strong>suspicious places<\/strong> and <strong>possible bugs<\/strong> in the code, like a complex expression that always evaluates to the same value or an access to a collection with an index larger than its size.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How it works<\/h3>\n\n\n\n<p>To illustrate this more concretely, let\u2019s have a look at this very simple piece of code.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image12.png\" alt=\"\" class=\"wp-image-198114\" width=\"955\"\/><\/figure>\n\n\n\n<p>When we analyze the first method, we know everything about <em>x.<\/em> The condition <em>2 &gt; 3<\/em> is always false, so <em>x<\/em> will always be <em>7<\/em>. We can propagate this information and use it later when we see <em>x<\/em> again in the code \u2013 in this case in the returned expression.<\/p>\n\n\n\n<p>In the second method however, we don\u2019t know much about <em>y<\/em>. The values of <em>a<\/em> and <em>b<\/em> are unknown, so there are two possible paths: <em>y<\/em> can either be <em>5<\/em> or <em>7<\/em>, but we don\u2019t know which one.&nbsp; But this is already a lot of information \u2013 we can\u2019t say anything about <em>y == 7<\/em>, but we can confidently say that <em>y &gt; 4<\/em> is always true in any possible program execution scenario.<\/p>\n\n\n\n<p>This example is very simple, but the power of dataflow analysis lies in the fact that we can choose to track many different kinds of information about those variables. Those pieces of information can then be combined with each other, leading us to powerful insights about the code being analyzed. Let\u2019s briefly take a look at several examples of what Scala DFA is capable of in its current state.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br>Current capabilities of Scala DFA<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tracking relations<\/h3>\n\n\n\n<p>One cool thing about the IntelliJ DFA framework, on top of which we built our Scala DFA, is that it can <strong>track relations<\/strong> between variables, even if nothing else is known about them.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image7-1.png\" alt=\"\" class=\"wp-image-198126\" width=\"955\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-resized is-style-default\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image4-1.png\" alt=\"\" class=\"wp-image-198137\" width=\"955\" height=\"290\"\/><\/figure>\n\n\n\n<p>In the first example, dataflow analysis is able to figure out the relation between the otherwise unknown <em>a<\/em> and <em>c<\/em> by applying <strong>transitivity<\/strong> to the information it has collected from the outer condition. In the second one, it knows that inside the body of the <em>else<\/em> branch, <em>!(a != b), <\/em>which means<em> a == b<\/em>, and therefore <em>a &#8211; b == 0<\/em>.<\/p>\n\n\n\n<p>A major advantage of this DFA is that many of its features easily <strong>extend to all types<\/strong>, not just simple ones like <em>Int<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"955\" height=\"418\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image6-1.png\" alt=\"\" class=\"wp-image-198148\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Support for known methods<\/h3>\n\n\n\n<p>A number of <strong>Scala-specific methods, <\/strong>including some <strong>standard math functions<\/strong>, receive their own <strong>special support<\/strong> in Scala DFA.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image3-2.png\" alt=\"\" class=\"wp-image-198170\" width=\"955\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Support for Scala collections<\/h3>\n\n\n\n<p>We have also added <strong>special support for Scala collections<\/strong>. Dataflow analysis can, for example, tell you that two sequences cannot be equal because they\u2019re of different sizes, or that your collection access will produce an <em>IndexOutOfBoundsException<\/em> or <em>NoSuchElementException<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image5-1.png\" alt=\"\" class=\"wp-image-198181\" width=\"955\"\/><\/figure>\n\n\n\n<p>Several of the most common sequence methods are supported as well.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image1-1.png\" alt=\"\" class=\"wp-image-198192\" width=\"955\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Interprocedural analysis&nbsp;<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Java DFA<\/h3>\n\n\n\n<p>Dataflow analysis in IntelliJ IDEA analyzes the body of each method separately. As such, it inherently <strong>doesn\u2019t support interprocedural analysis<\/strong>. This means that code like this won\u2019t be fully analyzed in Java (the analysis won\u2019t go inside <em>absoluteDifference<\/em>).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image10-1.png\" alt=\"\" class=\"wp-image-198203\" width=\"955\"\/><\/figure>\n\n\n\n<p>To be fair, the team responsible for Java DFA has been trying to overcome this limitation in many creative and efficient ways, like through static constant evaluation, getter inlining, contracts, etc., but in the vast majority of cases, it won\u2019t do any interprocedural analysis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scala DFA<\/h3>\n\n\n\n<p>In Scala, a fundamentally functional language, this is a much more significant limitation, and we decided we needed to overcome it. We\u2019ve laid a solid foundation for interprocedural analysis that, in the current state, supports two major features: <strong>interprocedural analysis for invocations of private and final methods<\/strong> (including methods in final classes and objects) and <strong>tracking class parameters<\/strong>. In effect, if we translate the above code to Scala, Scala DFA will go inside the external method and analyze it completely.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image2-1.png\" alt=\"\" class=\"wp-image-198214\" width=\"955\"\/><\/figure>\n\n\n\n<p>Another advantage of our analysis is that it supports all of the many kinds of Scala invocations and their syntactic sugars, like infix notation, <em>apply <\/em>methods, right-associativity, or named and default parameters. So it will still work if you do something like this.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image11-1.png\" alt=\"\" class=\"wp-image-198225\" width=\"955\"\/><\/figure>\n\n\n\n<p>As mentioned before, our DFA can also process and track class parameters.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image9-1.png\" alt=\"\" class=\"wp-image-198236\" width=\"955\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Architecture and Invocation Info<\/h2>\n\n\n\n<p>The overall scope of the Scala DFA project is large. A lot has already been done, but the number of potential additions or improvements is even greater. That is why, from the very beginning, one of our main focuses was to lay a strong foundation that would make it easy to add new things in the future. Along the way, we encountered several major challenges, and many of the design ideas we came up with to overcome them are very interesting in their own right. Probably the best example of this is <strong>Invocation Info<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The challenge<\/h3>\n\n\n\n<p>The problem we faced was that, in Scala, a huge variety of things are some kind of method invocation. In Java, a method invocation is always either <em>something.method(arg1, arg2)<\/em> or <em>Something.method(arg1, arg2)<\/em>. In Scala, both of these are present as well, but all of the following are also method invocations.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/blog.jetbrains.com\/wp-content\/uploads\/2021\/11\/image8-1.png\" alt=\"\" class=\"wp-image-198247\" width=\"955\"\/><\/figure>\n\n\n\n<p>This is nice for a programmer who uses the language, but it is a nightmare for any compiler programmer, especially if you add named, default, and implicit parameters, multiple parameter lists, varargs, etc. Imagine needing to extract precise information about an invocation while having to remember and depend on every possible variant of this syntax in every possible place where we\u2019re dealing with invocations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The solution<\/h3>\n\n\n\n<p>This is why we designed and implemented an abstraction called <strong>Invocation Info<\/strong>. We have a factory that takes any possible invocation as its input, in any syntactic variant, and produces an <strong>InvocationInfo <\/strong>instance.<\/p>\n\n\n\n<p>This class represents all possible Scala invocations in a standardized, convenient, and syntax-agnostic way. All sugared and desugared versions of equivalent invocations (like <em>3 + 8<\/em> and <em>3.+(8)<\/em>) generate the same InvocationInfo. There are some fine details here, for example <em>x :: list <\/em>is almost equivalent to <em>list.::(x) <\/em>(because<em> ::<\/em> is right-associative), but argument expressions in Scala are always evaluated from left to right and this creates a slight difference in the semantics of these two expressions.<\/p>\n\n\n\n<p>Invocation Info is designed to collect and offer all necessary information about the original invocation, including multiple argument lists, by-name arguments, evaluation order, <em>this<\/em> argument, etc.<\/p>\n\n\n\n<p>The details of Invocation Info\u2019s design and how it is used in the control flow builder and the analysis itself would make for another long and interesting article. But the key takeaway is that implementing all of that special support for various methods would have been impossible without it. We strongly believe that this tool and our control flow builder are so useful that they may find many different applications outside of dataflow analysis in the future development of the Scala plugin.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Learn more about DFA<\/h2>\n\n\n\n<p>I highly recommend watching the <a href=\"https:\/\/www.youtube.com\/watch?v=xOgGhF4OB3U\" target=\"_blank\" rel=\"noopener\">IntelliJ IDEA Conf talk by Tagir Valeev<\/a>, which summarizes how our dataflow analysis in IntelliJ IDEA works behind the scenes, what it can do, and how we manipulate it to extract as many useful insights and high-quality inspections as possible.<\/p>\n\n\n\n<p>If you\u2019re curious about what kinds of bugs dataflow analysis can help you find, read <a href=\"https:\/\/blog.jetbrains.com\/idea\/2018\/01\/fumigating-the-idea-ultimate-code-using-dataflow-analysis\/\">Tagir\u2019s JetBrains Blog article<\/a>, where he discusses lots of real-life examples.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Credits<\/h2>\n\n\n\n<p>Special thanks to <a href=\"https:\/\/github.com\/SrTobi\" target=\"_blank\" rel=\"noopener\">Tobias Kahlert<\/a> who supervised my project, in particular for a great theoretical and practical introduction to the DFA field and many really fruitful design discussions and ideas.<\/p>\n\n\n\n<p>Achieving such good results also wouldn\u2019t have been possible if I hadn\u2019t built on the existing, robust, and extensible IntelliJ DFA framework. Most of it has been created by <a href=\"https:\/\/github.com\/amaembo\" target=\"_blank\" rel=\"noopener\">Tagir Valeev<\/a>, with whom I\u2019ve also had several helpful discussions throughout my project. Tagir has also been implementing DFA for Kotlin, which was helpful for me as a reference in the early stages of my work.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Feedback<\/h2>\n\n\n\n<p><strong>We need your feedback!<\/strong> If you encounter any bugs in this feature, please report them to our <a href=\"https:\/\/youtrack.jetbrains.com\/issues\/SCL\" target=\"_blank\" rel=\"noopener\">YouTrack<\/a>. If you have any questions, feel free to ask them in our <a href=\"https:\/\/gitter.im\/JetBrains\/intellij-scala\" target=\"_blank\" rel=\"noopener\">Gitter<\/a>.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"author":1298,"featured_media":196678,"comment_status":"closed","ping_status":"closed","template":"","categories":[808,8058,6945],"tags":[6713,3369,6796,83,31,225,6799],"cross-post-tag":[],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/scala\/196208"}],"collection":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/scala"}],"about":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/types\/scala"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/users\/1298"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/comments?post=196208"}],"version-history":[{"count":7,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/scala\/196208\/revisions"}],"predecessor-version":[{"id":198352,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/scala\/196208\/revisions\/198352"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/media\/196678"}],"wp:attachment":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/media?parent=196208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/categories?post=196208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/tags?post=196208"},{"taxonomy":"cross-post-tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/cross-post-tag?post=196208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}