The Dark Secrets of Fast Compilation for Kotlin

Posted by Andrey Breslav

Compiling a lot of code fast is a hard problem, especially when the compiler has to perform complex analyses such as overload resolution and type inference with generics. In this post, I’ll tell you about a huge and largely invisible part of Kotlin which makes it compile much faster on relatively small changes that happen a lot in an everyday run-test-debug loop.

Also, we are looking for Senior Developers to join the team at JetBrains working on fast compilation for Kotlin, so if you are interested, look at the bottom of this post.

Let’s start with the obligatory XKCD comic #303:

XKCD comic 303: Compiling

This post is about one very important aspect of every developer’s life: How long does it take to run a test (or just hit the first line of your program) after you make a change in the code. This is often referred to as time-to-test.

Why this is so important:

  • If time-to-test is too short, you are never forced to get some coffee (or have a sword fight),
  • If time-to-test is too long, you start browsing social media or get distracted in some other way, and lose track of the change you made.

While both situations arguably have their pros and cons, I believe it’s best to take breaks consciously and not when your compiler tells you to. Compilers are smart pieces of software, but healthy human work schedules are not what they are smart at.

Developers tend to be happier when they feel productive. Compilation pauses break the flow and make us feel stuck, stopped in our tracks, unproductive. Hardly anybody enjoys that.

Why does compilation take so long?

There are generally three big reasons for long compilation times:

  1. Codebase size: compiling 1 MLOC usually takes longer than compiling 1 KLOC.
  2. Toolchain optimization: this includes the compiler itself and any build tools you are using.
  3. Compiler smartness: whether it figures many things out without bothering the user with ceremony, or constantly requires hints and boilerplate code.

The first two factors are kind of obvious, let’s talk about the third one: the smartness of the compiler. It is usually a complicated tradeoff, and in Kotlin, we decided in favor of clean readable type-safe code. This means that the compiler has to be pretty smart, and here’s why.

Where Kotlin stands

Kotlin is designed to be used in an industrial setting where projects live long, grow big and involve a lot of people. So, we want static type safety to catch bugs early and precise tooling (completion, refactorings and find usages in the IDE, precise code navigation and such). Then, we also want clean readable code without unneeded noise or ceremony. Among other things, this means that we don’t want types all over the code. And this is why we have smart type inference and overload resolution algorithms that support lambdas and extension function types, we have smart casts (flow-based typing), and so on. The Kotlin compiler figures out a lot of stuff on its own to keep the code clean and type-safe at the same time.

Can one be smart and fast at the same time?

To make a smart compiler run fast you certainly need to optimize every bit of the toolchain, and this is something we are constantly working on. Among other things, we are working on a new-generation Kotlin compiler that will run much faster than the current one. But this post is not about that.

However fast a compiler is, it won’t be too fast on a big project. And it’s a huge waste to recompile the entire codebase on every little change you make while debugging. So, we are trying to reuse as much as we can from previous compilations and only compile what we absolutely have to.

There are two general approaches to reducing the amount of code to recompile:

  • Compile avoidance — only recompile affected modules,
  • Incremental compilation — only recompile affected files.

(One could think of an even finer-grained approach that tracks changes in individual functions or classes and thus recompiles even less than a file, but I’m not aware of practical implementations of such an approach in industrial languages, and altogether it doesn’t seem necessary.)

Now let’s look into compile avoidance and incremental compilation in more detail.

Compile avoidance

The core idea of compile avoidance is:

  • Find “dirty” (=changed) files
  • Recompile the modules these files belong to (use the results of the previous compilation of other modules as binary dependencies)
  • Determine which other modules may be affected by the changes
    • Recompile those as well, check their ABIs too
    • Repeat until all affected modules are recompiled

The algorithm is more or less straightforward if you know how to compare ABIs: we only recompile the modules that are affected by the changes. Of course, a change in a module that nobody depends on will compile faster than a change in the ‘util’ module that everybody depends on (if it affects its ABI).
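The loop above can be sketched in a few lines of Kotlin. Note that `Module`, `compile`, `dependentsOf` and the stored ABI map are hypothetical stand-ins for what a build tool would provide, not real Gradle or Kotlin APIs:

```kotlin
data class Module(val name: String)

// A minimal sketch of module-level compile avoidance: recompile changed
// modules, then propagate to dependents only when the ABI actually changed.
fun recompileAffected(
    changed: Set<Module>,
    dependentsOf: (Module) -> Set<Module>,
    compile: (Module) -> String,              // returns the new ABI hash
    storedAbi: MutableMap<Module, String>     // ABI hashes from the last build
) {
    val queue = ArrayDeque(changed)
    val done = mutableSetOf<Module>()
    while (queue.isNotEmpty()) {
        val m = queue.removeFirst()
        if (!done.add(m)) continue            // each module is compiled once
        val newAbi = compile(m)
        if (storedAbi[m] != newAbi) {         // ABI changed: dependents are dirty too
            storedAbi[m] = newAbi
            dependentsOf(m).forEach { queue.addLast(it) }
        }
    }
}
```

Note how a module whose ABI comes out unchanged stops the propagation: its dependents are never enqueued.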

Tracking ABI changes

ABI stands for Application Binary Interface, and it’s kind of the same as API but for the binaries. Essentially, the ABI is the only part of the binary that dependent modules care about (this is because Kotlin has separate compilation, but we won’t go into this here).

Roughly speaking, a Kotlin binary (be it a JVM class file or a KLib) contains declarations and bodies. Other modules can reference declarations, but not all of them: private classes and members, for example, are not part of the ABI. Can a body be part of the ABI? Yes, if it is inlined at the call site. Kotlin has inline functions and compile-time constants (const vals). If the body of an inline function or the value of a const val changes, dependent modules may need to be recompiled.

So, roughly speaking, the ABI of a Kotlin module consists of declarations, bodies of inline functions and values of const vals visible from other modules.
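To make this concrete, here is a small made-up file with comments marking which parts would end up in the module’s ABI:

```kotlin
// Hypothetical declarations, annotated with what enters the module ABI.

const val TIMEOUT_MS = 500               // ABI: the value is inlined at use sites

inline fun <T> timed(block: () -> T): T { // ABI: declaration AND body (inlined)
    val start = System.nanoTime()
    val result = block()
    println("took ${System.nanoTime() - start} ns")
    return result
}

fun publicApi(x: Int): Int = x + 1       // ABI: declaration only; body can change freely

private fun helper(x: Int) = x * 2       // not ABI: invisible to other modules
```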

A straightforward way to detect a change in the ABI is

  • Store the ABI from the previous compilation in some form (you might want to store hashes for efficiency),
  • After compiling a module, compare the result with the stored ABI:
    • If it’s the same, we are done;
    • If it’s changed, recompile dependent modules.
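A minimal sketch of this bookkeeping, assuming the ABI has already been serialized to bytes (the function names here are made up):

```kotlin
import java.security.MessageDigest

// Store only a digest of the serialized ABI; comparing two ABIs then
// reduces to comparing two short strings.
fun abiHash(serializedAbi: ByteArray): String =
    MessageDigest.getInstance("SHA-256")
        .digest(serializedAbi)
        .joinToString("") { "%02x".format(it) }

// `previous` is null on the very first build, which counts as a change.
fun abiChanged(previous: String?, current: String): Boolean = previous != current
```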

Pros and cons of compile avoidance

The biggest advantage of compile avoidance is its relative simplicity.

This approach really helps when modules are small, because the unit of recompilation is an entire module. If your modules are big, recompilations will be long. So, basically, compile avoidance dictates that we have a lot of small modules, and as developers we may or may not want this. Small modules don’t necessarily sound like bad design, but I’d rather structure my code for people, not machines.

Another observation is that many projects have something like a ‘util’ module where many small useful functions reside. And virtually every other module depends on ‘util’ at least transitively. Now, let’s say I want to add another tiny useful function that is used three times across my codebase. It adds to the module ABI, so all dependent modules are affected, and I get into a long corridor sword fight because my entire project is being recompiled.

On top of that, having a lot of small modules (each of which depends on multiple others) means that the configuration of my project may become huge because for each module it includes its unique set of dependencies (source and binary). Configuring each module in Gradle normally takes some 50-100ms. It is not uncommon for large projects to have more than 1000 modules, so total configuration time may be well over a minute. And it has to run on every build and every time the project is imported into the IDE (for example, when a new dependency is added).

There are a number of features in Gradle that mitigate some of the downsides of compile avoidance: configurations can be cached, for example. Still, there’s quite a bit of room for improvement here, and this is why, in Kotlin, we use incremental compilation.

Incremental compilation

Incremental compilation is more granular than compile avoidance: it works on individual files rather than modules. As a consequence, it does not care about module sizes, nor does it recompile the whole project when the ABI of a “popular” module is changed insignificantly. In general, this approach restricts the user less and leads to shorter time-to-test. Also, developers’ swords get neglected and rusty and beg to be used at least once in a while.

Incremental compilation has been supported in JPS, IntelliJ IDEA’s built-in build system, since forever. Gradle only supports compile avoidance out of the box. As of 1.4, the Kotlin Gradle plugin brings a somewhat limited implementation of incremental compilation to Gradle, and there’s still a lot of room for improvement.

Ideally, we’d just look at the changed files, determine exactly which files depend on them, and recompile all those files. Sounds nice and easy, but in reality it’s highly non-trivial to determine this set of dependent files precisely. For one thing, there can be circular dependencies between source files, something that’s not allowed for modules in most modern build systems. And the dependencies of individual files are not declared explicitly. Note that imports are not enough to determine dependencies because of references within the same package and chained calls: for A.b.c() we need to import at most A, but changes in the type of A.b will affect us too.
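A tiny made-up example of that last point, with two files shown as one snippet:

```kotlin
// file: lib.kt (hypothetical)
class Handle { fun c(): Int = 1 }
object A { val b: Handle = Handle() }

// file: user.kt (hypothetical) — imports at most A; Handle never appears in
// the source, yet if Handle.c's return type changed, the inferred return
// type of caller() would change with it.
fun caller() = A.b.c()
```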

Because of all these complications, incremental compilation tries to approximate the set of affected files by going in multiple rounds. Here’s an outline of how it’s done:

  • Find “dirty” (=changed) files
  • Recompile them (use the results of the previous compilation as binary dependencies instead of compiling other source files)
  • Check if the ABI corresponding to these files has changed
    • If not, we are done!
    • If yes, find files affected by the changes, add them to the dirty files set, recompile
    • Repeat until the ABI stabilizes (this is called a “fixpoint”)
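Here is a sketch of that fixpoint loop in Kotlin. All the functions passed in (the compiler invocation, the name-level ABI diff, and the reverse lookup index) are hypothetical stand-ins for the machinery described later in this post:

```kotlin
// Returns the final set of files that were recompiled.
fun incrementalBuild(
    initialDirty: Set<String>,
    compile: (Set<String>) -> Map<String, String>,       // file -> new ABI hash
    previousAbi: MutableMap<String, String>,             // file -> old ABI hash
    changedNames: (file: String, old: String?, new: String) -> Set<String>,
    filesLookingUp: (names: Set<String>) -> Set<String>  // reverse lookup index
): Set<String> {
    var dirty = initialDirty
    while (true) {
        val results = compile(dirty)                     // recompile the whole set
        val changed = results.flatMap { (file, abi) ->
            changedNames(file, previousAbi[file], abi)   // diff against stored ABI
        }.toSet()
        results.forEach { (file, abi) -> previousAbi[file] = abi }
        val newlyDirty = filesLookingUp(changed) - dirty
        if (newlyDirty.isEmpty()) return dirty           // fixpoint: ABI stabilized
        dirty = dirty + newlyDirty                       // widen and go again
    }
}
```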

Since we already know how to compare ABIs, there are basically only two tricky bits here:

  1. using results of the previous compilation to compile an arbitrary subset of sources, and
  2. finding files affected by a given set of ABI changes.

Both are, at least in part, features of Kotlin’s incremental compiler. Let’s look at them one by one.

Compiling the dirty files

The compiler knows how to use a subset of the previous compilation results to skip compiling non-dirty files and just load the symbols defined in them to produce the binaries for the dirty files. This is not something a compiler would necessarily be able to do if not for incrementality: producing a small binary per source file instead of one big binary for the whole module is not so common outside of the JVM world. And it’s not a feature of the Kotlin language; it’s an implementation detail of the incremental compiler.

When we compare ABIs of the dirty files with the previous results, we may find out that we got lucky and no more rounds of recompilation are needed. Here are some examples of changes that only require recompilation of dirty files (because they don’t change the ABI):

  • Comments, string literals (except for the values of const vals), and such
    • Example: change something in the debug output
  • Changes confined to function bodies that are not inline and don’t affect return type inference
    • Example: add/remove debug output, or change the internal logic of a function
  • Changes confined to private declarations (they can be private to classes or files)
    • Example: introduce or rename a private function
  • Reordering of declarations
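A hypothetical before/after pair illustrating the second bullet: the declared signature is identical, so swapping one body for the other is ABI-neutral and only this file gets recompiled:

```kotlin
// Before: fun total(xs: List<Int>): Int = xs.sum()
// After (same declared signature, body-only change):
fun total(xs: List<Int>): Int {
    println("debug: summing ${xs.size} elements")  // added debug output
    return xs.fold(0) { acc, x -> acc + x }        // same result, new body
}

// Had the return type been inferred (fun total(xs: List<Int>) = ...), a body
// change could alter the inferred type and thus change the ABI.
```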

As you can see, these cases are quite common when debugging and iteratively improving code.

Widening the dirty file set

If we are not so lucky and some declaration has been changed, it means that some files that depend on the dirty ones could produce different results upon recompilation even though not a single line was changed in their code.

A straightforward strategy would be to give up at this point and recompile the whole module. This will bring all the issues with compile avoidance to the table: big modules become a problem as soon as you modify a declaration, and tons of small modules have performance costs too, as described above. So, we need to be more granular: find the affected files and recompile them.

So, we want to find the files that depend on the parts of the ABI that actually changed. For example, if the user renamed foo to bar, we only want to recompile the files that care about names foo and bar, and leave other files alone even if they refer to some other parts of this ABI. The incremental compiler remembers which files depend on which declaration from the previous compilation, and we can use this data sort of like a module dependency graph. This, again, is not something non-incremental compilers normally do.

Ideally, for every file we should store which files depend on it, and which parts of the ABI they care about. In practice, it’s too costly to store all dependencies so precisely. And in many cases, there’s no point in storing full signatures.

Consider this example with two files, dirty.kt and clean.kt:
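Here is a plausible version of the two files, shown as one snippet; the concrete bodies are a reconstruction consistent with the analysis below, not the post’s original listings:

```kotlin
// File: dirty.kt — the user is about to rename changeMe to foo
fun changeMe(i: Int) = if (i == 1) 0 else bar().length  // inferred return type: Int

// File: clean.kt — not touched by the user
fun foo(a: Any) = ""   // before the rename, bar() resolves foo(1) to this overload
fun bar() = foo(1)     // inferred return type: String
```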

Let’s assume that the user renamed the function changeMe to foo. Note that, although clean.kt is not changed, the body of bar() will change upon recompilation: it will now be calling foo(Int) from dirty.kt, not foo(Any) from clean.kt, and its return type will change too. This means that we have to recompile both dirty.kt and clean.kt. How can the incremental compiler find this out?

We start by recompiling the changed file: dirty.kt. Then we see that something in the ABI has changed:

  • there’s no function changeMe any more,
  • there’s a function foo that takes an Int and returns an Int.

Now we see that clean.kt depends on the name foo. This means that we have to recompile both clean.kt and dirty.kt once again. Why? Because types cannot be trusted.

Incremental compilation must produce the same result as would a full recompilation of all sources. Consider the return type of the newly appeared foo in dirty.kt. It’s inferred, and in fact it depends on the type of bar from clean.kt which is a circular dependency between files. So the return type could change when we add clean.kt to the mix. In this case we’ll get a compilation error, but until we recompile clean.kt together with dirty.kt, we don’t know about it.

The big rule of state-of-the-art incremental compilation for Kotlin: all you can trust is names. And this is why, for each file, we store

  • the ABI it produces, and
  • the names (not full declarations) that were looked up during its compilation.

Some optimizations are possible in how we store all this. For example, some names are never looked up outside their file, e.g. names of local variables and, in some cases, local functions; we can omit them from the index. To make the algorithm more precise, we record which files were consulted when each name was looked up. And to compress the index, we use hashing. There’s room for more improvement here.
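The per-file index might be modeled roughly like this (the shapes here are assumptions for illustration, not the actual data structures of the Kotlin incremental compiler):

```kotlin
// What the incremental compiler remembers about each compiled file.
data class FileIndex(
    val abiHash: String,                     // digest of the ABI the file produces
    val lookups: Map<String, Set<String>>    // name looked up -> files consulted
)

// Given the names whose ABI changed, find the files to add to the dirty set:
// any file that looked one of those names up must be recompiled.
fun affectedFiles(
    index: Map<String, FileIndex>,
    changedNames: Set<String>
): Set<String> =
    index.filterValues { fi -> changedNames.any { it in fi.lookups } }.keys
```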

As you have probably noticed, we have to recompile the initial set of dirty files multiple times. Alas, there’s no way around this: there can be circular dependencies, and only compiling all the affected files at once will yield a correct result. In the worst case this gets quadratic, and incremental compilation may do more work than compile avoidance would, so there should be heuristics in place guarding against it.

Incremental compilation across module boundaries

The biggest challenge to date is incremental compilation that can cross module boundaries.

Say, we have dirty files in one module, we do some rounds and reach a fixpoint there. Now we have the new ABI of this module, and need to do something about the dependent modules.

Of course, we know which names were affected in our initial module’s ABI, and we know which files in the dependent modules looked these names up. Now, we could apply essentially the same incremental algorithm, but starting from an ABI change rather than from a set of dirty files. By the way, if there are no circular dependencies between modules, it’s enough to recompile the dependent files alone. But then, if their ABI changes, we’d need to add more files from the same module to the set and recompile the same files again.

It’s an open challenge to implement this fully in Gradle. It will probably require some changes to the Gradle architecture, but we know from past experience that such things are possible and welcomed by the Gradle team.

Things not covered in this blog post

My goal here was to give you a taste of the fascinating machinery of fast compilation for Kotlin. There are many more things at play that I deliberately left out, including but not limited to:

  • Build Cache
  • Configuration Cache
  • Task Configuration Avoidance
  • Efficiently storing incremental compilation indexes and other caches on disk
  • Incremental compilation for mixed Kotlin+Java projects
  • Reusing javac data structures in memory to avoid reading Java dependencies twice
  • Incrementality in KAPT and KSP
  • File watchers for quickly finding dirty files

Summary

Now, you have a basic idea of the challenges that fast compilation in a modern programming language poses. Note that some languages deliberately chose to make their compilers not that smart to avoid having to do all this. For better or worse, Kotlin went another way, and it seems that the features that make the Kotlin compiler so smart are the ones the users love the most because they provide powerful abstractions, readability and concise code at the same time.

While we are working on a new-generation compiler front-end that will make compilation much faster by rethinking the implementation of the core type-checking and name resolution algorithms, we know that everything described in this blog post will never go away. One reason for this is our experience with the Java programming language, which enjoys the incremental compilation capabilities of IntelliJ IDEA even though its compiler is much faster than kotlinc is today. Another reason is that our goal is to get as close as possible to the development roundtrip of interpreted languages, which enjoy instantaneous pickup of changes without any compilation at all. So, Kotlin’s strategy for fast compilation is: optimized compiler + optimized toolchain + sophisticated incrementality.

Join the team!

If you are interested in working on this kind of problem, please consider the job opening we currently have on Kotlin’s fast compilation team at JetBrains. Here’s the job listing in English, and one in Russian. Prior experience working on compilers or build tools is NOT required. We are hiring in all JetBrains offices (Saint Petersburg, Munich, Amsterdam, Boston, Prague, Moscow, Novosibirsk), or you can work remotely from anywhere in the world. It will be good to hear from you!