Kotlin/Native Memory Management Update

TL;DR: The original Kotlin/Native memory management approach was very easy to implement, but it created a host of problems for developers trying to share their Kotlin code between different platforms.

Back in 2020, we published our plan to rework the approach to memory management in Kotlin/Native. The original blog post describes the history that brought us to where we are now, listed the problems we were facing, and explained why we had to change our approach. 

Now it is time to give an update on our progress and share some details about memory management design. Moreover, we plan to present a development preview by the end of summer 2021.

Read on for details:

Basics of automatic memory management

The very concept of automatic memory management is as old as programming languages. Early programming languages did not have the concept of a heap at all. All the memory was either statically or stack allocated. Developers did not have to think about memory allocation and deallocation at run time, as memory was either always allocated (statically) or automatically managed (via the stack) and nothing had to be done “manually”. This made things easy and simple, but it limited software to working with data of fixed sizes, and limits on the number of processed items had to be hard-coded in the source code.

PL/I in 1966 and Algol-68 were the first two major languages that added the concept of dynamically allocated memory as we know it today, establishing the tradition of manual memory management. This left it up to developers to write explicit code that allocates and frees memory in the heap. 

However, LISP had appeared even earlier. In LISP you could work with dynamically-sized data structures without having to declare limits on their sizes upfront and without having to write code to allocate and free up the underlying memory. That was all done automatically by a garbage collector. The very concept of garbage collection as a form of automated management was invented by John McCarthy for LISP in 1959.

Since that time, countless garbage collection algorithms have been created and implemented in various systems. The goal of these algorithms is to automatically find and reclaim garbage – memory regions that the running program no longer needs. 

Algorithms differ with respect to how they identify garbage and generally fall into two broad categories: 

  • Reference-counting garbage collection algorithms start from the leafs of an object graph, keeping track of a number of references to a specific memory region. A live object, which is still used by a program, will have a non-zero reference count.
  • Tracing garbage collection algorithms start from the roots of an object graph, traversing the graph to identify all live objects. A dead object, which is not used by a program, will not be reached.

The science of garbage collection has a whole spectrum of algorithms, many of which are hybrids of the two approaches. If you want to learn more, a good place to start is The Garbage Collection Handbook: The Art of Automatic Memory Management by Richard Jones et al.

Reference-counting and tracing garbage collection

A common misconception is that garbage collection is entirely distinct from reference counting. In fact, reference counting is a just name for a wide family of garbage collection algorithms. Garbage collection in this context often refers more specifically to tracing garbage collection.

Another popular misconception is that reference-counting garbage collectors reclaim free memory as soon as it is no longer needed. While it is possible to implement a reference-counting garbage collector with such a property, this is rarely done because immediate reference-counting adds considerable overhead to reference-heavy code. 

Choosing a garbage collector algorithm involves finding the right balance between the garbage collector’s throughput (it’s overhead per number of allocated or reclaimed objects) and the promptness of memory reclamation (or how much more memory headroom it needs to operate). Practical reference-counting garbage collection implementations typically use deferred counting and other similar techniques to improve runtime performance while sacrificing memory usage in the process.

Kotlin/Native garbage collector

The original Kotlin/Native automatic memory manager uses a deferred reference-counting garbage collector. We chose it for its simplicity – not because of its memory consumption. But this original choice is now a roadblock to improving Kotlin/Native’s performance and developer experience.

There are also issues with collection pauses. Contrary to another popular myth, reference-counting garbage collectors are not inherently pause-less. Prompt memory allocation and reclamation has lower throughput and can cause significant allocation delays. Moreover, these collectors cannot be easily moved to background threads to avoid blocking the critical threads of an application. When trying to reduce overall reference-counting overhead by deferring and grouping deallocations, like Kotlin/Native currently does, pauses become rarer but longer.

In contrast to reference-counting, modern tracing garbage collection algorithms are far more flexible and tunable than reference-counting garbage collectors, and they are easier to adapt to meet the needs of multi-threaded applications. However, all tracing garbage collectors have one common weak point – they require fairly complex infrastructure from a programming language runtime and a compiler.

New garbage collector infrastructure

Currently, we are working on the new infrastructure. Our first task is to identify roots – all the locations in memory where a reference to a dynamically allocated memory can be stored. This will allow us to start tracing an object graph. Roots lurk in global variables, thread-locals, and the call stack, as well as on the boundary with platform libraries.

We also need a special infrastructure to implement concurrent garbage collection algorithms. Why bother? Because the main usage scenario for our team is running UI applications. UI applications have a latency-sensitive main thread, so a design that only supports stop-the-world garbage collection is a no-go for Kotlin/Native.

During program execution, the set of roots that a tracing garbage collector must know constantly changes. We’ve decided to use a so-called safe points approach, where the compiled code is colored as safe or unsafe according to whether all roots are stored in predictable locations. These locations are known to the runtime, meaning that garbage collection can run concurrently with the code that is in a safe state.

Another goal of the design involves implementing seamless interoperability with platform libraries, which requires adding support for keeping track of handles to automatically-managed memory that leaked to the non-managed world, support for weak references, and support for running additional deallocation code in cases when an automatically-managed Kotlin object has an attached platform-specific object.

To test the new infrastructure, we’re writing a simple stop-the-world mark and sweep garbage collector. This is not an algorithm for use in production, but it utilizes the entirety of the infrastructure that any tracing garbage collector implementation will rely on. By running a large number of tests with this implementation, we expect to find all the potential bugs in the underlying implementation (like root references that were somehow forgotten, or an unsafe region of code that was not marked as such). The first implementation will not support multi-threaded applications and will be used only for testing.

Next steps

The next step is to write a multithreading-capable garbage collection implementation, port the kotlinx.coroutines library to it, and test it. The goal is for you to be able to freely launch coroutines using a multithreaded background dispatch queue on Darwin without having to freeze objects that work in background threads. This will become the first public milestone that we plan to present as a development preview by the end of summer 2021. It will not be production-ready, as it will be focused on correctness, not performance.

Once we have a correct implementation of our tracing garbage collector, we will focus on performance and navigating the various tradeoffs between more deterministic deallocation, memory usage, and throughput. At this point we’ll be able to use many different garbage collector implementations, including hybrid ones, since our underlying infrastructure is specifically designed to support pluggable garbage collection algorithms, both reference counting and tracing.

We plan to implement a production-ready garbage collection implementation that supports the unimpeded sharing of objects between threads and meets all other design goals. However, in the future, we may have a number of supported garbage collection algorithms that are all optimized for different use cases. 

At first, we will continue to support the original Kotlin/Native memory management scheme to simplify migration of existing Kotlin/Native applications. You’ll be able to choose your garbage collection implementation when building your Kotlin/Native application. The source code of your application will not change, as the choice will be made via compiler flags.

Read more

Stay tuned and check for updates in this roadmap item and on our blog!

Discover more