Ecosystem

How We Test Concurrent Primitives in Kotlin Coroutines

Today we would like to share how we test concurrency primitives in Kotlin Coroutines.

Many of our users are delighted with the experience of using coroutines to write asynchronous code. This does not come as a surprise, since with coroutines we are able to write straightforward and readable code, with almost all the asynchronicity happening under the hood. For simplicity, we can think of coroutines in Kotlin as super-lightweight threads with some additional power, such as cancellability and structured concurrency. However, coroutines also make the code much safer. Traditional concurrent programming involves manipulating a shared mutable state, which is arguably error-prone. As an alternative, coroutines provide special communicating primitives, such as Channel, to perform the synchronization. Using channels and a couple of other primitives as building blocks, we can construct extremely powerful things, such as the recently introduced Flows. However, nothing is free. While the message passing approach is typically safer, the required primitives are not trivial to implement and, therefore, have to be tested as thoroughly as possible. Simultaneously, testing concurrent code may be as complicated as writing it.

That is why we have Lincheck – a special framework for testing concurrent data structures on the JVM. This framework’s main advantage is that it provides a simple and declarative way to write concurrent tests. Instead of describing how to perform the test, you specify what to test by declaring all the operations to examine along with the required correctness property (linearizability is the default one, but various extensions are supported as well) and restrictions (e.g. “single-consumer” for queues).

Lincheck Overview

Let’s consider a simple concurrent data structure, such as the stack algorithm presented below. In addition to the standard push(value) and pop() operations, we also implement a non-linearizable (thus, incorrect) size(), which increases and decreases the corresponding field following successful push and pop invocations.

To write a concurrent test for this stack without any tool, you have to manually run parallel threads, invoke the stack operations, and finally check that some sequential history can explain the obtained results. We have used such manual tests in the past, and all of them contained at least a hundred lines of boilerplate code. But with Lincheck, this machinery is automated, and tests become short and informative.

To write a concurrent test with Lincheck, you need to list the data structure operations and mark them with a special @Operation annotation. The initial state is specified in the constructor (here, we create a new TriebeStack<Int> instance). After that, we need to configure the testing modes, which can be done using special annotations on the test class – @StressCTest for stress testing and @ModelCheckingCTest for model checking mode. Finally, we can run the analysis by invoking the Linchecker.check(..) function on the testing class.

Let’s write a test for our incorrect stack:

There are just 12 lines of easy-to-follow code that describe the testing data structure, and that is all you need to do with Lincheck! It automatically:

  1. Generates several random concurrent scenarios.
  2. Examines each of them using either the stress or the model checking strategy, performing the number of scenario invocations specified by the user.
  3. Verifies that each of the invocation results satisfies the required correctness property (e.g. linearizability).For this step, we use a sequential specification, which is defined with the testing data structure by default; though, a custom sequential specification can be set.

If an invocation hangs, fails with an exception, or the result is incorrect, the test fails with an error similar to the one below. Here we smoothly come to the main advantage of the model checking mode. While Lincheck always provides a failure scenario with the incorrect results (if any are found), the model checking mode also provides a trace to reproduce the error. In the example below, the failing execution starts with the first thread, pushes 7 into the stack, and stops before incrementing the size. The execution switches to the second thread, which retrieves the already pushed 7 from the stack and decreases the size to -1. The following size() invocation returns -1, which is incorrect since it is negative. While the error seems obvious even without a trace, having the trace helps considerably when you are working on real-world concurrent algorithms, such as synchronization primitives in Kotlin Coroutines.

Scenario Generation
In Lincheck, we provide the numbers of parallel threads and operations within them that can be configured in @<Mode>CTest annotations. Lincheck generates the specified number of scenarios by filling the threads with randomly chosen operations. Note that the push(..) operation on the stack above takes an integer parameter value, which specifies the element to be inserted. Simultaneously, the error message provides a specific scenario with an input value to the push(..) invocation. This parameter is also randomly generated in a specified range, which can also be configured. It is also possible to add restrictions like "single-consumer" so that the corresponding operations cannot be invoked concurrently.

Testing Modes
Lincheck runs the generated scenario in either stress testing or model checking mode. In stress testing mode, the scenario is executed on parallel threads many times to detect interleavings that yield incorrect results. In model checking mode, Lincheck examines many different interleavings with a bounded number of context switches. Compared to stress testing, model checking increases test coverage as well as the failing execution trace of the found incorrect behavior. However, it assumes the sequential consistency memory model, which means it ignores weak memory model effects. In particular, it cannot find a bug caused by a missed @Volatile. Therefore, in practice we use both stress testing and model checking mode.

Minimizing Failing Scenarios
When writing a test, it makes sense to configure it so that several threads are executed, each with several operations. However, most bugs can be reproduced with fewer threads and operations. Thus, Lincheck “minimizes” the failing scenario by trying to remove an operation from it and checking whether the test still fails. It continues to remove operations until it is no longer possible to do so without also causing the test not to fail. While this greedy approach is not theoretically optimal, we find it works well enough in practice.

Model Checking Details

Let’s dig deeper into model checking,one of Lincheck’s most impressive features, and see how everything works under the hood. If you want to try Lincheck before going further, just add the org.jetbrains.kotlinx:lincheck:2.12 dependency to your Gradle or Maven project!

Before we started working on model checking, we already had Lincheck with stress testing mode. While it provided a nice way to write tests and significantly improved their quality, we were still spending countless hours trying to understand how to reproduce a discovered bug. Spending all this time motivated us to find a way to automate the process. That is how we came to adopt the bounded model checking approach to our practical Lincheck. The idea was simple — once we implemented model checking, Lincheck would be able to provide an interleaving that re-produced the found bug automatically, meaning we would no longer need to spend hours trying to understand it. One of the main challenges was finding a way to make everything work without having to change the code.

In our experience, most bugs in complicated concurrent algorithms can be reproduced using the sequential consistency memory model. Simultaneously, the model-checking approaches for weak memory models are very complicated, so we decided to use a bounded model checking under the sequential consistency memory model. This was originally inspired by the one used in the CHESS framework for C#, which studies all possible schedules with a bounded number of context switches by fully controlling the execution and putting context switches in different locations in threads. In contrast to CHESS, Lincheck bounds the number of schedules to be studied instead of the number of context switches. This way, the test time is predictable independently of scenario size and algorithm complexity.

In short, Lincheck starts by studying all interleavings with one context switch but does this evenly, trying to explore a variety of interleavings simultaneously. This way, we increase the total coverage if the number of available invocations is not sufficient to cover all the interleavings. Once all the interleavings with one context switch have been reviewed, Lincheck starts examining interleavings with two context switches, and repeats the process, increasing the number of context switches each time, until the available invocations exceed the maximum or all possible interleavings are covered. This strategy increases the testing coverage and allows you to find an incorrect schedule with the lowest number of context switches possible, marking a significant improvement for bug investigation.

Switch Points
Lincheck inserts switch points into the testing code to control the execution. These points identify where a context switch can be performed. The interesting switch point locations are shared memory accesses, such as field and array element reads or updates in the JVM. These reads or updates can be performed by either the corresponding byte-code instructions or special methods, like the ones in AtomicFieldUpdater or Unsafe classes. To insert a switch point, we transform the testing code via the ASM framework and add our internal function invocations before each access. The transformation is performed on the fly using a custom class loader, which means that technically we’ve created a transformed copy of the testing code.

Interleaving Tree
To explore different possible schedules, Lincheck constructs an interleaving tree, the edges of which represent choices that can be performed by the scheduler. The figure below presents a partially built interleaving tree with one context switch for the scenario from the overview. First, Lincheck decides where the execution should be started. From there, several switch points are available to be examined. In the figure, only the first switch point related to top.READ is fully explored.

To explore interleavings evenly, Lincheck tracks the percentage of explored interleavings for each subtree, using weighted randomness to make its choices about where to go. The weights are proportional to the ratios of unobserved interleavings. In our example above, Lincheck is more likely to start the next interleaving from the second thread. Since the exact number of interleavings is unknown, all children have equal weights initially. After all the current tree’s interleavings have been explored, Lincheck increases the number of context switches, thus increasing the tree depth.

Execution Trace
The key advantage of model checking mode is that it provides an execution trace that reproduces the error. To increase its readability, Lincheck captures the arguments and the results of each of the shared variable accesses (such as reads, writes, and CAS-s) and the information about the fields. For example, in the trace in the listing from the overview, the first event is a reading of null from the top field, and the source code location is also provided. At the same time, if a function with multiple shared variable accesses is executed without a context switch in the middle, Lincheck shows this function invocation with the corresponding arguments and result instead of the events inside it. This also makes the trace easier to follow, as evidenced in the second thread in the trace.

To Sum Up

We hope you enjoyed reading this post and found Lincheck interesting and useful. In Kotlin, we successfully use Lincheck both to test existing concurrent data structures in Kotlin Coroutines and to develop new algorithms more productively and enjoyably. As a final note, we would like to say that Lincheck tests should be treated like regular unit tests — they can check your data structure for the specified correctness contract, nothing more and nothing else. So, even if you have Lincheck tests, do not hesitate to add integration tests when your data structure is used in the real world.

Want to try Lincheck? Just add the org.jetbrains.kotlinx:lincheck:2.12 dependency to your Gradle or Maven project and enjoy!

Lincheck on Github

image description