Coding Native

Application development in Kotlin/Native

In this blog post we discuss the development of Kotlin/Native applications. Today we take a look at a basic video player that uses the FFMPEG audio/video decoder and SDL2 for rendering. Hopefully, it will be a useful guide for Kotlin/Native development enthusiasts and will explain the intended mechanisms of using the platform.

As the main focus of our tutorial is Kotlin/Native, we will give only a cursory view of how a video player should be developed. Please see this excellent tutorial, called “How to Write a Video Player in Less Than 1000 Lines”, for reference on how it could be done in C. If you’re interested in comparing how coding in C differs from coding in Kotlin/Native, I would recommend starting with this tutorial.

Every video player in theory does a rather simple job: it reads an input stream with interleaved video and audio frames, decodes the frames, and shows the video frames, synchronising them with the audio stream. Typically, this job is done by multiple threads that perform stream decoding and video and audio playback. Doing it right requires thread synchronisation and certain real-time guarantees: if the audio stream is not decoded in time, playback sounds choppy, and if a video frame is not available when it’s needed, the movie doesn’t look smooth.

Kotlin/Native doesn’t encourage you to use threads, and doesn’t provide a way to share Kotlin objects between threads. However, we believe that concurrent, soft-realtime programming in Kotlin/Native should be easy, so we decided to design our player in a concurrent manner from the very beginning. Let’s see how we achieved that.

Kotlin/Native computational concurrency is built around workers. A worker is a higher-level concurrency concept than a thread: instead of object sharing and synchronisation, it allows object transfer, so that at any moment only a single worker has access to a particular object. This means that no synchronisation is required to access object data, as access can never be concurrent. Workers can receive execution requests, which may accept objects and perform a job as needed, and then transfer the result back to whoever needs it. Such a model ensures that many typical concurrent programming mistakes (such as unsynchronised access to shared data, or deadlocks caused by taking locks in an inconsistent order) simply cannot be made.

Let’s see how this translates to the video player architecture. We need to decode some container format, like .avi, .mkv or .mpg, which involves demultiplexing the interleaved audio and video streams, decoding them, and then feeding the decompressed audio to the SDL audio thread. Decompressed video frames must be rendered in sync with sound playback. The worker concept is a natural fit for this goal. We spawn a worker for the decoder and ask it for video and audio data whenever we need it. On multicore machines this means that decoding can happen in parallel with playback. So the decoder worker is a data producer for both the UI thread and the audio thread.

Whenever we need to fetch the next audio or video data chunk, we rely on the almighty schedule() function. It schedules a chunk of work to be executed by a particular worker, provides an input argument, and returns a Future instance that can be waited on until the job has been executed by the target worker. The Future object can then be consumed, so that the produced object is transferred from the worker thread back to the requester thread.
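
As a rough illustration of that round trip, the sketch below assumes the konan.worker API as it existed in early Kotlin/Native releases (startWorker(), schedule(), Future.consume(), requestTermination()); later releases moved and renamed these calls, so treat the imports as an assumption tied to that era. DecodeRequest, DecodedFrame and decodeOnWorker are hypothetical names made up for the example, not part of the player.

```kotlin
import konan.worker.*

// Hypothetical stand-ins for the player's real request and frame types.
data class DecodeRequest(val frameIndex: Int)
data class DecodedFrame(val frameIndex: Int, val byteCount: Int)

// Sends one request to the worker and blocks until the result comes back.
fun decodeOnWorker(worker: Worker, frameIndex: Int): DecodedFrame {
    val future = worker.schedule(
            TransferMode.CHECKED,
            // Producer: runs on the caller and builds the object handed to the worker.
            { DecodeRequest(frameIndex) }
    ) { request ->
        // Job: runs on the worker and touches nothing but its own input.
        DecodedFrame(request.frameIndex, byteCount = 640 * 480 * 3)
    }
    // consume() waits for the job and transfers the result to this thread.
    return future.consume { frame -> frame }
}

fun main(args: Array<String>) {
    val decoder = startWorker()
    val frame = decodeOnWorker(decoder, frameIndex = 0)
    println("frame ${frame.frameIndex}: ${frame.byteCount} bytes")
    // Ask the worker to shut down and wait for it to acknowledge.
    decoder.requestTermination().consume { _ -> }
}
```

Note that the job lambda receives everything it needs through its argument; it does not reach into the caller's state, which is what keeps access to each object exclusive.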

The Kotlin/Native runtime is conceptually thread-bound, so when running multiple threads, the function konan.initRuntimeIfNeeded() must be called before any other operations; we do that in the audio thread callback. To simplify audio playback, we always resample audio frames to a 2-channel signed 16-bit integer stream at 44100 samples per second.
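
The fixed output format makes buffer arithmetic straightforward. As a small sketch (AudioFormat is a hypothetical helper, not a type from the player or from SDL), the byte rate of that stream and the playing time of a buffer could be computed like this:

```kotlin
// Hypothetical description of an interleaved PCM stream.
data class AudioFormat(val channels: Int, val sampleRate: Int, val bytesPerSample: Int) {
    // One sample for every channel.
    val bytesPerFrame: Int get() = channels * bytesPerSample
    val bytesPerSecond: Int get() = bytesPerFrame * sampleRate

    // Playing time of a buffer of the given size, in milliseconds.
    fun durationMs(byteCount: Int): Long = byteCount * 1000L / bytesPerSecond
}

fun main(args: Array<String>) {
    // 2 channels, 44100 samples per second, 16-bit (2-byte) samples.
    val output = AudioFormat(channels = 2, sampleRate = 44100, bytesPerSample = 2)
    println(output.bytesPerSecond)       // 176400
    println(output.durationMs(176400))   // 1000
}
```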

Video frames are decoded to whatever size the user requests, defaulting to the encoded video size, and the bit depth depends on the user’s desktop defaults. Please note the Kotlin/Native-specific operations for manipulating C pointers, e.g.

private val resampledAudioFrame: AVFrame =
        disposable(create = ::av_frame_alloc, dispose = ::av_frame_unref).pointed
...
with (resampledAudioFrame) {
    channels = output.channels
    sample_rate = output.sampleRate
    format = output.sampleFormat
    channel_layout = output.channelLayout.signExtend()
}

We declare resampledAudioFrame as a disposable resource in the C world, created with the FFMPEG API call av_frame_alloc() and disposed of with av_frame_unref(). Then we set the fields of whatever it points to to the desired values. Note that we can use defines declared by FFMPEG (such as AV_PIX_FMT_RGB24) as Kotlin constants. But as they carry no type information and are Int by default, if a certain field has a different type (e.g. channel_layout), the adapter function signExtend() must be called. It is a compiler intrinsic that inserts the appropriate conversions.
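
To make the term concrete: sign extension widens an integer while preserving its value, including the sign. The sketch below uses only plain Kotlin conversions to show the effect; it does not call signExtend() itself, and widenSigned and widenUnsigned are hypothetical names used for illustration.

```kotlin
// Widens a 32-bit value to 64 bits, copying the sign bit into the upper half.
fun widenSigned(value: Int): Long = value.toLong()

// Widens a 32-bit value to 64 bits, filling the upper half with zeros.
fun widenUnsigned(value: Int): Long = value.toLong() and 0xFFFFFFFFL

fun main(args: Array<String>) {
    println(widenSigned(3))      // 3
    println(widenSigned(-1))     // -1
    println(widenUnsigned(-1))   // 4294967295
}
```

For small non-negative constants both conversions give the same result; they differ only when the top bit of the 32-bit value is set.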

After the decoder is set up, we start the playback loop. It does nothing fancier than retrieving the next frame, rendering it to a texture, and showing this texture on the screen; as a result, the video frame is displayed. Audio is handled by the audio thread callback, which fetches the next sample buffer from the decoder and feeds it to the audio engine.
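
The shape of such a loop can be sketched in plain Kotlin. Frame, FrameSource, Screen and playbackLoop are hypothetical abstractions introduced only for this example; the real player talks to the decoder worker and to SDL2 instead.

```kotlin
// Hypothetical frame type; a real frame would carry pixel data.
class Frame(val index: Int)

interface FrameSource {
    // Returns null when the stream is exhausted.
    fun nextFrame(): Frame?
}

interface Screen {
    fun uploadToTexture(frame: Frame)
    fun present()
}

// Runs until the source is exhausted and returns the number of frames shown.
fun playbackLoop(source: FrameSource, screen: Screen): Int {
    var shown = 0
    while (true) {
        val frame = source.nextFrame() ?: break
        screen.uploadToTexture(frame)
        screen.present()
        shown++
    }
    return shown
}

fun main(args: Array<String>) {
    // Fake source that yields three frames, and a screen that only logs.
    val source = object : FrameSource {
        var next = 0
        override fun nextFrame(): Frame? = if (next < 3) Frame(next++) else null
    }
    val screen = object : Screen {
        override fun uploadToTexture(frame: Frame) = println("upload frame ${frame.index}")
        override fun present() = println("present")
    }
    println("shown: ${playbackLoop(source, screen)}")
}
```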

Audio/video synchronisation is pretty basic: it just ensures that we don’t have too many unplayed audio frames. A real multimedia player should probably rely on frame timestamps instead, which we compute but never use. An interesting place here is

val ts = av_frame_get_best_effort_timestamp(audioFrame.ptr) * 
  av_q2d(audioCodecContext.time_base.readValue())

It demonstrates how to use APIs that accept C structs by value. The function av_q2d() is declared in libavutil/rational.h as

static inline double av_q2d(AVRational a){
    return a.num / (double) a.den;
}

Thus, to pass it by value, we first need to use readValue() on the field.
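
For readers more comfortable with Kotlin than C, the same conversion can be mirrored in plain Kotlin. Rational and ptsToSeconds below are hypothetical illustrations, not FFMPEG bindings, and the time base in main() is an assumed value; the sketch only shows the arithmetic behind multiplying a timestamp by the time base.

```kotlin
// Plain-Kotlin mirror of AVRational: a fraction num/den.
data class Rational(val num: Int, val den: Int) {
    // Same computation as av_q2d(): numerator divided by denominator as a double.
    fun toDouble(): Double = num / den.toDouble()
}

// Converts a timestamp expressed in time-base units into seconds.
fun ptsToSeconds(pts: Long, timeBase: Rational): Double = pts * timeBase.toDouble()

fun main(args: Array<String>) {
    // Assumed time base: one tick per audio sample at 44100 Hz.
    val timeBase = Rational(1, 44100)
    // 88200 ticks correspond to roughly two seconds of audio.
    println(ptsToSeconds(88200, timeBase))
}
```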

So, to wrap up, we have implemented a simple audio/video player that supports multiple input formats, thanks to the FFMPEG library, with relatively little effort. We also discussed some basics of C language interoperability in Kotlin/Native, along with concurrency approaches we consider easier to use and maintain.
