Object Detection with KotlinDL and Ktor

Alexey Zinoviev

I presented the webinar “Object Detection and Image Recognition with Kotlin,” where I explored a deep learning library written in Kotlin, described how to detect objects of different types in images, and explained how to create a Kotlin Web Application using Ktor and KotlinDL that recognizes cars and people in photos. There is more that I would like to share with you on the subject, so here is an extended article.

If you are new to Deep Learning, don’t worry about it. You don’t need any high-level calculus knowledge to start using the Object Detection Light API in the KotlinDL library.

However, when writing this article, I did assume you would be familiar with web-development fundamentals, e.g., HTML, web servers, HTTP, and client-server communication.

This article will take you through how to detect objects in different images and create a Kotlin Web Application using Ktor and KotlinDL.

What is Object Detection?

It’s a pretty simple term from the Deep Learning world and just means the task of detecting instances of objects of a certain class within an image.

You are probably already familiar with Image Recognition, where the idea is to recognize the class or type of only one object within an image without having any coordinates for the recognized object.

Unlike Image Recognition, Object Detection tries to find several objects (sometimes a significant number, such as 100 or even 1,000) together with their locations, which are usually given as the four coordinates of a bounding rectangle (x_min, x_max, y_min, y_max) that contains the detected object.
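To make the difference concrete, here is a minimal sketch of what each task returns. The class names RecognizedImage and DetectedBox are hypothetical and are not part of KotlinDL, and the coordinates are assumed to be relative to the image size, in the range 0..1.

```kotlin
// Hypothetical types for illustration only; these are not KotlinDL classes.

// Image Recognition: one label for the whole image, no coordinates.
data class RecognizedImage(val classLabel: String, val probability: Float)

// Object Detection: one entry per object, each with its own bounding rectangle.
// Coordinates are assumed to be relative to the image size (0.0..1.0).
data class DetectedBox(
    val classLabel: String,
    val probability: Float,
    val xMin: Float,
    val xMax: Float,
    val yMin: Float,
    val yMax: Float
) {
    val width: Float get() = xMax - xMin
    val height: Float get() = yMax - yMin
}

fun main() {
    val recognition = RecognizedImage("street", 0.91f)
    val detections = listOf(
        DetectedBox("car", 0.98f, xMin = 0.10f, xMax = 0.40f, yMin = 0.50f, yMax = 0.80f),
        DetectedBox("person", 0.85f, xMin = 0.60f, xMax = 0.70f, yMin = 0.40f, yMax = 0.90f)
    )
    println("Recognition: ${recognition.classLabel}")
    detections.forEach {
        println("Detection: ${it.classLabel} at x=${it.xMin}..${it.xMax}, y=${it.yMin}..${it.yMax}")
    }
}
```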

For example, this screenshot of the example application shows how a few objects have been recognized, and their positions annotated:

OK – now for the fun stuff! It’s time to write some Kotlin code to detect objects within an image.

Object Detection Example

Let’s say we have the following image. We see a typical street: several cars, pedestrians crossing, traffic lights, and even someone using the pedestrian crossing on a bicycle.

With a few lines of code, we can obtain a list of the detected objects, sorted by score or probability (the degree of confidence of the model that a certain rectangle contains an object of a certain type).

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))

val model = modelHub.loadPretrainedModel(ONNXModels.ObjectDetection.SSD)

model.use { detectionModel ->
   println(detectionModel)

   val imageFile = getFileFromResource("detection/image2.jpg")
   val detectedObjects = detectionModel.detectObjects(imageFile = imageFile, topK = 20)

   detectedObjects.forEach {
       println("Found ${it.classLabel} with probability ${it.probability}")
   }
}

This code prints the following:

Found car with probability 0.9872914
Found bicycle with probability 0.9547764
Found car with probability 0.93248314
Found person with probability 0.85994
Found person with probability 0.8397419
Found car with probability 0.7488473
Found person with probability 0.49446288
Found person with probability 0.48537987
Found person with probability 0.40268868
Found person with probability 0.3972058
Found person with probability 0.38047826
Found traffic light with probability 0.36501375
Found car with probability 0.30308443
Found traffic light with probability 0.30084336
Found person with probability 0.27078137
Found car with probability 0.26892117
Found person with probability 0.26232794
Found person with probability 0.23597576
Found person with probability 0.23156123
Found person with probability 0.21393918
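
The listing above is ordered from the highest score to the lowest, and topK caps the number of entries. The following is a minimal sketch of that ordering on plain Kotlin collections. The Detection class and both helper functions are hypothetical stand-ins, not KotlinDL API, and the confidence threshold is an extra assumption that the library call above does not take.

```kotlin
// Hypothetical stand-in for a detection result; not the KotlinDL DetectedObject class.
data class Detection(val classLabel: String, val probability: Float)

// Keep the topK most confident detections, highest score first.
fun topDetections(all: List<Detection>, topK: Int): List<Detection> =
    all.sortedByDescending { it.probability }.take(topK)

// Optionally drop detections below a confidence threshold (an assumption, not a KotlinDL feature).
fun confidentDetections(all: List<Detection>, minProbability: Float): List<Detection> =
    all.filter { it.probability >= minProbability }

fun main() {
    val raw = listOf(
        Detection("person", 0.49f),
        Detection("car", 0.98f),
        Detection("traffic light", 0.36f),
        Detection("bicycle", 0.95f)
    )
    topDetections(raw, topK = 2).forEach {
        println("Found ${it.classLabel} with probability ${it.probability}")
    }
}
```

In the output above, roughly two thirds of the entries have a score below 0.5, so a threshold of this kind is one simple way to hide uncertain boxes.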

OK, it looks like the model can detect objects much as our eyes do, but how do we go about marking them on the image?

We can use the Swing framework to draw rectangles over the image. This also requires simple image preprocessing before visualization.

First, we need to add a simple visualization using JPanel, BufferedImage, and Graphics2D objects in the visualise function.

model.use { detectionModel ->
  …

   visualise(imageFile, detectedObjects)
}

Drawing rectangles on an image with the Graphics2D API may not be the best approach, but we can use it as a good starting point for our research.

private fun visualise(
   imageFile: File,
   detectedObjects: List<DetectedObject>
) {
   val frame = JFrame("Detected Objects")
   @Suppress("UNCHECKED_CAST")
   frame.contentPane.add(JPanel(imageFile, detectedObjects))
   frame.pack()
   frame.setLocationRelativeTo(null)
   frame.isVisible = true
   frame.defaultCloseOperation = JFrame.EXIT_ON_CLOSE
   frame.isResizable = false
}

class JPanel(
   val image: File,
   private val detectedObjects: List<DetectedObject>
) : JPanel() {
   private var bufferedImage = ImageIO.read(image)

   override fun paint(graphics: Graphics) {
       super.paint(graphics)
       graphics.drawImage(bufferedImage, 0, 0, null)

       detectedObjects.forEach {
           val top = it.yMin * bufferedImage.height
           val left = it.xMin * bufferedImage.width
           val bottom = it.yMax * bufferedImage.height
           val right = it.xMax * bufferedImage.width
           if (abs(top - bottom) > 300 || abs(right - left) > 300) return@forEach

           graphics.color = Color.ORANGE
           graphics.font = Font("Courier New", Font.BOLD, 17)
           graphics.drawString(" ${it.classLabel} : ${it.probability}", left.toInt(), bottom.toInt() - 8)

           graphics as Graphics2D
           val stroke1: Stroke = BasicStroke(6f)
           graphics.setColor(Color.RED)
           graphics.stroke = stroke1
           graphics.drawRect(left.toInt(), bottom.toInt(), (right - left).toInt(), (top - bottom).toInt())
       }
   }

   override fun getPreferredSize(): Dimension {
       return Dimension(bufferedImage.width, bufferedImage.height)
   }

   override fun getMinimumSize(): Dimension {
       return Dimension(bufferedImage.width, bufferedImage.height)
   }
}

The result is the following image:

As you can see, the Object Detection Light API returns not only the class label and score but also the relative image coordinates, which can be used for drawing rectangles or boxes around the detected objects.
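
Converting relative coordinates to pixels is a small calculation that can be isolated from the drawing code. Here is a minimal sketch, assuming the coordinates lie in the range 0..1. PixelRect and toPixelRect are hypothetical names introduced for illustration.

```kotlin
import kotlin.math.abs
import kotlin.math.min
import kotlin.math.roundToInt

// Pixel rectangle in the form expected by Graphics.drawRect: top-left corner plus size.
data class PixelRect(val x: Int, val y: Int, val width: Int, val height: Int)

// Converts relative coordinates (assumed to lie in 0.0..1.0) to pixels.
// Using min/abs makes the result independent of which corner is reported first.
fun toPixelRect(
    xMin: Float, xMax: Float, yMin: Float, yMax: Float,
    imageWidth: Int, imageHeight: Int
): PixelRect {
    val x1 = xMin * imageWidth
    val x2 = xMax * imageWidth
    val y1 = yMin * imageHeight
    val y2 = yMax * imageHeight
    return PixelRect(
        x = min(x1, x2).roundToInt(),
        y = min(y1, y2).roundToInt(),
        width = abs(x2 - x1).roundToInt(),
        height = abs(y2 - y1).roundToInt()
    )
}

fun main() {
    val rect = toPixelRect(0.25f, 0.75f, 0.5f, 1.0f, imageWidth = 800, imageHeight = 600)
    println(rect) // PixelRect(x=200, y=300, width=400, height=300)
}
```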

Also, we could play a little bit with the paint palette and use different colors to differentiate people, bicycles, cars, and traffic lights.

graphics.color = when (it.classLabel) {
   "person" -> Color.WHITE
   "car" -> Color.GREEN
   "traffic light" -> Color.YELLOW
   "bicycle" -> Color.MAGENTA
   else -> Color.RED
}

That looks significantly better!

You can continue experimenting with the visualization, but we need to move on!

Client-Server Application with Ktor

In this section, I will use Ktor to write two simple programs: client and server. The client application will send the image to the server application. If you have never used Ktor before, it’s an excellent time to see how easy it is to deal with classic web stuff like HTTP requests, headers, MIME types, and so on.

When the code below is run, the client application sends a POST request via the submitFormWithBinaryData method. You can read more about how this works in the Ktor documentation. The result with the added boxes for the detected objects can be found in the clientFiles folder.

runBlocking {
   val client = HttpClient(CIO)

   val response: HttpResponse = client.submitFormWithBinaryData(
       url = "http://localhost:8001/detect",
       formData = formData {
           append("image", getFileFromResource("detection/image2.jpg").readBytes(), Headers.build {
               append(HttpHeaders.ContentType, "image/jpeg")
               append(HttpHeaders.ContentDisposition, "filename=image2.jpg")
           })
       }
   )

   val imageFile = File("clientFiles/detectedObjects2.jpg")
   imageFile.writeBytes(response.readBytes())
}

Unfortunately, Ktor has no special API for receiving files from the server. But we’re programmers, right? Let’s just write the bytes obtained over the network to a File.
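
Writing the bytes fails if the target folder does not exist yet, so it can help to create it first. Here is a minimal sketch using only the standard library. The saveBytes helper is a hypothetical name, and the byte array stands in for the response body.

```kotlin
import java.io.File
import java.nio.file.Files

// Writes the bytes to disk, creating the target folder if it does not exist yet.
fun saveBytes(bytes: ByteArray, target: File): File {
    target.absoluteFile.parentFile?.mkdirs()
    target.writeBytes(bytes)
    return target
}

fun main() {
    // A temporary folder keeps the sketch free of side effects in the project directory.
    val folder = Files.createTempDirectory("clientFiles").toFile()
    // Stand-in for the bytes read from the HTTP response.
    val fakeResponseBody = byteArrayOf(1, 2, 3)
    val saved = saveBytes(fakeResponseBody, File(folder, "results/detectedObjects2.jpg"))
    println("Saved ${saved.length()} bytes to ${saved.path}")
}
```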

The server part is a little more difficult. I’ll need to explain some parts of the code below.

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.ObjectDetection.SSD.pretrainedModel(modelHub)

Because model creation is a time-consuming step (due to loading and initialization), we create the model once, before starting the server.

embeddedServer(Netty, 8001) {
   routing {
       post("/detect") {
           val multipartData = call.receiveMultipart()
           var newFileName = ""
           multipartData.forEachPart { part ->
               when (part) {
                   is PartData.FileItem -> {
                       val fileName = part.originalFileName as String
                       newFileName = fileName.replace("image", "detectedObjects")
                       val fileBytes = part.streamProvider().readBytes()
                       val imageFile = File("serverFiles/$fileName")
                       imageFile.writeBytes(fileBytes)

                       val detectedObjects =
                           model.detectObjects(imageFile = imageFile, topK = 20)

                       val filteredObjects =
                           detectedObjects.filter { it.classLabel == "car" || it.classLabel == "person" || it.classLabel == "bicycle" }

                       drawRectanglesForDetectedObjects(newFileName, imageFile, filteredObjects)

The intermediate result will be saved to the serverFiles folder. After that, the server application will send this file back to the client.

To return the result, the server sets the Content-Disposition header on the response, which marks it as an attachment with the new file name, and then sends the file contents with respondFile.

     
                  
                       call.response.header(
                           HttpHeaders.ContentDisposition,
                           ContentDisposition.Attachment.withParameter(ContentDisposition.Parameters.FileName, newFileName)
                               .toString()
                       )
                       call.respondFile(File("serverFiles/$newFileName"))
                   }
               }
           }
       }
   }
}.start(wait = true)

At the end, we need to close our model to release all the resources.

   
model.close()
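
An explicit close() call is skipped if the code before it throws. The first example in this article already relies on use for the model, and the same pattern could wrap the server start. Below is a minimal sketch of that guarantee. FakeModel and withModel are hypothetical stand-ins and do not load any real model.

```kotlin
import java.io.Closeable

// Hypothetical stand-in for a model that holds native resources.
class FakeModel : Closeable {
    var closed = false
        private set

    fun detect(): List<String> {
        check(!closed) { "Model is already closed" }
        return listOf("car", "person")
    }

    override fun close() {
        closed = true
        println("Model resources released")
    }
}

// Runs the block with the model and closes the model afterwards, even if the block throws.
fun <T> withModel(model: FakeModel, block: (FakeModel) -> T): T = model.use(block)

fun main() {
    val model = FakeModel()
    val labels = withModel(model) { it.detect() }
    println("Detected $labels, closed = ${model.closed}")
}
```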

Run the server, and then run the client several times with different images. Check the clientFiles and serverFiles folders to find the images that were sent and the results with the detected objects marked.

The complete example, including drawing and saving files to the serverFiles folder, can be found here in the GitHub repository.

Web Application

It’s time to write the whole Web Application with an HTML page rendered on the server, a few inputs, and a button. I’d like to upload an image, fill some input fields with the parameters, and press a button to download the image with the detected objects to my laptop.

The application will contain only the server part, but it has a few interesting aspects we will need to consider. It should handle two HTTP requests: the POST request, which handles multipart data with FileItem and FormItem handlers, and the GET request, which returns a simple HTML page.

From multipartData, we can extract not only binary data, as in the previous example, but also the values of the form parameters. These parameters, topK and classLabelNames, are explained below.

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.ObjectDetection.SSD.pretrainedModel(modelHub)

embeddedServer(Netty, 8002) {
   routing {
       post("/detect") {
           val multipartData = call.receiveMultipart()
           var imageFile: File? = null
           var newFileName = ""
           var topK = 20
           val classLabels = mutableListOf<String>()
           multipartData.forEachPart { part ->
               when (part) {
                   is PartData.FileItem -> {
                       val fileName = part.originalFileName as String
                       val fileBytes = part.streamProvider().readBytes()

                       newFileName = fileName.replace("image", "detectedObjects")
                       imageFile = File("serverFiles/$fileName")
                       imageFile!!.writeBytes(fileBytes)
                   }
                   is PartData.FormItem -> {
                       when (part.name) {
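                           // Fall back to 20 when the field is empty or not a number.
                           "topK" -> topK = part.value.trim().toIntOrNull() ?: 20
                           // Skip blank entries so that an empty field does not produce a "" label.
                           "classLabelNames" -> classLabels +=
                               part.value.split(",").map { it.trim() }.filter { it.isNotEmpty() }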
                       }
                   }
                   else -> Unit // ignore part types that this form does not send
               }
           }

           val detectedObjects =
               model.detectObjects(imageFile = imageFile!!, topK = topK)

           val filteredObjects = detectedObjects.filter {
                   if (classLabels.isNotEmpty()) {
                       it.classLabel in classLabels
                   } else {
                       it.classLabel == "car" || it.classLabel == "person" || it.classLabel == "bicycle"
                   }
               }

           drawRectanglesForDetectedObjects(newFileName, imageFile!!, filteredObjects)

           call.response.header(
               HttpHeaders.ContentDisposition,
               ContentDisposition.Attachment.withParameter(ContentDisposition.Parameters.FileName, newFileName)
                   .toString()
           )
           call.respondFile(File("serverFiles/$newFileName"))
       }

Ktor uses kotlinx.html to describe the HTML page with a concise DSL, as explained in the documentation. This integration lets you respond to a client with HTML blocks: you can write pure HTML in Kotlin, interpolate variables into views, and build complex HTML layouts using templates.

       get("/") {
           call.respondHtml {
               body {
                   form(action = "/detect", encType = FormEncType.multipartFormData, method = FormMethod.post) {
                       p {
                           +"Your image: "
                           fileInput(name = "image")
                       }
                       p {
                           +"TopK: "
                           numberInput(name = "topK")
                       }
                       p {
                           +"Classes to detect: "
                           textInput(name = "classLabelNames")
                       }
                       p {
                           submitInput() { value = "Detect objects" }
                       }
                   }
               }
           }
       }
   }
}.start(wait = true)

model.close()

Run the server and open the page http://localhost:8002. Here you’ll find a form. Simply upload the image, fill inputs with the request parameters (or leave them empty), and press the button “Detect objects.” The new image will start downloading in a few seconds.

You can also play with the topK and classLabelNames parameters to obtain different results. The topK parameter determines how many detected objects (sorted by score from highest to lowest) will be drawn on the image. The classLabelNames parameter takes a comma-separated list of labels (from the following list) and restricts the boxes drawn on the picture to detected objects of those categories.
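
Both parameters arrive as raw strings, so it is worth parsing them defensively. Here is a minimal sketch in plain Kotlin. The function names parseTopK and parseClassLabels are hypothetical, and the default of 20 mirrors the value used in the handler above.

```kotlin
// Default used when the topK field is empty or not a positive number.
const val DEFAULT_TOP_K = 20

// Parses the topK field, falling back to the default for blank or invalid input.
fun parseTopK(raw: String?): Int =
    raw?.trim()?.toIntOrNull()?.takeIf { it > 0 } ?: DEFAULT_TOP_K

// Splits a comma-separated field into trimmed, non-empty labels.
fun parseClassLabels(raw: String?): List<String> =
    raw.orEmpty().split(",").map { it.trim() }.filter { it.isNotEmpty() }

fun main() {
    println(parseTopK("5"))
    println(parseTopK("abc"))
    println(parseClassLabels(" car, person ,,bicycle"))
    println(parseClassLabels(""))
}
```

An empty classLabelNames field yields an empty list here, so the server can fall back to its default set of classes.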

The complete example can be found here in the GitHub repository.

This represents only a small fraction of what you can do with the full power of Ktor. For example, you can also build a REST API for Object Detection or Image Recognition, or build a helpful microservice. It is your choice!

In conclusion

Release 0.3 was shipped with only one effective object detection model: SSD. The new release, 0.4, brings seven new object detection models with different trade-offs between speed and accuracy, as well as the ability to detect complex objects.

We strongly recommend using Compose for Desktop, instead of Swing, for your visualization needs. The community is working on moving these examples to the new framework.

This is not the only improvement you can expect in the Object Detection Light API. In future releases, we will add some helpful methods for filtering and merging overlapping boxes in the YOLO style, so that a single object does not end up with multiple rectangles drawn on it.
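
One common technique for this kind of filtering is greedy non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below illustrates the general idea. It is not KotlinDL’s implementation, and future releases may work differently. The Box class, the function names, and the 0.5 threshold are all assumptions.

```kotlin
import kotlin.math.max
import kotlin.math.min

// Hypothetical box type with relative coordinates; not a KotlinDL class.
data class Box(
    val classLabel: String, val probability: Float,
    val xMin: Float, val yMin: Float, val xMax: Float, val yMax: Float
) {
    val area: Float get() = max(0f, xMax - xMin) * max(0f, yMax - yMin)
}

// Intersection over union: 0 for disjoint boxes, 1 for identical ones.
fun iou(a: Box, b: Box): Float {
    val w = max(0f, min(a.xMax, b.xMax) - max(a.xMin, b.xMin))
    val h = max(0f, min(a.yMax, b.yMax) - max(a.yMin, b.yMin))
    val intersection = w * h
    val union = a.area + b.area - intersection
    return if (union > 0f) intersection / union else 0f
}

// Greedy NMS: keep the most confident box, drop same-class boxes
// that overlap an already kept box too much, and repeat.
fun nonMaxSuppression(boxes: List<Box>, iouThreshold: Float = 0.5f): List<Box> {
    val kept = mutableListOf<Box>()
    for (candidate in boxes.sortedByDescending { it.probability }) {
        val duplicate = kept.any {
            it.classLabel == candidate.classLabel && iou(it, candidate) > iouThreshold
        }
        if (!duplicate) kept += candidate
    }
    return kept
}

fun main() {
    val boxes = listOf(
        Box("car", 0.98f, 0.10f, 0.10f, 0.50f, 0.50f),
        Box("car", 0.75f, 0.12f, 0.11f, 0.52f, 0.49f), // nearly the same car
        Box("person", 0.85f, 0.60f, 0.20f, 0.70f, 0.80f)
    )
    nonMaxSuppression(boxes).forEach { println("${it.classLabel} ${it.probability}") }
}
```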

If you have any thoughts or feedback related to this use case, open an issue on GitHub or ask in the Kotlin Slack (kotlindl channel).
