{"id":5695,"date":"2019-12-20T21:04:05","date_gmt":"2019-12-20T20:04:05","guid":{"rendered":"https:\/\/blog.jetbrains.com\/kotlin\/?p=7665"},"modified":"2023-08-08T16:08:37","modified_gmt":"2023-08-08T15:08:37","slug":"making-kotlin-ready-for-data-science","status":"publish","type":"kotlin","link":"https:\/\/blog.jetbrains.com\/ko\/kotlin\/2019\/12\/making-kotlin-ready-for-data-science","title":{"rendered":"Making Kotlin Ready for Data Science"},"content":{"rendered":"<p>This year at KotlinConf 2019, Roman Belov gave an overview of Kotlin&#8217;s approach to data science. Now that the talk is <a href=\"https:\/\/www.youtube.com\/watch?v=APnyDVye4JA\" rel=\"noopener noreferrer\" target=\"_blank\">available<\/a> for everyone to see, we decided to recap it and share a bit more about the current state of Kotlin tools and libraries for data science.<\/p>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=APnyDVye4JA\" target=\"_blank\" rel=\"noopener\">https:\/\/www.youtube.com\/watch?v=APnyDVye4JA<\/a><\/p>\n<p>How does Kotlin fit into data science? Driven by the need to analyze large amounts of data, the last few years have brought a true renaissance to the data science discipline. This renaissance would not have been possible without proper tools. In the past, you needed a programming language designed specifically for data science; today, general-purpose languages can do the job. Of course, this requires general-purpose languages to make the right design decisions and to have a community willing to help build the ecosystem. These factors have made certain languages, such as Python, more popular for data science than others.<\/p>\n<p>Through Kotlin Multiplatform, Kotlin aims to bring its developer experience and interoperability to other platforms as well. By design, Kotlin&#8217;s key qualities are conciseness, safety, and interoperability. These fundamental language traits make it a great tool for a wide variety of tasks and platforms. 
Data science is certainly one of these tasks.<\/p>\n<p>The great news is that the community has already begun adopting Kotlin for data science, and this adoption is happening at a fast pace. The brief report below outlines how ready Kotlin is for data science, covering the available Kotlin tools and libraries.<\/p>\n<p><!--more--><\/p>\n<h3>Jupyter<\/h3>\n<p>First and foremost, thanks to their interactivity, <a href=\"https:\/\/jupyter.org\/\" rel=\"noopener noreferrer\" target=\"_blank\">Jupyter notebooks<\/a> are very convenient for transforming, visualizing, and presenting data. Because of its extensibility and open-source nature, Jupyter has grown into a large data science ecosystem and has been integrated into many other data-related tools. Part of this ecosystem is the <a href=\"https:\/\/github.com\/Kotlin\/kotlin-jupyter\" rel=\"noopener noreferrer\" target=\"_blank\">Kotlin kernel<\/a> for Jupyter notebooks. With this kernel, you can write and run Kotlin code in Jupyter notebooks and use third-party data science frameworks written in Java and Kotlin.<\/p>\n<p>An example of a reproducible Kotlin Jupyter notebook can be found in <a href=\"https:\/\/github.com\/cheptsov\/kotlin-jupyter-demo\/blob\/master\/index.ipynb\" rel=\"noopener noreferrer\" target=\"_blank\">this repo<\/a>. To quickly play with a Kotlin notebook, you can launch it on <a href=\"https:\/\/mybinder.org\/v2\/gh\/cheptsov\/kotlin-jupyter-demo\/master?filepath=index.ipynb\" rel=\"noopener noreferrer\" target=\"_blank\">Binder<\/a> (note that the environment usually takes about a minute to set up).<\/p>\n<h3>Apache Zeppelin<\/h3>\n<p>With its strong support for Spark and Scala, <a href=\"http:\/\/zeppelin.apache.org\/\" rel=\"noopener noreferrer\" target=\"_blank\">Apache Zeppelin<\/a> is very popular among data engineers. Like Jupyter, Zeppelin has a plugin API (called Interpreters) for extending its core with support for other tools and languages. 
Currently, the latest release of Zeppelin (0.8.2) doesn\u2019t come with a bundled Kotlin interpreter. However, the interpreter is already available in Zeppelin\u2019s master branch. To learn how to deploy Zeppelin with Kotlin support in a Spark cluster, see <a href=\"https:\/\/kotlinlang.org\/docs\/tutorials\/zeppelin-spark-cluster.html\" rel=\"noopener noreferrer\" target=\"_blank\">these instructions<\/a>.<\/p>\n<h3>Apache Spark<\/h3>\n<p>Since Spark has a robust Java API, you can already use Kotlin to work with it from both Jupyter and Zeppelin without any problems. However, we\u2019re working on improving this integration by adding full support for Kotlin classes in Spark\u2019s Dataset API. Support for Kotlin in Spark&#8217;s shell is also in progress.<\/p>\n<h3>Libraries<\/h3>\n<p>Using Kotlin for data science without libraries makes little sense. Luckily, thanks to the recent efforts of the community, there are already a number of Kotlin libraries that you can use right away.<\/p>\n<p>Here are some of the most useful libraries:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/thomasnield\/kotlin-statistics\" rel=\"noopener noreferrer\" target=\"_blank\">kotlin-statistics<\/a> is a library that provides a set of extension functions for exploratory and production statistics. It supports basic numeric list\/sequence\/array functions (from sum to skewness), slicing operators (e.g. 
countBy, simpleRegressionBy), binning operations, discrete PDF sampling, a naive Bayes classifier, clustering, linear regression, and more.<\/li>\n<li><a href=\"https:\/\/github.com\/mipt-npm\/kmath\" rel=\"noopener noreferrer\" target=\"_blank\">kmath<\/a> is a library inspired by NumPy. It supports algebraic structures and operations, array-like structures, math expressions, histograms, streaming operations, wrappers around <a href=\"http:\/\/commons.apache.org\/proper\/commons-math\/\" rel=\"noopener noreferrer\" target=\"_blank\">commons-math<\/a> and <a href=\"https:\/\/github.com\/kyonifer\/koma\" rel=\"noopener noreferrer\" target=\"_blank\">koma<\/a>, and more.<\/li>\n<li><a href=\"https:\/\/github.com\/holgerbrandl\/krangl\" rel=\"noopener noreferrer\" target=\"_blank\">krangl<\/a> is a library inspired by R&#8217;s dplyr and Python&#8217;s pandas. It provides a functional-style API for data manipulation that lets you filter, transform, aggregate, and reshape tabular data.<\/li>\n<li><a href=\"https:\/\/github.com\/JetBrains\/lets-plot\" rel=\"noopener noreferrer\" target=\"_blank\">lets-plot<\/a> is a library for declaratively creating plots based on tabular data. It is inspired by R&#8217;s ggplot and <a href=\"https:\/\/www.amazon.com\/Grammar-Graphics-Statistics-Computing\/dp\/0387245448\/\" rel=\"noopener noreferrer\" target=\"_blank\">The Grammar of Graphics<\/a>, and is tightly integrated with the Kotlin kernel. 
It is multiplatform and can be used not only on the JVM but also from JS and Python.<\/li>\n<li><a href=\"https:\/\/github.com\/holgerbrandl\/kravis\" rel=\"noopener noreferrer\" target=\"_blank\">kravis<\/a> is another library inspired by R&#8217;s ggplot for visualizing tabular data.<\/li>\n<\/ul>\n<p>For a more complete list of useful links, please refer to <a href=\"https:\/\/github.com\/thomasnield\/kotlin-data-science-resources\" rel=\"noopener noreferrer\" target=\"_blank\">Kotlin data science resources<\/a> by Thomas Nield.<\/p>\n<p><strong>Lets-Plot for Kotlin<\/strong><\/p>\n<p><a href=\"https:\/\/github.com\/JetBrains\/lets-plot\" rel=\"noopener noreferrer\" target=\"_blank\">Lets-Plot<\/a> is an open-source plotting library for statistical data written entirely in Kotlin. In addition to being multiplatform, it offers an API designed specifically for Kotlin. To learn how to use this API, read its <a href=\"https:\/\/github.com\/JetBrains\/lets-plot-kotlin\/blob\/master\/docs\/guide\/user_guide.ipynb\" rel=\"noopener noreferrer\" target=\"_blank\">user guide<\/a>.<\/p>\n<p>For interactive use, Lets-Plot is tightly integrated with the Kotlin kernel for Jupyter notebooks. 
Once you have the Kotlin kernel installed and enabled, add the following line to a Jupyter notebook:<\/p>\n<p><code>%use lets-plot<\/code><\/p>\n<p>You can then call Lets-Plot API functions from your cells and see the results immediately beneath them, just as you would when using <a href=\"https:\/\/ggplot2.tidyverse.org\/reference\/ggplot.html\" rel=\"noopener noreferrer\" target=\"_blank\">ggplot<\/a> with R or Python:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/blog.jetbrains.com\/kotlin\/files\/2019\/12\/lets-plot.png\" alt=\"\" width=\"2797\" height=\"1513\" class=\"alignnone size-full wp-image-7670\" \/><\/p>\n<p><strong>Kotlin bindings for NumPy<\/strong><\/p>\n<p><a href=\"https:\/\/numpy.org\/\" rel=\"noopener noreferrer\" target=\"_blank\">NumPy<\/a> is a popular package for scientific computing with Python. It provides powerful capabilities for multi-dimensional array processing, linear algebra, Fourier transforms, random number generation, and other mathematical tasks. <a href=\"https:\/\/github.com\/kotlin\/kotlin-numpy\/\" rel=\"noopener noreferrer\" target=\"_blank\">Kotlin Bindings for NumPy<\/a> is a Kotlin library that lets you call NumPy functions from Kotlin code through statically typed wrappers.<\/p>\n<h3>Contribution<\/h3>\n<p>The entire Kotlin ecosystem is built on the idea of open source and would not be possible without the help of many contributors. Kotlin for data science is only just emerging and needs your help now more than ever! Here\u2019s how you can pitch in:<\/p>\n<ul>\n<li>Talk about your pain points and share your ideas on how to make Kotlin even better suited for data science tasks, especially your own. 
<\/li>\n<li>Contribute to open-source data science libraries, and create your own libraries and tools: anything that you think can help Kotlin become a language of choice for data science.<\/li>\n<\/ul>\n<p>The <a href=\"https:\/\/kotlinlang.org\/community\/\" rel=\"noopener noreferrer\" target=\"_blank\">Kotlin community<\/a> has a dedicated channel called #datascience in its <a href=\"https:\/\/kotlinlang.slack.com\/\" rel=\"noopener noreferrer\" target=\"_blank\">Slack<\/a> workspace. We invite you to join this channel to ask questions, find out where help is needed and how you can contribute, and, of course, share your feedback and your work with the community.<\/p>\n<p>Keep in mind that Kotlin is still in the very early stages of becoming the tool of choice for data scientists. It\u2019s going to be an exciting and challenging journey! It will require building a rich ecosystem of tools and libraries, as well as adjusting the language design to meet the needs of data-related tasks. If things don\u2019t work as you expect, please share your experience, or get involved and help fix them. 
Give the tools a try, especially the Jupyter kernel and the libraries, and share your feedback with us.<\/p>\n<h3>Resources<\/h3>\n<p>Most of the information in this post, and much more, can be found on the official <a href=\"https:\/\/kotlinlang.org\/docs\/reference\/data-science-overview.html\" rel=\"noopener noreferrer\" target=\"_blank\">Kotlin website<\/a>.<\/p>\n<p>KotlinConf 2019 featured more inspiring talks about data science, including <a href=\"https:\/\/www.youtube.com\/watch?v=LI_5TZ7tnOE\" rel=\"noopener noreferrer\" target=\"_blank\"><em>Kotlin for Science<\/em> by Alexander Nozik<\/a> and <a href=\"https:\/\/www.youtube.com\/watch?v=13eYMhuvmXE\" rel=\"noopener noreferrer\" target=\"_blank\"><em>Gradient Descent with Kotlin<\/em> by Erik Meijer<\/a>.<\/p>\n<p>We also recommend watching these talks from the past two KotlinConf conferences: a <a href=\"https:\/\/www.youtube.com\/watch?v=yjVW6uCmVBA\" rel=\"noopener noreferrer\" target=\"_blank\">talk by Holger Brandl<\/a> (the creator of <a href=\"https:\/\/github.com\/holgerbrandl\/krangl\" rel=\"noopener noreferrer\" target=\"_blank\">krangl<\/a>, Kotlin\u2019s analog of Python\u2019s pandas), and a <a href=\"https:\/\/www.youtube.com\/watch?v=-zTqtEcnM7A&#038;feature=youtu.be\" rel=\"noopener noreferrer\" target=\"_blank\">talk by Thomas Nield<\/a> (the creator of <a href=\"https:\/\/github.com\/thomasnield\/kotlin-statistics\" rel=\"noopener noreferrer\" target=\"_blank\">kotlin-statistics<\/a>).<\/p>\n<p><em>That\u2019s it for today (and probably for this year). 
Wrapping it all up, the community is adopting Kotlin for data science at a good pace, so now it\u2019s <em>your <\/em>turn.<\/em><\/p>\n<p><em>Let\u2019s Kotlin!<\/em><\/p>\n","protected":false},"author":63,"featured_media":0,"comment_status":"open","ping_status":"closed","template":"","categories":[909],"tags":[953,590,866],"cross-post-tag":[],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/kotlin\/5695"}],"collection":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/kotlin"}],"about":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/types\/kotlin"}],"author":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/users\/63"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/comments?post=5695"}],"version-history":[{"count":1,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/kotlin\/5695\/revisions"}],"predecessor-version":[{"id":139471,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/kotlin\/5695\/revisions\/139471"}],"wp:attachment":[{"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/media?parent=5695"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/categories?post=5695"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/tags?post=5695"},{"taxonomy":"cross-post-tag","embeddable":true,"href":"https:\/\/blog.jetbrains.com\/ko\/wp-json\/wp\/v2\/cross-post-tag?post=5695"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}