Track and Analyze GitHub Star Growth With Kandy and Kotlin DataFrame
Kotlin DataFrame and Kandy are two powerful tools for data analysis in Kotlin. Kotlin DataFrame simplifies data manipulation and processing, while Kandy allows you to create visualizations directly within your Kotlin projects.
In this post, we’ll show you how these tools can be used together within Kotlin Notebook to analyze the star history of GitHub repositories. This isn’t just a simple exercise for demonstration purposes – it’s a tutorial that can help you learn how to analyze your own repositories, understand their popularity trends, and visualize your data effectively. All examples from this post are available as a Kotlin Notebook on GitHub or a Notebook on Datalore, a data science platform by JetBrains.
Analyze your GitHub star history
Understanding the star history of a GitHub repository can provide insights into its popularity and growth over time. By analyzing this data, you can see how different events and activities impact the interest in your project. Our goal is to equip you with the knowledge and tools to perform this analysis on your own repositories.
Obtain repository stargazers data from GitHub
First, we need to gather data about the users who have starred a given repository. To achieve this, we’ll use the GitHub GraphQL API, which requires a GitHub access token. Here’s a simple function to request data about repo stars, including the starring time and user login:
import io.ktor.client.request.*
import io.ktor.http.*
import kotlinx.serialization.json.* // for `buildJsonObject` and `put`

/**
 * We need to specify the repository owner and name, as well as the access token.
 * `first` is the page size: there can be up to 100 results on one response page
 * (in the first example, we'll request only 3).
 * `endCursor` points to the end of the previous page (`null` for the first one).
 * It is interpolated into the query as-is, so a non-null cursor must already be
 * wrapped in double quotes.
 */
fun fetchStarHistoryPage(owner: String, name: String, token: String, first: Int = 100, endCursor: String? = null): NotebookHttpResponse {
    // GraphQL query
    val query = """
        query {
            repository(owner: "$owner", name: "$name") {
                stargazers(first: $first, after: $endCursor) {
                    edges {
                        starredAt
                        node {
                            login
                        }
                    }
                    pageInfo {
                        endCursor
                        hasNextPage
                    }
                }
            }
        }
    """.trimIndent()
    // `http` is the default Ktor `HttpClient` for Notebook;
    // it has the same methods but without `suspend` modifiers,
    // allowing you to make HTTP requests quickly and easily.
    // Make a "post" request to the API with this query
    return http.post("https://api.github.com/graphql") {
        // Set authorization header with token
        bearerAuth(token)
        // Set content type header
        contentType(ContentType.Application.Json)
        // Set query as body
        setBody(buildJsonObject { put("query", query) })
    }
}
Next, specify the repository owner and name, and make sure your GitHub token is stored securely. A convenient way to do this is to set it as an environment variable in the Kotlin Notebook settings.
val ownerKotlin = "Kotlin"
val repoKandy = "kandy"
// Keep your token safe as an environment variable or a system property!
// For example, you can place it in environment variables in Kotlin Notebook settings.
val token = System.getenv("GITHUB_TOKEN")
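If the variable is not set, `System.getenv` returns `null`, and the problem only shows up later as a failed API request. Below is a minimal sketch of failing early instead. `requireToken` is a hypothetical helper that is not part of the original notebook, and it takes the environment as a parameter only so that it can be tried without touching real variables:

```kotlin
// Hypothetical helper (not from the original notebook):
// returns the token, or fails with a clear message if it is missing or blank.
fun requireToken(
    env: Map<String, String> = System.getenv(),
    name: String = "GITHUB_TOKEN",
): String =
    env[name]?.takeIf { it.isNotBlank() }
        ?: error("Environment variable $name is not set")
```

With this helper, `val token = requireToken()` would stop the notebook at the first cell that needs the token rather than at the first request.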
To start, let’s query a single page with a few users to examine the data.
val rawResponse = fetchStarHistoryPage(ownerKotlin, repoKandy, token, first = 3)
rawResponse
The response from the API looks like this:
HttpResponse[https://api.github.com/graphql, 200 OK]
Next, we’ll deserialize the JSON response to a Kotlin data class using the .deserializeJson() extension provided by our Kotlin Notebook Ktor integration. This makes it easier to work with the response body data in Kotlin.
val starHistorySimplePage = rawResponse.deserializeJson()
// Take the JSON string for further work with DataFrame
val responseAsJson = starHistorySimplePage.jsonString
starHistorySimplePage
The result is a structured object representing the data, which looks like this:
{ "data": { "repository": { "stargazers": { "edges": [ { "starredAt": "2022-07-13T22:46:16Z", "node": { "login": "manojselvam" } ... }
After executing the cell above, starHistorySimplePage is converted to a data class, so we can access the properties that correspond to the JSON fields directly. Because these properties are picked up by IntelliJ IDEA autocompletion, working with the response is straightforward.
For example, we can extract all the starring times from the page:
starHistorySimplePage.data.repository.stargazers.edges.map { it.starredAt }
Output:
[2022-07-13T22:46:16Z, 2022-11-05T14:21:10Z, 2022-11-05T18:42:37Z]
Next, let’s parse the page data into a DataFrame.
val starHistoryPageDF = DataFrame.readJsonStr(responseAsJson)
starHistoryPageDF
We need two columns: one showing the user logins and the other their starring times. We can retrieve these columns as follows:
starHistoryPageDF.data.repository.stargazers.edges
    .single() // the `edges` column contains a single DataFrame with current page stargazers
    .flatten() // `login` is a subcolumn of `node`, after `flatten()` it is a simple column
Additionally, we need page meta-information, including whether there is a next page and the current page end cursor.
with(starHistoryPageDF.data.repository.stargazers.pageInfo) {
    // Both are columns with a single value
    println("end cursor: ${endCursor.single()}")
    println("has next page: ${hasNextPage.single()}")
}
This code outputs the following:
end cursor: Y3Vyc29yOnYyOpIAzhXiSlk=
has next page: true
Now, let’s create a function that iteratively processes all pages with stargazers and returns a DataFrame with complete information:
// Casts DataFrame to the type of a given DataFrame so we can use
// extension columns that have already been generated.
// Temporary workaround, will be available in future DataFrame releases
// (https://github.com/Kotlin/dataframe/pull/747)
inline fun <reified T> AnyFrame.castTo(df: DataFrame<T>): DataFrame<T> {
    return cast<T>(verify = true)
}

import io.ktor.client.statement.*

// Provide repo owner, name, and access token
fun fetchStarHistory(owner: String, name: String, token: String): AnyFrame {
    var hasNextPage: Boolean = true
    var endCursor: String? = null
    var buffer: DataFrame<*> = DataFrame.Empty
    while (hasNextPage) {
        val response = fetchStarHistoryPage(owner, name, token, 100, endCursor)
        // Cast type of DataFrame to the type of `starHistoryPageDF`,
        // so we can use its already-generated extensions
        val responseDF = DataFrame.readJsonStr(response.bodyAsText()).castTo(starHistoryPageDF)
        val stargazers = responseDF.data.repository.stargazers
        buffer = buffer.concat(stargazers.edges.first().flatten())
        val pageInfo = stargazers.pageInfo
        // The cursor is interpolated into the GraphQL query, so wrap it in quotes
        endCursor = "\"${pageInfo.endCursor.single()}\""
        hasNextPage = pageInfo.hasNextPage.single()
    }
    return buffer
}
Using this function, we can now retrieve all the Kandy stargazers:
val kandyStargazers = fetchStarHistory(ownerKotlin, repoKandy, token)
kandyStargazers
Look at the DataFrame summary using the .describe() method, which shows meta-information and accumulated statistics about DataFrame columns:
kandyStargazers.describe()
All login values are unique, so no stargazer was fetched twice across pages. Additionally, there are no null values, so no further cleaning is needed.
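The loop in fetchStarHistory follows a general cursor-pagination pattern: request a page, keep its items, remember the end cursor, and stop when the API reports that there is no next page. The sketch below shows that pattern on its own with plain Kotlin collections. `Page` and `fetchAllPages` are hypothetical names introduced for illustration, and `fetchPage` stands in for a real API call:

```kotlin
// One page of results plus paging metadata, mirroring `edges` and `pageInfo`.
// Hypothetical type, used only for this sketch.
data class Page<T>(val items: List<T>, val endCursor: String?, val hasNextPage: Boolean)

// Requests pages until the source reports that there is no next page.
// `fetchPage` receives the previous end cursor (`null` for the first page).
fun <T> fetchAllPages(fetchPage: (cursor: String?) -> Page<T>): List<T> {
    val result = mutableListOf<T>()
    var cursor: String? = null
    do {
        val page = fetchPage(cursor)
        result += page.items
        cursor = page.endCursor
    } while (page.hasNextPage)
    return result
}
```

Passing the page source in as a function keeps the loop independent of HTTP, so it can be tried with an in-memory map of fake pages before pointing it at a real endpoint.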
Create a DataFrame for cumulative star count analysis
We now have two key pieces of information: user logins and the times at which they starred the repository. Our next step is to perform an initial analysis.
We’ll create a visualization showing the cumulative number of stars received over time, illustrating how user interest in our library grows and changes.
This approach will help us understand the dynamics of user engagement and the popularity of our library.
Here’s how to transform this data:
- Convert the starredAt column to LocalDateTime.
- Sort the DataFrame by starredAt, in ascending order.
- Add a starsCount column to track the total number of stars over time.
Put the processing code into a function so that it can be reused later on.
fun AnyFrame.processStargazers(): AnyFrame {
    return castTo(kandyStargazers)
        // Convert `starredAt` column to `LocalDateTime`
        .convert { starredAt }.toLocalDateTime()
        // Sort rows by `starredAt`
        .sortBy { starredAt }
        // Add `starsCount` column with total stars count at each row.
        // The star count is simply the row index increased by 1
        .add("starsCount") { index() + 1 }
}
val kandyStarHistory = kandyStargazers.processStargazers()
kandyStarHistory
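The reason the row index works as a star count is that, once the rows are sorted by time, the row at position i is the (i + 1)-th star. Here is a minimal sketch of the same idea without DataFrame, using plain lists and `java.time.Instant` instead of the `LocalDateTime` conversion used in the post. `cumulativeStars` is a hypothetical name:

```kotlin
import java.time.Instant

// Pairs each starring time with the running total of stars at that moment.
// Input: ISO-8601 timestamps such as "2022-07-13T22:46:16Z", in any order.
fun cumulativeStars(starredAt: List<String>): List<Pair<Instant, Int>> =
    starredAt
        .map { Instant.parse(it) }
        .sorted() // ascending by time
        .mapIndexed { index, time -> time to (index + 1) }
```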
Visualize star history: plot with Kandy
With the data processed, we can now visualize the star history using Kandy. Here’s a simple line plot to show how the number of stars has changed over time.
kandyStarHistory.plot {
    line {
        // The starring time corresponds to the `x` axis
        x(starredAt) {
            axis {
                // Set the name for the `x` axis
                name = "date"
                // Set the format for axis breaks
                breaks(format = "%b, %Y")
            }
        }
        // The stars count corresponds to the `y` axis
        y(starsCount) {
            // Set the name for the `y` axis
            axis.name = "GitHub stars"
        }
    }
    layout {
        title = "Kandy GitHub star history"
        size = 800 to 500
    }
}
The plot displays the cumulative growth of stars, reflecting how interest in the Kandy library has evolved. Key points of significant increase can often be associated with major announcements or events related to the library.
To better understand how user interest in our library evolves over time, we’ll animate this chart using the Kotlin Jupyter API. This dynamic visualization will help us see how engagement patterns shift and grow, providing deeper insights than a static chart could offer.
We’ll start by creating a function that builds a star history chart for the first n stars.
fun kandyStarHistoryPlot(n: Int) = kandyStarHistory.plot {
    line {
        x(starredAt.take(n)) {
            axis {
                name = "date"
                breaks(format = "%b, %Y")
            }
        }
        y(starsCount.take(n)) {
            axis.name = "GitHub stars"
        }
    }
    layout {
        title = "Kandy GitHub star history"
        size = 800 to 500
    }
}
Then, we’ll use the ANIMATE() function to update the cell output for a given set of frames. Each frame will be a star history plot, starting with one star and adding one star per frame until we reach the total number of stars.
ANIMATE(50.milliseconds, kandyStarHistory.rowsCount()) { frameID ->
    // frame with `frameID` contains plot with `frameID + 1` stars
    kandyStarHistoryPlot(frameID + 1)
}
Analyze key events
We’ll look at how different events influenced the growth of stars. We’ll add mark lines with the most important events related to Kandy, such as the Kotlin Notebook video, the Kandy introductory post, the Plotting Financial Data in Kotlin with Kandy post, and KotlinConf 2024. Such analysis helps to identify what drives interest and engagement with the project.
We’ll look at events starting from October 2023, which was when we initiated our marketing activities:
val starHistoryFiltered = kandyStarHistory.filter { starredAt >= LocalDateTime(2023, 10, 1, 0, 0, 0, 0) }
Then we’ll add mark lines with the events:
val ktnbYTVideodate = LocalDate(2023, 10, 25)
val kandyIntroductoryPostDate = LocalDate(2023, 12, 14)
val kandyFinancialPostDate = LocalDate(2024, 4, 9)
val kotlinConf24Date = LocalDate(2024, 5, 22)

val kandyEvents = listOf(
    "Kotlin Notebook\nYouTube video",
    "Kandy Introduction\nKotlin Blog post",
    "Financial Plotting\nMedium post",
    "KotlinConf 2024"
)
val kandyEventsDates = listOf(ktnbYTVideodate, kandyIntroductoryPostDate, kandyFinancialPostDate, kotlinConf24Date)
To make the plot more visually engaging, we’ll create a custom color palette for these event markers.
val eventColors = listOf(
    Color.hex("#1f77b4"),
    Color.hex("#ff7f0e"),
    Color.hex("#d62728"),
    Color.hex("#2ca02c"),
)
Finally, we’ll generate the plot with vertical lines representing these events, allowing us to see how each significant event influenced the star history.
starHistoryFiltered.plot {
    // add vertical marklines with event dates
    vLine {
        color(kandyEvents, "event") {
            scale = categorical(eventColors, kandyEvents)
        }
        xIntercept(kandyEventsDates)
        width = 1.5
        alpha = 0.9
    }
    line {
        x(starredAt) { axis.name = "date" }
        y(starsCount) { axis.name = "GitHub stars" }
    }
    layout {
        title = "Kandy GitHub star history & key events"
        size = 800 to 500
        style {
            legend.position = LegendPosition.Bottom
        }
    }
}
This plot shows Kandy’s cumulative star count from October 2023 onward, with colored vertical lines marking the key events. For example, the introductory post and other significant updates coincide with noticeable increases in stars, which suggests that these activities influenced community engagement.
Analyze monthly star growth
To analyze the monthly growth of stars, we will create a bar chart to visually display the changes in the number of stars received each month. This visualization will help us identify key growth periods and evaluate the effectiveness of our marketing strategies.
First, let’s define simple extension functions to convert a LocalDate or LocalDateTime to a month and four-figure year format.
fun LocalDate.toMonthOfYear(): String = "$month, $year"
fun LocalDateTime.toMonthOfYear(): String = "$month, $year"
Now, we’ll add the “month” column to our DataFrame:
val starHistoryWithMonth = starHistoryFiltered.add("month") {
    starredAt.toMonthOfYear()
}
starHistoryWithMonth
Next, we’ll group the DataFrame by the “month” column and count the number of stars in each group.
val starsCountMonthly = starHistoryWithMonth.groupBy { month }.count()
starsCountMonthly
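The group-and-count step can also be sketched with the standard library alone, which may help when checking the numbers by hand. This version assumes `java.time` types rather than the kotlinx-datetime types used in the post, and it uses `YearMonth` as the key instead of a formatted string so that months sort chronologically. `starsPerMonth` is a hypothetical name:

```kotlin
import java.time.LocalDateTime
import java.time.YearMonth

// Counts how many stars fall into each calendar month.
// The result is sorted by month, earliest first.
fun starsPerMonth(starredAt: List<LocalDateTime>): Map<YearMonth, Int> =
    starredAt
        .groupingBy { YearMonth.from(it) }
        .eachCount()
        .toSortedMap()
```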
Next, we’ll add information about key events to the DataFrame. We’ll include the events in the corresponding months and set the value to null if there were no events.
First, create a DataFrame with events and their corresponding months:
val eventsDF = dataFrameOf("event" to kandyEvents, "month" to kandyEventsDates.map { it.toMonthOfYear() })
Then, perform a left join with our main DataFrame on the month column:
val starsMonthlyWithEvent = starsCountMonthly.leftJoin(eventsDF) { month }
starsMonthlyWithEvent
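A left join keeps every row of the left table and fills in `null` where the right table has no match. The sketch below illustrates this with plain maps. It assumes at most one event per month, which holds for the four events listed above but is a simplification of a general join. `MonthRow` and `leftJoinEvents` are hypothetical names:

```kotlin
// A monthly star count with the event of that month, if any.
// Hypothetical type, used only for this sketch.
data class MonthRow(val month: String, val count: Int, val event: String?)

// Left join on the month key: every month from `counts` is kept,
// and `event` is null when `events` has no entry for that month.
fun leftJoinEvents(counts: Map<String, Int>, events: Map<String, String>): List<MonthRow> =
    counts.map { (month, count) -> MonthRow(month, count, events[month]) }
```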
Now, we can create a bar plot to visualize the distribution of new stars by month, along with the key events.
starsMonthlyWithEvent.plot {
    bars {
        x(month)
        y(count)
        alpha = 0.8
        fillColor(event) {
            scale = categorical(eventColors, kandyEvents)
        }
    }
    // add horizontal markline with median of monthly count
    hLine {
        val medianMonthly = count.median()
        yIntercept.constant(medianMonthly)
        type = LineType.DASHED
        color = Color.hex("#4b0082")
        width = 2.0
    }
    layout {
        title = "Kandy GitHub star history (monthly count)"
        size = 800 to 500
        style {
            legend.position = LegendPosition.Bottom
            xAxis.text {
                angle = 30.0
            }
        }
    }
}
This plot shows the monthly distribution of stars, with bars representing the number of stars each month. The colors of the bars indicate key events, providing a clear visualization of how these events impacted the star counts. The dashed horizontal line represents the median star count per month.
Unlike the overall star history chart, which shows cumulative growth, the monthly statistics plot helps you pinpoint the exact timing and impact of key events. By creating similar plots for your own projects, you can better understand the effectiveness of your promotional efforts, identify seasonal patterns, and plan future activities more effectively.
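The median is a useful reference line for monthly counts because, unlike the mean, it is not pulled upward by one or two unusually strong months. Here is a small sketch of the conventional definition, written for a plain list. The hypothetical `median` function averages the two middle values for an even number of months, which is an assumption and may not match how DataFrame's own median treats integer columns:

```kotlin
// Conventional median: the middle value of the sorted list,
// or the average of the two middle values when the size is even.
fun median(values: List<Int>): Double {
    require(values.isNotEmpty()) { "median of an empty list is undefined" }
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) {
        sorted[mid].toDouble()
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}
```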
Understand your audience
Understanding the top programming languages of your stargazers can provide insights into your audience. With this in mind, we’ll use the GitHub REST API to find out the most popular languages among Kandy stargazers and visualize this data as a pie chart.
Let’s write a function that requests user repositories:
import io.ktor.http.*

fun getUserRepos(login: String): AnyFrame {
    return DataFrame.readJsonStr(http.get("https://api.github.com/users/$login/repos") {
        // Set authorization header with token
        bearerAuth(token)
        // Add GitHub API custom "accept" header
        header(HttpHeaders.Accept, "application/vnd.github.v3+json")
    }.deserializeJson().jsonString)
}
Next, we’ll test this function on the repositories of the Kotlin organization:
val myRepos = getUserRepos("Kotlin")
myRepos
Each row in this DataFrame corresponds to a repository, and the columns contain different information about that repository. We are interested in the language column. We can count how often each language occurs using the .valueCounts() method:
val myLanguagesCounts = myRepos.language.valueCounts(dropNA = false) // Don't drop nulls
myLanguagesCounts
Because the rows are sorted by count by default, identifying the most popular language is straightforward – it’s the first one.
myLanguagesCounts.language.first()
Kotlin
To generalize this process, we’ll write an extension function for a DataFrame obtained from the user’s repositories. This extension function will retrieve the most popular language (returning null if the account is private, has no repositories, or lacks sufficient information).
fun AnyFrame.getTopLanguage(): String? {
    // Handle non-default response bodies (private account, no repositories, etc.)
    if (!containsColumn("language")) return null
    return castTo(myRepos).language
        .valueCounts(dropNA = false)
        .castTo(myLanguagesCounts)
        .language.let { languages ->
            val first = languages.firstOrNull()
            // Try to pick the second value if the first one is null
            if (first == null && languages.size() >= 2) {
                languages[1]
            } else first
        }
}
Now, let’s retrieve the most popular languages for all stargazers. Note that this process might take some time to execute:
val stargazersLanguages = kandyStarHistory.select {
    login and login.map { login -> getUserRepos(login).getTopLanguage() }.named("language")
}
stargazersLanguages
Next, we’ll count the occurrences of each language:
val languageCounts = stargazersLanguages.language.valueCounts() // Drops null by default
languageCounts
Finally, let’s plot these counts as a pie chart. We’ll take the seven most popular languages and group the remaining ones into an “other” category:
languageCounts.let {
    val takeFirst = 7
    it.take(takeFirst).concat(
        dataFrameOf("language" to listOf("other"), "count" to listOf(it.drop(takeFirst).sum { count }))
    )
}.plot {
    pie {
        slice("count")
        fillColor("language")
        size = 25.0
        hole = 0.3
    }
    layout {
        title = "Kandy stargazers' most popular languages"
        style(Style.Void)
    }
}
The pie chart shows that Kotlin is the most popular language among Kandy stargazers, confirming our primary audience as Kotlin developers. The presence of Java suggests potential for further engagement with related ecosystems. The inclusion of less-common languages highlights the diversity of our user base, which is important for understanding different use cases and potential feature requests.
These insights can help tailor your project’s documentation, tutorials, and marketing efforts to better serve and expand your audience.
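The "top seven plus other" grouping used for the pie chart keeps small categories from cluttering the plot. Below is a sketch of that grouping on a plain map. `topWithOther` is a hypothetical name, and unlike the notebook code, this version omits the "other" entry when there is nothing left to put in it:

```kotlin
// Keeps the `top` largest categories and folds the rest into one "other" entry.
fun topWithOther(counts: Map<String, Int>, top: Int = 7): List<Pair<String, Int>> {
    val sorted = counts.entries
        .sortedByDescending { it.value }
        .map { it.key to it.value }
    val rest = sorted.drop(top).sumOf { it.second }
    return if (rest > 0) sorted.take(top) + ("other" to rest) else sorted.take(top)
}
```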
Compare star growth: Kandy vs. Kotlin DataFrame
Comparing star data across different projects can provide valuable insights into their popularity and user engagement. Here, we’ll look at the growth of stars for Kandy alongside Kotlin DataFrame. These two projects, launched within a year of each other, target the same audience of Kotlin developers.
To ensure a fair comparison, we’ll use the introduction post date as the starting point for both libraries and examine the six months that followed. This way, we can see how each project grew over the same timeframe, giving us a clearer picture of their growth patterns.
val repoDataframe = "dataframe"
// Use the already written methods to get star history for DataFrame
val dataFrameStarHistory = fetchStarHistory(ownerKotlin, repoDataframe, token).processStargazers()
Then, we’ll define the introductory post date for DataFrame:
val dataFrameIntroductoryPostDate = LocalDate(2022, 6, 30)
Next, we’ll define a function to process the star history for the six months following the introduction post:
// Function that will slightly transform the dataframe with star history for a given library:
// 1) Take the period from one day before the introduction post date to six months after it;
// 2) Add a column "daysAfterPost" with the number of days after the post date;
// 3) Take the maximum number of stars for the day;
// 4) Add a column "library" corresponding to the name of the library.
fun AnyFrame.proccessAfterPostPeriod(introductionPostDate: LocalDate, library: String): AnyFrame {
    // Period from the day before `introductionPostDate` to six months after it
    val period = (introductionPostDate - DatePeriod(days = 1))..(introductionPostDate + DatePeriod(months = 6))
    return castTo(kandyStarHistory)
        // Only take stars placed during that period
        .filter { starredAt.date in period }
        // Add daysAfterPost column with number of days after post
        .add("daysAfterPost") { introductionPostDate.daysUntil(starredAt.date) }
        // Group by number of days and take the max value of `starsCount` for each group
        .groupBy("daysAfterPost").max { starsCount }
        // Add a column with library name
        .add("library") { library }
}
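Aligning both libraries on "days after post" is what makes the two curves comparable even though the posts were published at different times. The core of the transformation can be sketched without DataFrame as follows. It assumes `java.time.LocalDate` rather than the kotlinx-datetime type used in the post, and `maxStarsByDayOffset` is a hypothetical name:

```kotlin
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// For each day offset from `postDate`, keeps the highest cumulative star count of that day.
// Input pairs are (date of the star, cumulative count at that star).
// The window runs from one day before the post to six months after it.
fun maxStarsByDayOffset(
    history: List<Pair<LocalDate, Int>>,
    postDate: LocalDate,
): Map<Long, Int> {
    val start = postDate.minusDays(1)
    val end = postDate.plusMonths(6)
    return history
        .filter { (date, _) -> !date.isBefore(start) && !date.isAfter(end) }
        .groupBy({ (date, _) -> ChronoUnit.DAYS.between(postDate, date) }, { it.second })
        .mapValues { (_, counts) -> counts.maxOf { it } }
}
```

Because the count is cumulative, the maximum for a day is simply the total at the end of that day, and the day before the post gets the offset -1.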
Finally, we’ll combine the star histories for Kandy and DataFrame into a single DataFrame for comparison:
// Count six-month history for both libraries and concatenate them into one DataFrame
val kandyAndDataFrameStarHistory = kandyStarHistory
    .proccessAfterPostPeriod(kandyIntroductoryPostDate, "Kandy")
    .concat(
        dataFrameStarHistory.proccessAfterPostPeriod(dataFrameIntroductoryPostDate, "DataFrame")
    )
kandyAndDataFrameStarHistory
Next, we’ll visualize the comparison:
kandyAndDataFrameStarHistory.plot {
    line {
        x(daysAfterPost) {
            axis {
                name = "days after post"
            }
        }
        y(starsCount) {
            axis.name = "GitHub stars"
        }
        color(library)
    }
    layout {
        title = "Kandy vs. DataFrame GitHub stars history\nwithin 6 months after the introductory post"
        size = 800 to 500
    }
}
From the initial observation, we can see that before the introduction post, both Kandy and Kotlin DataFrame had similar star counts. However, immediately after the post, Kandy showed a significantly higher growth rate, achieving nearly twice as many stars as DataFrame within the first six months.
This difference suggests several things. Firstly, it shows the growing interest in Kotlin for data projects. About a year and a half elapsed between the initial DataFrame post and the Kandy post. While DataFrame helped establish a community of Kotlin data enthusiasts, Kandy attracted a new audience interested in visualization.
Additionally, Kandy had more intense promotional activities within the six months following its first post, which likely contributed to its rapid growth.
Shared stargazers
It’s also interesting to see how many users have starred both Kandy and DataFrame. We hypothesize that there will be a significant overlap, since both libraries serve the same community of Kotlin developers. Here’s how we can analyze this and get the relevant data:
// inner join star history dataframes of repositories by login,
// getting a dataframe with all common stargazers, taking its size to get a number of them
val commonStargazers = kandyStarHistory.innerJoin(dataFrameStarHistory) { login }.rowsCount()
val kandyTotalStargazers = kandyStarHistory.rowsCount()
val kandyOnlyStargazers = kandyTotalStargazers - commonStargazers
val dataFrameTotalStargazers = dataFrameStarHistory.rowsCount()
val dataFrameOnlyStargazers = dataFrameTotalStargazers - commonStargazers
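Since logins are unique within each repository, the inner join on login is equivalent to a set intersection. The three numbers can therefore be sketched with plain sets. `Overlap` and `overlap` are hypothetical names introduced for illustration:

```kotlin
// Sizes of the three regions of a two-set Venn diagram.
// Hypothetical type, used only for this sketch.
data class Overlap(val common: Int, val onlyFirst: Int, val onlySecond: Int)

// Counts the logins shared by both sets and those unique to each one.
fun overlap(first: Set<String>, second: Set<String>): Overlap {
    val common = (first intersect second).size
    return Overlap(common, first.size - common, second.size - common)
}
```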
Plot this data as a pie chart:
plot {
    pie {
        slice(listOf(commonStargazers, kandyOnlyStargazers, dataFrameOnlyStargazers))
        fillColor(listOf("Common", "Kandy only", "DataFrame only")) {
            scale = categorical(
                "Common" to Color.hex("#4A90E2"),
                "Kandy only" to Color.hex("#F5A623"),
                "DataFrame only" to Color.hex("#7ED321"),
            )
            legend.name = ""
        }
        size = 25.0
    }
    layout {
        title = "Kandy & DataFrame stargazers ratio"
        style(Style.Void)
    }
}
The analysis shows that the majority of stargazers are unique to DataFrame, with fewer users starring both DataFrame and Kandy. Specifically, the share of DataFrame stargazers who also starred Kandy is quite small. This is probably because many users use DataFrame for data tasks that don’t involve visualization, making Kandy less relevant to them.
Interestingly, only about a quarter of Kandy stargazers have also starred DataFrame. This suggests that Kandy has attracted a new audience mainly interested in plotting, rather than data processing. This reveals a great opportunity to promote how both libraries can work together.
Using Kandy for visualization and DataFrame for data processing allows users to benefit from the strengths of both libraries. This combination, as we’ve shown in this post, can help create powerful and comprehensive data analysis solutions. By highlighting this synergy, we can encourage more users to explore how these tools can complement each other and enhance their data projects.
Conclusion
In this post, we explored how to use Kotlin DataFrame and Kandy to dive into the star history of GitHub repositories. But it wasn’t just about looking at the numbers – it was about uncovering the stories those numbers tell.
One big takeaway is how quickly Kandy gained traction after its launch, highlighting a growing interest in visualization tools within the Kotlin community. Yet, we also found that most Kandy stargazers haven’t starred DataFrame, and vice versa. This shows there’s an opportunity to help developers see how these tools can complement each other.
We also noticed that certain events, like blog posts and conferences, had a noticeable impact on star counts. This kind of insight can help you time your own announcements to get the most attention.
What’s next?
Now it’s your turn! Apply these techniques to your own repositories, analyze their star history, and create your own visualizations within Kotlin Notebook. All examples from this post are available as a Kotlin Notebook on GitHub or a Notebook on Datalore.
We’d love to see your results and hear your feedback. Join us in the #datascience channel on Kotlin Slack, or reach out via GitHub issues for Kandy or Kotlin DataFrame.
If you find our repositories useful, we’d really appreciate it if you starred them. Your support helps us improve and develop these tools further.
What else to read and watch
For more information, check out the following resources:
- Kotlin for Data Analysis Overview
- Get started with Kotlin Notebook
- A Step-by-Step Guide to Performing Data Analysis With Kotlin DataFrame
- Data Analytics With Kotlin Notebooks, DataFrame, and Kandy