JetBrains Research
Comparative Analysis of Development Cycle Speed in Java and Kotlin Based on IDE Telemetry Data
Introduction
Does the choice of programming language affect how fast developers deliver code? This question matters for engineering teams evaluating technology stacks, yet it is notoriously hard to answer. Self-reported surveys suffer from recall bias, lines-of-code comparisons conflate conciseness with productivity, and controlled experiments rarely scale beyond a handful of participants.
In 2024, Meta introduced Diff Authoring Time (DAT) – the wall-clock time from when a developer starts working on a code change to when they submit it for review – as a scalable, telemetry-based productivity metric. Inspired by that work, we adapted the concept for IntelliJ IDEA’s built-in usage telemetry (feature usage statistics) and constructed IDE-DAT: the time from first code edit to push, measured directly inside the IDE.
This post presents a large-scale observational study comparing development cycle speed in Java and Kotlin. We analyzed telemetry data from approximately 320,000 IntelliJ IDEA developers over 20 months (November 2023 – June 2025), covering roughly 28 million development cycles.
After controlling for user, project, overall time trend, and task size, we find that development cycles in Kotlin-oriented projects are generally shorter than comparable cycles in Java-oriented projects – roughly 15–20% shorter for everyday small, medium, and large tasks. In practice, the main pattern is not a dramatic one-time speedup, but slower degradation over time: as projects mature, cycle times in unmigrated Java contexts tend to grow, while Kotlin-oriented contexts deteriorate less.
A note on transparency. JetBrains is the creator of Kotlin, and we are aware that any study comparing Kotlin favorably to Java may be perceived as biased. For this reason, we rely on a rigorous statistical framework – longitudinal difference-in-differences on log-transformed outcomes, with multiple control groups and validity checks. We present the methodology, the data, the limitations, and the open questions in full so the reader can assess the strength of the evidence independently.
In the sections that follow, we describe the metric (Section 1), present the key finding and its practical magnitude (Section 2), walk through the detailed results (Section 3), examine threats to validity and open questions (Section 4), and document the full methodology (Section 5).
1. Measuring development speed: The IDE-DAT metric
1.1 The “first edit → push” cycle
IDE-DAT (IDE diff authoring time) is an adaptation of Meta’s DAT for IntelliJ IDEA telemetry.
We measure the duration of a single development cycle:
Push₁ → [first edit, …, edits, …, commits] → Push₂
- Cycle start = the moment of the first Java/Kotlin file edit after the previous push.
- Cycle end = the moment of the next push.
- IDE-DAT = wall-clock time between them.
This serves as a proxy for “time spent working on a single change” – from the moment a developer starts writing code to the moment they push the result.
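The cycle construction can be sketched in a few lines of Python. This is an illustrative reconstruction, not the production pipeline; the event shape `(timestamp, kind)` and the function name are our simplifications of the real telemetry records.

```python
# Toy sketch: split a chronological per-project event stream into cycles.
# Timestamps are in minutes; kinds are "edit" or "push" (illustrative schema).
def extract_cycles(events):
    """Return the IDE-DAT (in minutes) of each first-edit → push cycle."""
    cycles = []
    first_edit = None
    for ts, kind in events:
        if kind == "edit" and first_edit is None:
            first_edit = ts                 # cycle start: first edit since last push
        elif kind == "push" and first_edit is not None:
            cycles.append(ts - first_edit)  # cycle end: IDE-DAT is the wall-clock span
            first_edit = None               # next cycle starts at the next edit
    return cycles

# Edits at t=0 and t=12, push at t=15; then an edit at t=100 and push at t=130.
events = [(0, "edit"), (12, "edit"), (15, "push"), (100, "edit"), (130, "push")]
print(extract_cycles(events))  # [15, 30]
```

Note that edits between pushes with no closing push (e.g. work still in progress at the end of the observation window) produce no cycle, which matches the filtering described in Section 5.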
1.2 How task size is determined
Within each cycle, we count the number of edit events – instances of file editing that the IDE reports, with a one-minute cooldown (after each report, the system remains silent for one minute, even if the developer continues typing). The number of edit events is a proxy for task size: roughly speaking, how many times the developer switched between reading and writing code during the cycle. We do acknowledge, though, that the number of edits may also depend on the seniority level of the developers studied.
Cycles are grouped into size buckets:
| Bucket | Number of edits | Typical cycle duration (median, Java) | What kind of tasks |
|---|---|---|---|
| S | 1–5 | ~10 min | Small fix, single-file change |
| M | 6–15 | ~30 min | Small feature, bug fix |
| L | 16–40 | ~1.5–2 h | Feature spanning multiple files |
| XL | 41+ | ~10 h | Large feature or refactoring |
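The bucket assignment is a simple threshold function; a minimal sketch (the function name is ours, not from the pipeline):

```python
def size_bucket(n_edits):
    """Map an edit-event count to the S/M/L/XL buckets from the table above."""
    if n_edits <= 5:
        return "S"
    elif n_edits <= 15:
        return "M"
    elif n_edits <= 40:
        return "L"
    return "XL"

print([size_bucket(n) for n in (3, 10, 25, 60)])  # ['S', 'M', 'L', 'XL']
```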
1.3 How cycle language is determined
For each cycle, edit events are tallied by file type. If Java edits outnumber Kotlin edits, the cycle is classified as a Java cycle; if Kotlin edits outnumber Java edits, it is classified as a Kotlin cycle. Cycles with equal counts are excluded.
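The majority-vote classification can be written as a small sketch (names are ours; ties return `None` to mark the cycle as excluded):

```python
def cycle_language(java_edits, kotlin_edits):
    """Classify a cycle by majority edit count; ties are excluded (None)."""
    if java_edits > kotlin_edits:
        return "Java"
    if kotlin_edits > java_edits:
        return "Kotlin"
    return None  # equal counts: the cycle is dropped from the analysis

print(cycle_language(7, 3))  # Java
print(cycle_language(2, 8))  # Kotlin
print(cycle_language(5, 5))  # None
```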
1.4 What the metric does not measure
- Time spent on code review, planning, discussions, or CI/CD.
- Code quality (bugs and reverts).
- The distinction between pushing to a feature branch vs. main (branch names are not reported).
- Code volume in lines (an edit event ≠ number of lines).
2. Key finding: Kotlin cycles are shorter
2.1 Development cycles in Kotlin are shorter for comparable tasks
Using the primary longitudinal log-DiD estimator on user-project × task-size contexts, the “first edit → push” cycle is shorter after migration to Kotlin than in comparable unmigrated Java contexts:
| Task size | Typical cycle (Java)* | Primary estimate | 95% CI | In absolute terms** |
|---|---|---|---|---|
| S: small fix | ~10 min | −15.7% | [−24.4%, −6.0%] | ~1–2 min faster |
| M: small feature | ~30 min | −20.3% | [−31.3%, −7.6%] | ~6 min faster |
| L: multi-file feature | ~1.5 – 2 h | −15.1% | [−26.8%, −1.6%] | ~15–20 min faster |
| XL: large feature | ~10 h | −11.0% | [−23.5%, +3.5%] | Directionally ~1 h faster, but imprecise |
* Approximate median cycle duration in the Kotlin migrants’ Java phase for that task-size bucket. Exact bucket medians are shown in Section 3.3.
** Approximate translation of the primary percentage estimate into minutes or hours for a typical Java cycle in that bucket.
How to read the table: A “small feature” (bucket M) is a cycle in which the developer made 6–15 editing sessions before pushing. A typical such cycle in Java lasts ~30 minutes from first edit to push. In the primary estimator, the corresponding post-migration Kotlin context is 20.3% shorter – approximately 24 minutes instead of 30.
The effect is obtained using a longitudinal difference-in-differences on log(DAT): for each user-project and task-size bucket, we compare the pre→post change among Java→Kotlin migrants with the corresponding change in the unmigrated Java control group. This subtracts the overall time trend and isolates the effect associated with the transition to Kotlin.
Important: In the stricter estimator, the bulk of the effect is still explained by degradation in the control group, rather than by dramatic speedups among migrants. Full details can be found in Section 2.3.
2.2 A case for conservative estimation
We control for task size by the number of edit events. At the same time, Kotlin includes a number of language features (data classes, default arguments, properties, extension functions, smart casts, etc.) that make code more concise. Consequently, the same logical task (one ticket in a tracker) might require, say, 20 edits in Java but only 15 in Kotlin. These would fall into different buckets.
To the extent this conciseness effect holds, bucketing by edit count works against Kotlin: a direct comparison of “the same ticket” would make the gap between the Kotlin migrants and the Java control group even larger, so the estimates reported here are conservative.
2.3 How the effect manifests
The stricter DiD estimate is composed of two components (with task size controlled via buckets):
- Java→Kotlin migrants improve modestly for small and medium tasks: In the primary log-scale model, pre→post change is about −8% for buckets S and M, and roughly flat for L and XL.
- Unmigrated Java contexts degrade across all buckets: In the same model, pre→post change is about +9% to +17%.
In other words, projects that migrated to Kotlin exhibit materially less cycle-time growth than projects that remain on Java.
This is echoed by a complementary comparison on absolute DAT without task-size normalization: Projects that have consistently stayed on Kotlin (never migrated) degrade by +14.5% at p90 of absolute DAT, whereas unmigrated Java projects degrade by +23.1% (details in Section 3.4). Both groups degrade (projects grow more complex over time), but Kotlin projects do so at roughly half the rate at p90.
2.4 Practical magnitude
In practical terms, the central estimate corresponds to roughly 1–2 minutes saved on a small fix, ~6 minutes on a small feature, and ~15–20 minutes on a multi-file feature. For XL tasks, the point estimate is also negative, but the interval is too wide for a firm claim. Compared with the earlier descriptive median contrast, the stricter estimator no longer supports a monotonic “bigger task → bigger effect” story; the stable conclusion is narrower: for comparable tasks, Kotlin-oriented contexts show substantially less cycle-time growth than unmigrated Java controls.
3. Evidence in detail
3.1 Java→Kotlin migrants: DAT by phase
1,501 users, 1,664 user-projects, ~76K cycles. For each month of a migrant’s activity, a phase is determined by the share of Kotlin edits: Java phase (<10%), Transition (10–50%), and Kotlin phase (>50%).
| Metric | Java phase (N=29,554) | Transition (N=11,657) | Kotlin phase (N=35,406) | Δ Java→Kotlin |
|---|---|---|---|---|
| p25 DAT | 6.3 min | 5.6 min | 5.8 min | −8.7% |
| Median DAT | 34.7 min | 32.2 min | 32.1 min | −7.5% |
| p75 DAT | 4.51 h | 4.13 h | 4.02 h | −10.9% |
| p90 DAT | 39.2 h | 36.7 h | 34.2 h | −12.7% |
| Avg DAT | 14.5 h | 14.0 h | 13.6 h | −6.4% |
| Edits/cycle | 22.5 | 19.8 | 24.3 | +8.0% |
A monotonic decrease in DAT across all percentiles, with a smooth transition through the Transition phase. Notably, the number of edits per cycle increases – tasks in the Kotlin phase are larger, yet the cycle is still shorter.
3.2 Control group: Unmigrated Java
320,248 users, 665,154 user-projects, ~28M cycles. Users who remained on Java (kotlin_share <10% at the start and end, ≥4 active months). Their history is divided into three equal time-based thirds.
| Metric | Early | Middle | Late | Δ Early→late |
|---|---|---|---|---|
| p25 DAT | 6.6 min | 6.2 min | 6.0 min | −8.4% |
| Median DAT | 38.9 min | 37.7 min | 35.7 min | −8.2% |
| p75 DAT | 5.25 h | 5.45 h | 5.35 h | +2.0% |
| p90 DAT | 31.7 h | 35.5 h | 39.0 h | +23.1% |
| Avg DAT | 12.7 h | 13.7 h | 14.5 h | +14.1% |
| Edits/cycle | 21.2 | 19.6 | 17.9 | −15.4% |
The unmigrated Java group exhibits degradation: the median decreases slightly, but the tails (p75, p90) and the mean increase substantially. Projects grow more complex, long cycles become even longer, and the number of edits per cycle declines.
3.3 Primary longitudinal log-DiD with task-size control
We compare the DAT of same-size cycles among migrants (Java phase vs. Kotlin phase) and the control group (early vs. late). The bucket-level median tables below are provided for descriptive context only. The primary effect size is estimated afterwards on the basis of pre/post changes in log(DAT) at the user-project × task-size-bucket level.
Java→Kotlin migrants – median DAT by bucket and phase:
| Bucket | Java phase | Kotlin phase | Δ |
|---|---|---|---|
| S: 1–5 edits | 10.4 min | 9.6 min | −7.5% |
| M: 6–15 edits | 33.9 min | 31.9 min | −5.8% |
| L: 16–40 edits | 1.82 h | 1.70 h | −6.3% |
| XL: 41+ edits | 11.4 h | 12.2 h | +6.7% |
Unmigrated Java control group – median DAT by bucket and period:
| Bucket | Early | Late | Δ |
|---|---|---|---|
| S: 1–5 edits | 10.5 min | 11.0 min | +4.7% |
| M: 6–15 edits | 35.2 min | 38.8 min | +10.3% |
| L: 16–40 edits | 1.93 h | 2.27 h | +17.7% |
| XL: 41+ edits | 12.1 h | 15.3 h | +26.3% |
Primary estimator: log-DAT DiD on user-project × task-size-bucket contexts
Panel size: 978 migrant contexts and 400,425 control contexts, each with ≥3 cycles in both pre and post periods. Standard errors are clustered by machine_id.
| Task size | Migrants pre→post | Unmigrated Java pre→post | Primary log-DiD effect | 95% CI |
|---|---|---|---|---|
| S: small fix (~10 min) | −8.1% | +9.0% | −15.7% | [−24.4%, −6.0%] |
| M: small feature (~30 min) | −7.3% | +16.4% | −20.3% | [−31.3%, −7.6%] |
| L: multi-file feature (~1.5–2 h) | −0.3% | +17.5% | −15.1% | [−26.8%, −1.6%] |
| XL: large feature (~10 h) | −0.1% | +12.2% | −11.0% | [−23.5%, +3.5%] |
The stricter estimator is less extreme than the earlier descriptive median contrast and does not support a monotonic increase with task size. The stable conclusion is narrower: For comparable tasks, Kotlin-oriented contexts show materially less cycle-time growth than unmigrated Java controls, with statistically supported negative estimates in S, M, and L, and the strongest precision in S and M. Pooling S/M/L contexts yields a primary estimate of about −17.1% (95% CI [−23.7%, −9.9%]).
As a robustness check, equal-weighting users rather than user-project contexts yields similar point estimates for S (−18.8%) and M (−20.3%), a weaker but still negative estimate for L (−13.7%), and again an imprecise estimate for XL (−11.0%). Thus, the sign is stable, while exact magnitudes depend on weighting, especially for larger tasks.
3.4 Complementary evidence from stable groups: Unmigrated Kotlin vs. unmigrated Java
Without involving migrants — comparing trends of stable groups:
| Metric | Unmigrated Java Δ early→late | Unmigrated Kotlin Δ early→late |
|---|---|---|
| Median DAT | −8.2% | −7.7% |
| p75 DAT | +2.0% | +0.8% |
| p90 DAT | +23.1% | +14.5% |
| Avg DAT | +14.1% | +9.9% |
| Edits/cycle | −15.4% | −12.4% |
Unmigrated Kotlin projects degrade roughly half as fast at p90 and ~4 pp slower on average. This provides complementary evidence from stable groups and shows a directionally similar pattern without using the migrant cohort.
3.5 Cross-sectional comparison: Within-month
The most controlled design: same user, same project, same month, and same task size. 1,801 users and 6,908 contexts.
| Bucket | Java (median DAT) | Kotlin (median DAT) | Δ |
|---|---|---|---|
| S: 1–5 edits | 10.1 min | 9.8 min | −2.0% |
| M: 6–15 edits | 31.6 min | 31.6 min | 0% |
| L: 16–40 edits | 1.69 h | 1.62 h | −3.9% |
| XL: 41+ edits | 9.76 h | 10.91 h | +11.8% |
The cross-sectional effect is more modest (−2% to −4% for S/L, 0% for M, and reversed for XL) than the primary longitudinal estimate. This suggests that Kotlin’s contribution is not primarily an instantaneous within-month speedup, but rather a gradual reduction of cycle time over the longer term.
4. Validity checks, limitations, and open questions
4.1 Addressing selection bias: Stepwise confounder control
At each stage of the analysis, we progressively eliminated confounders, i.e. factors that could distort the comparison (for example, if Kotlin developers were inherently more experienced or worked on simpler projects):
| Design | Kotlin vs. Java difference |
|---|---|
| All users (naïve comparison) | −6% |
| Within-user (same individuals) | −3.5% |
| Within-user + within-project + within-month | +12% (!) |
| …+ task-size control | −2% to −4% (for S/M/L) |
| Longitudinal log-DiD + task-size control | ≈ −15% to −20% (S/M/L point estimates); XL directional only |
Each step of confounder control changes the picture. The naïve comparison overstates the effect (selection bias). The within-month comparison without task-size control yields the opposite result (+12%, Kotlin appears slower) because Kotlin cycles in mixed-language projects contain ~15% more edit events. Our hypothesis: in such projects, new functionality tends to be written in Kotlin (larger cycles), while legacy maintenance is done in Java (smaller cycles); without normalization, this creates an artifact. Only after controlling for task size and moving to a longitudinal design does a stable negative gap emerge. In the stricter estimator, S, M, and L all remain negative, with the strongest precision in S and M, while XL is too imprecise for a firm claim.
4.2 Additional robustness checks
4.2.1 Stability across months
Cross-sectional comparison of Java vs. Kotlin in mixed-language projects (same user, same project) over six months:
| Month | Java median | Kotlin median | Δ | Kotlin faster? |
|---|---|---|---|---|
| 2025-01 | 31.8 min | 29.6 min | −6.9% | ✅ |
| 2025-02 | 35.1 min | 32.5 min | −7.5% | ✅ |
| 2025-03 | 27.3 min | 28.3 min | +3.7% | ❌ |
| 2025-04 | 27.0 min | 29.0 min | +7.4% | ❌ |
| 2025-05 | 30.0 min | 28.5 min | −5.2% | ✅ |
| 2025-06 | 31.7 min | 31.1 min | −2.1% | ✅ |
The direction is inconsistent (4 out of 6 months favor Kotlin). The instantaneous effect is small and noisy; a robust effect is only visible in the longitudinal design.
4.2.2 Breakdown by project size
A separate descriptive split by total project size shows a similar pattern. Because this check is based on aggregate p90/avg trends rather than the primary log-DiD estimator, it should be read as exploratory. The pattern is most pronounced in L-projects (500–2,000 edits over the entire period):
| Project size | Descriptive gap on p90 | Descriptive gap on avg |
|---|---|---|
| XL (2,000+ edits) | −13.9% | −11.7% |
| L (500–2,000 edits) | −39.7% | −23.9% |
4.2.3 Java version as a proxy for engineering culture
The “active team effect” hypothesis posits that the slower degradation of Kotlin projects is explained not by the language itself but by the characteristics of the team. If this were the case, then teams with a stronger engineering culture within the unmigrated Java group should degrade more slowly, too.
To test this, we used project JDK version as a proxy for engineering culture. The MODULE_JDK_VERSION event from IDE telemetry contains the major Java version. Unmigrated Java user-projects were segmented into:
- old_java: maximum JDK version ≤ 11 (~308K user-projects).
- modern_java: maximum JDK version ≥ 17 (~348K user-projects).
DAT/edit degradation (median minutes per edit, early → late) by bucket:
| Bucket | old_java Δ | modern_java Δ | All unmigrated Java Δ | Java→Kotlin migrants Δ |
|---|---|---|---|---|
| S: 1–5 edits | +0.1% | +4.0% | +4.7% | −7.5% |
| M: 6–15 edits | +9.1% | +10.5% | +10.3% | −5.8% |
| L: 16–40 edits | +17.2% | +16.6% | +17.7% | −6.3% |
| XL: 41+ edits | +28.9% | +25.1% | +26.3% | +6.7% |
The relationship between Java version and degradation rate is mixed: for large tasks (XL), modern_java degrades 3.8 pp more slowly than old_java, but for small tasks (S) it degrades 3.9 pp faster. For buckets M and L, the difference between segments is minimal (≤1.4 pp). There is no systematic advantage for modern_java.
Descriptive DAT/edit contrast recalculated with modern_java as the control group:
| Bucket | Migrants Δ | Control: all unmigrated Java | Descriptive contrast (original) | Control: modern_java | Descriptive contrast (adjusted) |
|---|---|---|---|---|---|
| S | −7.5% | +4.7% | −12.2% | +4.0% | −11.5% |
| M | −5.8% | +10.3% | −16.1% | +10.5% | −16.3% |
| L | −6.3% | +17.7% | −24.0% | +16.6% | −22.9% |
| XL | +6.7% | +26.3% | −19.6% | +25.1% | −18.4% |
When using modern_java as the control group, the descriptive contrast changes little (deviation ≤1.2 pp across buckets). This check was performed on the simpler DAT/edit view, not on the primary log-DiD estimator, so it should be read as auxiliary evidence only.
Interpretation: within this descriptive check, Java version as a proxy for engineering culture is only weakly associated with the DAT/edit degradation rate, and substituting modern_java as the control group has almost no effect on the descriptive contrast. This weakens the hypothesis that the difference between migrants and the control group is explained solely by team characteristics, at least to the extent that JDK version reflects those characteristics.
However, JDK version is only one possible proxy for engineering culture. Other factors (code review practices, CI/CD pipelines, refactoring habits) may differ between Kotlin migrants and the control group, even though they do not correlate with the Java version used.
4.3 Confidence level
| Aspect | Status | Comment |
|---|---|---|
| User control | ✅ | Within-user comparison (same individual) |
| Project control | ✅ | Within-project comparison (same project) |
| Time-trend control | ✅ | Primary log-DiD with unmigrated Java control; stable-group comparison gives complementary evidence |
| Estimator form | ✅ | Main result based on log-DAT changes at the user-project × task-size level |
| Project-size robustness | ✅ | Descriptive split by total project size shows a directionally similar pattern across the available project-size segments; this is supportive evidence, not part of the primary estimator |
| Task-size control | ✅ | Bucketing by number of edits per cycle |
| Secondary comparison group | ✅ | Unmigrated Kotlin provides an additional comparison and shows a directionally similar pattern, although it is not part of the primary estimator |
| Sample size | ✅ | 1,501 migrants, ~76K migrant cycles, ~28M control cycles |
| Temporal stability | ⚠️ | Cross-sectional month-by-month comparison is unstable; effect is visible in the longitudinal design |
| Weighting sensitivity | ⚠️ | Magnitudes vary across context-weighted and cycle-weighted aggregations |
| Large-task precision | ⚠️ | XL interval includes zero; L is weaker in user-level robustness checks |
| Branch information | ⚠️ | Cannot distinguish a push to a feature branch from a push to main |
| Cycle definition | ⚠️ | A single push may encompass multiple tasks |
| Causality | ⚠️ | Observational study, not an experiment |
4.4 Threats to validity
What could weaken the result?
- Unobserved team characteristics: Segmentation of unmigrated Java by JDK version showed that even this rough proxy for engineering culture only slightly narrows the descriptive contrast for individual buckets. Other unobserved factors that systematically differ between migrants and the control group may exist and could further reduce the gap.
- Unobserved confounders: Factors we cannot measure remain: concurrent refactoring, process changes, and dependency upgrades.
- Weighting sensitivity: Equal-weighting user-project contexts yields stronger magnitudes than cycle-weighted variants. The sign remains negative, but the exact effect size depends on weighting, especially for large tasks.
- Large-task precision: The XL bucket is directionally negative, but its 95% CI includes zero. The L bucket is negative in the primary model but weaker under user-level robustness checks.
- Push ≠ PR: A push is a proxy for delivery. A PR may be created later or through a web interface.
- Calendar time: DAT includes nights, weekends, and lunch breaks. This adds noise, but it affects both groups equally.
- Cycle definition: A cycle = the interval between pushes; a single push may encompass multiple logical tasks.
What could strengthen the result?
- Bucketing may work in Java’s favor. If Kotlin code is more concise, then the same logical task in Kotlin may require fewer edits than in Java. In that case, within each bucket, Kotlin tasks would be objectively larger. This is a hypothesis we cannot verify directly from the available data (we do not know which “logical task” underlies each cycle), but if it holds, the real effect is stronger than measured.
4.5 Open questions
- Pre-trends / event-study: Do migrant and control trends look parallel before the transition to Kotlin?
- Alternative controls in the primary estimator: Does the log-DiD result hold when using only modern_java or matched controls?
- Robustness check on thresholds: Does the effect hold under alternative migrant definitions (5/20% instead of 10%, 40/60% instead of 50%)?
- DAT of reverse migrants with task-size control: Do cycle durations worsen when moving away from Kotlin?
- Long-term dynamics: Does the effect continue to grow after 12+ months on Kotlin, or does it plateau?
- Other productivity metrics: Which parts of the development process — such as compilation errors, build times, and rebuild frequency — would best clarify the data we collected?
- Android Studio: Will the difference be the same for the segment of Android developers?
- Propensity score matching: Could matching each migrant with a “twin” from the control group with similar baseline characteristics (project size, initial speed, activity) yield a more precise DiD estimate?
5. Methodology
5.1 IDE telemetry events
| Event | group_id | event_id | Key field |
|---|---|---|---|
| File editing | file.types.usage | edit | file_type = “JAVA” / “Kotlin” |
| Push | actions | action.finished | action_id = “Vcs.Push” / “Git.Commit.And.Push.Executor” |
On the nature of the edit event: The edit event is reported with a one-minute cooldown: after an event is sent, the system does not record new edits for one minute, even if the developer continues typing. Between edit events, a developer also reads code, navigates the project, runs tests, and discusses issues with colleagues, all of which are part of the work cycle included in DAT but do not generate edit events. Therefore, the number of edits is a proxy for task size, not a measure of time spent; DAT measures the full wall-clock time of the cycle.
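The cooldown behavior can be simulated to show how raw editing activity collapses into reported edit events. A toy sketch, not the IDE’s actual implementation; timestamps and the function name are ours:

```python
def count_edit_events(edit_timestamps_sec, cooldown_sec=60):
    """Collapse raw edit timestamps into reported edit events.

    After an event is reported, further edits within `cooldown_sec`
    are suppressed, matching the one-minute cooldown described above.
    """
    events = 0
    last_reported = None
    for ts in sorted(edit_timestamps_sec):
        if last_reported is None or ts - last_reported >= cooldown_sec:
            events += 1
            last_reported = ts
    return events

# Edits at 0 s, 20 s, 45 s, 70 s, 200 s: the edits at 20 s and 45 s fall inside
# the cooldown window of the event at 0 s; 70 s is >= 60 s later, so it reports.
print(count_edit_events([0, 20, 45, 70, 200]))  # 3
```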
5.2 Filtering
- DAT > 36 sec and < 14 days
- Only product_code = ‘IU’ (IntelliJ IDEA)
- Only recorder_code = ‘FUS’
- Non-empty machine_id and project_id
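Applied to a toy cycles table, the filters look like this. A plain-Python sketch: the field names mirror the telemetry fields above, but the record shape is illustrative, not the actual data model.

```python
# Hypothetical cycle records (one dict per cycle); shape is illustrative.
cycles = [
    {"dat_sec": 10, "product_code": "IU", "recorder_code": "FUS",
     "machine_id": "m1", "project_id": "p1"},          # too short (< 36 s)
    {"dat_sec": 600, "product_code": "IU", "recorder_code": "FUS",
     "machine_id": "m1", "project_id": "p1"},          # kept
    {"dat_sec": 600, "product_code": "IC", "recorder_code": "FUS",
     "machine_id": "m2", "project_id": "p2"},          # wrong product
    {"dat_sec": 20 * 24 * 3600, "product_code": "IU", "recorder_code": "FUS",
     "machine_id": "m3", "project_id": None},          # too long, empty project_id
]

def keep(c):
    """Apply the filtering rules listed above to one cycle record."""
    return (
        36 < c["dat_sec"] < 14 * 24 * 3600
        and c["product_code"] == "IU"
        and c["recorder_code"] == "FUS"
        and bool(c["machine_id"]) and bool(c["project_id"])
    )

filtered = [c for c in cycles if keep(c)]
print(len(filtered))  # 1 (only the 600-second IU cycle survives)
```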
5.3 Defining migration groups
For each (machine_id, project_id) per month, kotlin_share = kotlin_edits / (java_edits + kotlin_edits) is calculated. A minimum of ≥10 Java/Kotlin edit events per month and ≥4 active months is required.
| Group | Definition |
|---|---|
| A: Java→Kotlin migrants | First month kotlin_share < 10%, last month > 50% |
| B: Unmigrated Java | First and last month kotlin_share < 10% |
| C: Unmigrated Kotlin | First and last month kotlin_share > 50% |
Migrant phases are determined monthly: Java phase (<10%), Transition (10–50%), and Kotlin phase (>50%).
Unmigrated group phases: history is divided into three equal time-based thirds (early / middle / late), analogous to migrant phases.
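The group definitions translate directly into a classification function. A sketch under the simplifying assumption that the activity and edit-count minimums have already been enforced; the function name and return labels are ours:

```python
def migration_group(monthly_kotlin_share):
    """Classify a user-project by first/last-month kotlin_share.

    monthly_kotlin_share: chronological list of monthly shares in [0, 1].
    Returns "A" (Java→Kotlin migrant), "B" (unmigrated Java),
    "C" (unmigrated Kotlin), or None if no defined group matches.
    """
    first, last = monthly_kotlin_share[0], monthly_kotlin_share[-1]
    if first < 0.10 and last > 0.50:
        return "A"
    if first < 0.10 and last < 0.10:
        return "B"
    if first > 0.50 and last > 0.50:
        return "C"
    return None  # e.g. reverse migrants or users ending in the 10–50% band

print(migration_group([0.02, 0.30, 0.80]))   # A
print(migration_group([0.05, 0.04, 0.06]))   # B
print(migration_group([0.90, 0.95, 0.85]))   # C
```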
5.4 Difference-in-differences (DiD)
The primary estimator is a longitudinal log-DiD on user-project × task-size-bucket contexts:
- A context is defined as (machine_id, project_id, size_bucket).
- Only contexts with ≥3 cycles in both pre and post periods are kept.
- For each context, we compute ΔlogDAT = mean(log(DAT))_post − mean(log(DAT))_pre.
- We then estimate the treated-control gap: Primary log-DiD effect = exp(mean(ΔlogDAT)_migrants − mean(ΔlogDAT)_control) − 1.
Standard errors are clustered by machine_id, since one user may contribute multiple projects or buckets.
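The point estimate can be sketched end-to-end on toy data (clustered standard errors are omitted for brevity; function and variable names are ours, and the numbers below are invented for illustration only):

```python
import math
from statistics import mean

def log_did_effect(migrant_contexts, control_contexts):
    """Primary log-DiD point estimate from per-context pre/post DAT samples.

    Each argument is a list of (pre_dats, post_dats) pairs, one per
    user-project × bucket context, each with >= 3 cycle durations.
    """
    def delta_log(contexts):
        # Per-context change in mean log(DAT) from pre to post period.
        return [mean(map(math.log, post)) - mean(map(math.log, pre))
                for pre, post in contexts]

    gap = mean(delta_log(migrant_contexts)) - mean(delta_log(control_contexts))
    return math.exp(gap) - 1  # multiplicative effect, e.g. -0.20 = 20% shorter

# Toy data: one migrant context improving ~30 → ~24 min,
# one control context degrading ~30 → ~33 min.
migrants = [([30, 31, 29], [24, 25, 23])]
controls = [([30, 29, 31], [33, 34, 32])]
print(round(log_did_effect(migrants, controls), 3))
```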
For interpretability, we also show bucket-level median DAT tables and simple percentage changes. These descriptive summaries are not the primary estimator.
5.5 Task-size control
Raw DAT is not appropriate for comparison: Kotlin cycles in mixed-language projects contain ~15% more edits (our hypothesis is that new functionality is written in Kotlin while legacy maintenance is done in Java).
The primary control method is bucketing: we compare only cycles of the same size, using the buckets S (1–5), M (6–15), L (16–40), and XL (41+ edits). As an auxiliary descriptive normalization in selected robustness checks, we also use DAT/edits (cycle duration divided by the number of edits).
5.6 Data
- Period: November 2023 – June 2025 (~20 months).
- Product: IntelliJ IDEA (product_code = ‘IU’).
- Volumes: ~28M control-group cycles, ~76K migrant cycles, ~2.5M unmigrated Kotlin cycles.
- Primary log-DiD panel: 978 migrant contexts and 400,425 control contexts with ≥3 cycles in both pre and post periods.
Conclusion
This study presents large-scale observational evidence that development cycles in Kotlin-oriented projects are shorter than comparable cycles in Java-oriented projects. The primary longitudinal log-DiD estimate, controlling for user, project, time trend, and task size, places the effect at roughly 15–20% for everyday tasks (S, M, and L buckets). For XL tasks, the point estimate is also negative but statistically imprecise.
The dominant pattern is not a sudden acceleration upon switching to Kotlin, but rather a difference in trajectory: Unmigrated Java projects tend to experience substantial cycle-time growth over the observation period (+9% to +17% in the primary model), while migrant projects show modest improvement or remain flat. The result is independently echoed by the unmigrated Kotlin vs. unmigrated Java comparison, where Kotlin projects degrade at roughly half the rate.
We want to be explicit about what this study does not establish. This is an observational study, not a randomized experiment, and we cannot make definitive causal claims. Teams that choose to migrate to Kotlin may differ from those that stay on Java in ways we cannot fully observe − though our validity checks (JDK-version segmentation, multiple control groups, stepwise confounder elimination) suggest these differences alone do not explain the gap. We encourage readers to examine the limitations in Section 4.4 and the open questions in Section 4.5 when forming their own assessment of the evidence.
Several directions for future work could strengthen or refine these findings: event-study analysis of pre-trends, propensity score matching for more precise control-group construction, and extension to Android Studio where Kotlin is the default language. We plan to pursue these in subsequent analyses.