JetBrains AI
How We Use AlphaEvolve to Make Complex IDE Algorithms Faster
AlphaEvolve is a Google DeepMind algorithm-discovery system that uses Gemini to generate, test, and refine possible algorithm improvements. Its job is not to answer questions; it searches for faster ways to solve complex algorithmic problems. We tried it on a narrow but important part of IntelliJ-based IDEs: indexing, the background work that makes navigation, search, completion, refactorings, inspections, and other code insight available after a project opens.
Indexing speed is a simple metric to state and a hard one to improve. It depends on the language, the framework, the shape of the project, background IDE work, and the storage layer underneath the indexes. Small changes can disappear in noise. Some wins are real in a microbenchmark and invisible in a full IDE run.
We already invest a lot of engineering time here, and that manual performance work continues. The experiment described in this post was not a replacement for engineering judgement, profiling, code review, or product validation. It was a test of an additional search method: could Google DeepMind’s AlphaEvolve help us find useful optimization candidates in code that had already been worked on for years?
In its AlphaEvolve preview blog post, Google DeepMind describes AlphaEvolve as a Gemini-powered coding agent that designs algorithms by combining LLM-generated code with automated evaluators. In this experiment, the evaluator was our own performance and correctness setup.
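To make the generate-and-evaluate contract concrete, here is a deliberately tiny sketch under stated assumptions. It is not AlphaEvolve and not our setup: a hypothetical `propose` function perturbs a number where Gemini would propose code edits, and a hypothetical `evaluate` function returns `None` for a candidate that fails correctness and a cost otherwise. The search itself is single-parent hill climbing, far simpler than the system Google DeepMind describes, but it shows the property that matters: only candidates the evaluator verifies can replace the current best.

```python
import random
from typing import Optional, Tuple


def evaluate(candidate: float) -> Optional[float]:
    """Hypothetical evaluator: None if the candidate fails the correctness
    check, otherwise a cost where lower is better."""
    if candidate <= 0:
        return None  # "broken" candidates are rejected outright
    return (candidate - 3.0) ** 2 + 1.0


def propose(parent: float, rng: random.Random) -> float:
    """Hypothetical generator standing in for LLM-proposed code edits."""
    return parent + rng.uniform(-1.0, 1.0)


def evolve(seed: float, iterations: int, rng: random.Random) -> Tuple[float, float]:
    """Single-parent hill climbing: keep a child only if it verifies and scores better."""
    best_cost = evaluate(seed)
    if best_cost is None:
        raise ValueError("the seed program must pass the correctness check")
    best = seed
    for _ in range(iterations):
        child = propose(best, rng)
        cost = evaluate(child)
        if cost is not None and cost < best_cost:
            best, best_cost = child, cost
    return best, best_cost


best, cost = evolve(seed=10.0, iterations=200, rng=random.Random(0))
print(f"best={best:.2f} cost={cost:.2f}")
```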
The target: a B-tree in the indexing stack
We chose the B-tree at the foundation of our index implementation. The starting point was not a naive prototype. It was a deeply optimized piece of infrastructure where manual exploration had become expensive. Even a plausible change takes time to write, review, and validate, and a wrong change can be fast for the wrong reason.
The engineering description was deliberately plain: the original algorithm was essentially a classic B-tree, and the proposed candidates were mostly improved B-tree variants with optimizations around edge cases. That is the kind of problem AlphaEvolve is well suited for. There is code to change. There is a clear score. There are tests that reject broken ideas.
The loop: generate, score, validate
We gave AlphaEvolve an internal performance test suite for the storage layer. The suite is synthetic. It does not use real customer projects. It writes and reads synthetic data so that candidate changes can be tested quickly and repeatedly.
The score was based on the sum of median results across our mid-sized benchmarks. Unit tests acted as the correctness check. With that setup, most AlphaEvolve sessions with more than 50 iterations produced a 15-20% improvement in the synthetic performance score.
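A minimal sketch of a score with this shape, assuming each benchmark reports several timing runs in milliseconds and that the unit-test run can be reduced to a pass/fail callback. The benchmark names, the timings, and the `score` function are hypothetical, not our internal suite's API.

```python
from statistics import median
from typing import Callable, Dict, List, Optional


def score(
    timings_ms: Dict[str, List[float]],
    unit_tests_pass: Callable[[], bool],
) -> Optional[float]:
    """Hypothetical scoring: sum of per-benchmark medians, lower is better.

    Returns None when the correctness gate fails, so a fast but broken
    candidate never receives a score.
    """
    if not unit_tests_pass():
        return None
    return sum(median(runs) for runs in timings_ms.values())


# Hypothetical benchmark names and timings (ms), three runs each.
baseline = {
    "insert_mid": [120.0, 118.0, 125.0],
    "lookup_mid": [80.0, 95.0, 79.0],
}
print(score(baseline, lambda: True))   # 200.0
print(score(baseline, lambda: False))  # None
```

Using medians keeps a single noisy run, such as the 95.0 ms lookup above, from pulling a benchmark's contribution upward, which helps when small changes can disappear in noise.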
That was encouraging, but it was not enough. Synthetic benchmarks are useful because they are controlled. Users do not run controlled benchmarks. They run full IDEs, with background processes, language services, and project-specific behavior running at the same time. So we took the best generated candidates into integration tests.
For the full IDE step, the team ran Kotlin Spring Petclinic on modified IntelliJ IDEA 2026.2 nightly builds. The reported baseline for total end-to-end indexing time was 17.4 ± 0.5 seconds. Of the five generated candidates tested, two showed statistically significant improvements, with reproducible results below 16.8 seconds.
Claim boundaries
Most sessions with more than 50 iterations improved the synthetic performance score by 15-20%. This is the most direct claim about the autonomous optimization loop, because the synthetic benchmark was the optimization target itself.
What changed in the numbers
Our end-to-end run table contains two measured candidates. Solution 1 produced a mean result of 16.6 ± 0.2 seconds. Against the 17.4-second baseline, that is 0.8 seconds faster, a reduction of roughly 4.6% in this integration scenario.
Solution 2 is instructive for a different reason: it did not win the full IDE test. It measured 17.5 ± 0.4 seconds, which is effectively the baseline in this scenario. Both candidates improved the fast synthetic benchmark, but only one of them showed a user-visible end-to-end improvement in the integration measurements.
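The arithmetic for both solutions can be checked in a few lines. This sketch treats each ± value as a symmetric spread around the mean; the numbers do not say whether it is a standard deviation or a confidence interval, so the hypothetical `intervals_separate` check is only a crude, conservative heuristic and not a substitute for a proper significance test.

```python
from typing import Dict, Tuple

# Reported end-to-end results in seconds: (mean, ± spread).
BASELINE: Tuple[float, float] = (17.4, 0.5)
SOLUTIONS: Dict[str, Tuple[float, float]] = {
    "Solution 1": (16.6, 0.2),
    "Solution 2": (17.5, 0.4),
}


def relative_reduction(baseline_mean: float, candidate_mean: float) -> float:
    """Percentage reduction in mean time; positive means faster than baseline."""
    return (baseline_mean - candidate_mean) / baseline_mean * 100.0


def intervals_separate(baseline: Tuple[float, float], candidate: Tuple[float, float]) -> bool:
    """Crude heuristic: the candidate's upper bound sits below the baseline's
    lower bound. Not a significance test."""
    b_mean, b_spread = baseline
    c_mean, c_spread = candidate
    return c_mean + c_spread < b_mean - b_spread


for name, (mean, spread) in SOLUTIONS.items():
    saved = BASELINE[0] - mean  # seconds saved; negative means slower
    pct = relative_reduction(BASELINE[0], mean)
    sep = intervals_separate(BASELINE, (mean, spread))
    print(f"{name}: {saved:+.1f} s, {pct:+.1f}%, separated={sep}")
# Solution 1: +0.8 s, +4.6%, separated=True
# Solution 2: -0.1 s, -0.6%, separated=False
```

Solution 1's range (16.4 to 16.8 s) clears the baseline's range (16.9 to 17.9 s), while Solution 2's range (17.1 to 17.9 s) falls within it, which matches the reading above.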
That distinction matters. A performance workflow that only celebrates synthetic wins will eventually ship misleading claims. A workflow that pairs autonomous search with full IDE validation has a better chance of finding changes users can feel.
AlphaEvolve can change how we approach complex performance work. It turns optimizations that were once too time-consuming to explore into candidates we can test routinely. Engineers still own the benchmark, review, and release decision. The search space is what gets smaller.
Dmitrii Batkovich, Director of Engineering for IntelliJ Platform
What we measure next
The next step is product validation. The team plans to check whether the improvements show up in the Mega Index metric, an internal KPI that tracks indexing performance and user experience, and in particular whether users become more satisfied with the indexing process. That is the right bar. A faster internal benchmark is useful. A faster full IDE test is better. A better user experience is the result that matters.
For us, the important lesson is not that AlphaEvolve magically made indexing fast. It did something more practical. It helped generate and rank low-level optimization ideas in a space where manual exploration is slow. JetBrains engineers supplied the problem, the tests, the measurement discipline, and the judgement. AlphaEvolve expanded the search.
Acknowledgements
This project was a collaboration between the JetBrains team, including Denis Shiryaev and Dmitrii Batkovich, and the AI for Science and account teams at Google Cloud, including Anant Nawalgaria, Skander Hannachi, Kartik San, Laurynas Tamulevičius, Nicolas Stroppa, and Artemiy Yashin.