Small Models, Big Impact: Why JetBrains is Betting on Focal LLMs
At AI Summit London 2025, Kris Kang, Head of Product for AI at JetBrains, gave a talk that questioned a common belief in AI development: that bigger means better.
The industry has focused heavily on massive, general-purpose language models. These models offer impressive capabilities, but the cost of building and running them at scale is top of mind for many enterprise decision-makers.
In his talk “Small Models, Big Impact”, Kris introduced an alternative: focal models. These compact, domain-specific LLMs aim to deliver strong performance while reducing energy use and total cost of ownership. They can be used as a workhorse complement to frontier models.
Here’s why this matters and how JetBrains is acting on it.
The energy cost of chasing scale
AI models now operate at a scale that was hard to imagine a few years ago. GPT-3 has 175 billion parameters, and although there are no official numbers for newer models, observers estimate GPT-4 at 1.8 trillion and Grok-3 at 2.7 trillion. To accommodate the growing size of models, data centers with as many as 2 million GPUs are being built.
The energy required for this is significant. Processing 1 billion AI chats per day with a frontier model can use more than 124 GWh per year (at 0.34 Wh per chat), equivalent to powering 31k UK households for a year. The environmental impact alone raises important concerns, and the financial cost can be just as significant.
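The energy figure can be checked with a few lines of arithmetic. The sketch below assumes the reading under which the quoted numbers agree: 1 billion chats per day at 0.34 Wh per chat. The implied consumption per household is derived from the text's own comparison and is not an independent statistic.

```python
# Assumed reading of the figures: 1 billion chats per day, 0.34 Wh per chat.
CHATS_PER_DAY = 1_000_000_000
WH_PER_CHAT = 0.34
DAYS_PER_YEAR = 365
HOUSEHOLDS = 31_000  # "31k UK households" from the text


def annual_energy_gwh(chats_per_day, wh_per_chat, days=DAYS_PER_YEAR):
    """Annual energy in GWh (1 GWh = 1e9 Wh)."""
    return chats_per_day * wh_per_chat * days / 1e9


annual_gwh = annual_energy_gwh(CHATS_PER_DAY, WH_PER_CHAT)

# Household consumption implied by the comparison, in kWh per year
# (1 GWh = 1e6 kWh).
implied_kwh_per_household = annual_gwh * 1e6 / HOUSEHOLDS

print(f"{annual_gwh:.1f} GWh per year")
print(f"{implied_kwh_per_household:.0f} kWh per household per year")
```

Under these assumptions the total comes to about 124.1 GWh per year, which corresponds to roughly 4,000 kWh per household per year.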
For a mid-sized company, using a frontier model might cost around USD 15,000 per day, which adds up to USD 45,000 per employee per year. That figure is six times higher than the average IT expenditure per employee in 2023. The estimate assumes a price of $15 per 10 requests and an enterprise of 1,000 employees making 10,000 requests per day, which comes to $15,000 per day. This is a conservative estimate.
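The daily figure follows directly from the stated assumptions. A minimal sketch, using only the numbers given above (the function and variable names are illustrative):

```python
def daily_cost_usd(requests_per_day, price_usd, requests_per_price_unit):
    """Daily spend when `price_usd` buys `requests_per_price_unit` requests."""
    return requests_per_day / requests_per_price_unit * price_usd


EMPLOYEES = 1_000          # enterprise size from the text
REQUESTS_PER_DAY = 10_000  # total requests across the enterprise

total_per_day = daily_cost_usd(REQUESTS_PER_DAY, price_usd=15, requests_per_price_unit=10)
per_employee_per_day = total_per_day / EMPLOYEES

print(f"Total: ${total_per_day:,.0f} per day")
print(f"Per employee: ${per_employee_per_day:,.2f} per day")
```

Changing the request volume or the price per request shows how quickly the daily total moves.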
What are focal models?
Focal models aim to solve a single problem with scalability in mind. They are:
- Small, usually with fewer than 10 billion parameters
- Domain-specific, not general-purpose
- Post-trained to maximise cost efficiency and inference speed
- Built using techniques like quantisation, distillation, and Mixture of Experts
Rather than trying to be good at everything (generating images, text-to-speech, coding, and more), they focus on high performance in a narrow field, such as code, legal, or medical tasks.
This specialisation allows enterprises to apply focal models to the specific use cases with the highest ROI, where ROI might also include energy efficiency.
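Of the techniques named in the list above, quantisation is the simplest to illustrate. The sketch below shows generic symmetric 8-bit quantisation of a handful of weights in plain Python. It is a textbook illustration with made-up values, not a description of how any particular focal model was built.

```python
def quantise_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantised = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantised]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -1.27, 0.05, 0.0, 0.64]

q, scale = quantise_int8(weights)
restored = dequantise(q, scale)

# Rounding to the nearest step bounds the error by half a step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("integers:", q)
print("max round-trip error:", max_error)
```

Each weight is stored as a small integer plus one shared scale factor, which is why quantised models need less memory at the price of a bounded rounding error.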
Mellum: A focal model purpose-built for code
JetBrains has taken this concept and is applying it directly to software development. Mellum, our 4-billion-parameter model, is the first in a family we intend to open source. It was built from the ground up to support code completion in real-world settings.
Mellum works across multiple programming languages and has been optimised to run quickly on modest hardware (for example, it runs comfortably on a MacBook with an M4 Max chip). Developers can use it in isolated environments without relying on third-party providers and without giving up control over their data. You can already access Mellum through JetBrains IDEs via our AI Assistant, or on Hugging Face if you prefer to run it independently.
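A rough estimate shows why a 4-billion-parameter model fits on a laptop. The sketch below counts memory for the weights only, ignoring activations, caches, and runtime overhead. The precisions shown are common choices used here as assumptions, since the text does not state which precision Mellum ships in.

```python
def weight_memory_gb(num_parameters, bits_per_parameter):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9


MELLUM_PARAMETERS = 4_000_000_000  # "4-billion-parameter model" from the text

# Assumed precisions: 16-bit, 8-bit, and 4-bit weights.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(MELLUM_PARAMETERS, bits):.1f} GB")
```

At 16 bits per parameter the weights take about 8 GB, and each halving of precision halves that figure.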
Rather than aiming for general use, Mellum stays focused on one job: helping developers write better code more efficiently.
Why focal models matter for the future of AI
Focal models offer a practical answer to AI’s rising technical, financial, and environmental costs.
By narrowing the model’s purpose, teams can work with systems that cost less to operate and offer greater flexibility in how and where they’re deployed.
This also opens the door to deeper enterprise integration. Companies can fine-tune these models to fit their workflows or adapt them to meet compliance requirements without needing access to large-scale computing clusters.
Focal models do not replace frontier models but offer a complement that better suits many real-world applications.
The JetBrains perspective: Smarter, not just bigger
JetBrains believes the next meaningful shift in AI will not come from chasing larger models but from refining how and where we apply them in order to help companies grow their businesses sustainably. We have invested in focal model development because our customers are asking for it, and we believe this approach delivers more balanced results.
Mellum is our first step in this direction. It shows what is possible when you focus on purpose instead of size.
Try Mellum today
You can try Mellum in several ways:
- On Hugging Face, for experimentation or offline use
- Inside JetBrains IDEs, through AI Assistant
- Soon, as a containerised deployment in NVIDIA AI Enterprise
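For the Hugging Face route, a minimal sketch using the `transformers` library might look like the following. The repository name `JetBrains/Mellum-4b-base` is an assumption and should be checked against the model card, as should any recommended generation settings. The helper name `complete_code` is illustrative, and the `transformers` and `torch` packages need to be installed.

```python
MODEL_ID = "JetBrains/Mellum-4b-base"  # assumed repository name, verify on Hugging Face


def complete_code(prefix, model_id=MODEL_ID, max_new_tokens=64):
    """Return `prefix` followed by the model's greedy continuation."""
    # Imported lazily so the heavy dependencies load only when needed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenise the code prefix into PyTorch tensors.
    inputs = tokenizer(prefix, return_tensors="pt", return_token_type_ids=False)

    # Greedy decoding keeps the output deterministic for a given prefix.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete_code("def fibonacci(n):")` downloads the weights on first use and then runs entirely on the local machine.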
Whether you prioritise sustainability, performance, or security, focal models like Mellum offer a practical way to adopt LLMs where they bring the most value.
The future of AI is not just about using frontier models as they hit the market. It’s about sustainably embedding AI into your business in ways that lead to positive outcomes.