JetBrains AI

Supercharge your tools with AI-powered features inside many JetBrains products

Explore More

News Releases

Mellum2 Goes Open Source: A Fast Model for AI Workflows

Anton Semenkin Nikita Pavlichenko

Read this post in other languages:

Trained from scratch and designed for practical deployment, Mellum2 is built for routing, Q&A, sub-agents, and private AI use in software engineering systems.

Today, we’re open-sourcing Mellum2, a 12B model engineered to solve the hardest parts of production AI: latency, throughput, and cost. Built from scratch and released under the Apache 2.0 license, Mellum2 offers a high-performance, cost-efficient alternative for your infrastructure.

Mellum began with code completion; now we’ve evolved it to handle both natural language and code. It is now a versatile tool ready to power routing, summarization, and intermediate reasoning steps across your modern AI workflows.

Whether you want to experiment, fine-tune, or deploy at scale, Mellum2 is ready to run in your own systems.

Try Mellum

Architecture and performance

Mellum2 is engineered to solve the bottlenecks of production-scale systems through its architecture and focused, efficiency-driven design.

Mixture-of-Experts (MoE) design: The model features 12B total parameters, but because it uses a MoE design, only 2.5B parameters are active per token. This reduces compute costs while enabling high-throughput, low-latency inference for real-time workloads.
Specialized focus: Unlike many modern models, Mellum2 is not multimodal. It is trained specifically on natural language and code data. This specialization ensures the model excels in software engineering environments while remaining lean and fast.

In our technical report, we detail our model’s performance across code generation, science, math, and reasoning benchmarks. Mellum2 is competitive with other similar-sized models while cutting inference time to less than half – a definitive advantage for production-grade deployments.

Key use cases for Mellum2

Route and orchestrate AI workloads: Use Mellum2 to analyze incoming prompts and help select the right model or tool for each task.
Build low-latency RAG pipelines: Retrieve relevant context, use Mellum2 to summarize it, and generate responses instantly.
Power fast sub-agents in complex workflows: Break down agent pipelines into steps like context gathering, planning, and validation. Use Mellum2 for fast, specialized tasks instead of relying on a single large model.
Enable private, local AI deployment: Run Mellum2 locally or self-host it to keep code and data fully under your control.

The “focal model” philosophy: Why focused models scale better

As AI systems become more complex, performance bottlenecks shift from raw capability to latency, throughput, and cost at scale. Not every task requires the largest model. Many steps in modern AI systems are repetitive, latency-sensitive, and high-frequency. These steps benefit from a fast and reliable model that can be efficiently routed, hosted, and controlled.

At JetBrains, we believe the future belongs to coordinated systems, not single models. Frontier models will continue to push the limits, but practical AI products also require focal models: fast, specialized components that handle high-frequency tasks efficiently.

That’s the role we see for Mellum2 in the next generation of AI software tooling.

Get started with Mellum2

If you’re building AI systems for software engineering – whether inside an IDE, in a RAG pipeline, as part of an agent workflow, or fully on your own infrastructure – we’d love for you to try Mellum2.

Open source is how better tools get made.

Try Mellum

How We Use AlphaEvolve to Make Complex IDE Algorithms Faster Agentic AI Governance: Designing for Accountability and Control

Discover more

Today, we’re launching JetBrains Context, a new repository intelligence layer that helps coding agents work more efficiently and produce higher-quality results on complex codebases. As part of the JetBrains AI for Teams and Organizations rollout, JetBrains Context is now available in early access at…

We evaluated coding agents across the JVM, .NET, and Python – and Codex was the winning agent.

Introducing recap and insights, two experimental AI features that proactively help you understand recent activity and non-obvious code in your project.

Cursor is now available as an AI agent inside JetBrains IDEs through the Agent Client Protocol.

JetBrains AI

Mellum2 Goes Open Source: A Fast Model for AI Workflows

Architecture and performance

Key use cases for Mellum2

The “focal model” philosophy: Why focused models scale better

Get started with Mellum2

Discover more

Introducing JetBrains Context: Repository Intelligence for Coding Agents

Introducing a Recommended Agent in AI Chat, With Codex as the Current Default

Experimental AI Features for JetBrains IDEs: Recap and Insights

Cursor Joined the ACP Registry and Is Now Live in Your JetBrains IDE

JetBrains AI

Mellum2 Goes Open Source: A Fast Model for AI Workflows

Architecture and performance

Key use cases for Mellum2

The “focal model” philosophy: Why focused models scale better

Get started with Mellum2

Subscribe to JetBrains AI Blog updates

Discover more

Introducing JetBrains Context: Repository Intelligence for Coding Agents

Introducing a Recommended Agent in AI Chat, With Codex as the Current Default

Experimental AI Features for JetBrains IDEs: Recap and Insights

Cursor Joined the ACP Registry and Is Now Live in Your JetBrains IDE