The Missing Link Between AI and Business Value

Let’s call a spade a spade. Some enterprises are already using AI agents, but very few can explain their impact on business performance.

Frameworks such as DORA and SPACE, along with developer experience indicators captured through third-party platforms, offer insight into delivery velocity and developer quality of life, but it remains difficult to map these signals cleanly to business impact.

Unless you work directly on model development, model metrics themselves rarely determine whether AI is creating enterprise value.

The gap between technical performance signals and sustained business outcomes is an obstacle to scaling AI responsibly.

From technical metrics to business value

Abstract benchmarks such as SWE-Bench Pro and Tau2-bench are directionally useful when selecting AI tools, but they can bear little relation to how those tools perform inside enterprise systems. An agent that performs well in a controlled environment can fail once integrated into production workflows. What matters is not benchmark scores, but the impact, traceability, and resilience of AI systems under real-world conditions.

Recent data underscores the urgent need to find an accurate way of measuring these variables. Though 88% of employees use AI at work today, just 5% use it “in a transformative way”, according to the EY 2025 Work Reimagined Survey.

Blindly adopting AI is unlikely to be fruitful. Enterprises should instead experiment with and evaluate AI through operational metrics on the systems they are accountable for building and operating. The focus should be on the lifetime cost of maintaining systems, the human time spent relative to a pre-AI baseline, and throughput as a function of Total Cost of Ownership (TCO).
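As a concrete (if simplified) illustration, the Kotlin sketch below computes two of these signals. The `PeriodReport` shape and its field names are assumptions made for the example, not an established schema:

```kotlin
// Hypothetical record of one delivery period; all field names are illustrative.
data class PeriodReport(
    val featuresShipped: Int,          // throughput proxy for the period
    val humanHours: Double,            // hands-on human time this period
    val baselineHumanHours: Double,    // pre-AI baseline for comparable work
    val toolingCost: Double,           // licenses, inference, infrastructure
    val maintenanceCost: Double        // ongoing upkeep of AI-touched systems
)

// Throughput per unit of Total Cost of Ownership.
fun throughputPerTco(r: PeriodReport): Double =
    r.featuresShipped / (r.toolingCost + r.maintenanceCost)

// Human time spent over (or under) the pre-AI baseline, as a ratio.
fun humanTimeOverBaseline(r: PeriodReport): Double =
    (r.humanHours - r.baselineHumanHours) / r.baselineHumanHours

fun main() {
    val q = PeriodReport(
        featuresShipped = 42,
        humanHours = 310.0,
        baselineHumanHours = 400.0,
        toolingCost = 12_000.0,
        maintenanceCost = 8_000.0
    )
    println("Throughput per \$ of TCO: ${"%.5f".format(throughputPerTco(q))}")
    println("Human time vs. baseline: ${"%.1f%%".format(humanTimeOverBaseline(q) * 100)}")
}
```

Tracked period over period, trends in numbers like these say more about enterprise value than any public benchmark score.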

Auditability matters for tracing decisions and meeting governance needs, while human readability ensures teams can understand and manage system behavior both now and in the future. These are table stakes for technical teams adopting AI at scale.

The ROI problem

Every enterprise wants to link AI to ROI, but the underlying data rarely lines up. The problem is not limited to model telemetry: AI is embedded into enterprise systems and assigned responsibility for specific parts of the SDLC and operational workflows.

Evidence of its impact must therefore span system behavior, human intervention, and downstream business KPIs. These signals live in different systems and move on different timescales, which creates a gap between AI activity and measurable business outcomes. This is why most organizations rely on proxies or assumptions rather than proof.
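One simplified way to bridge those timescales is to align both streams on a shared time grain and test whether they move together. The Kotlin sketch below assumes hypothetical `AgentEvent` and `BusinessKpi` records and, for brevity, that all samples fall within a single calendar year:

```kotlin
import java.time.LocalDate
import java.time.temporal.WeekFields
import java.util.Locale
import kotlin.math.sqrt

// Illustrative signal shapes; the names are assumptions, not a real schema.
data class AgentEvent(val date: LocalDate, val tasksCompleted: Int)   // fast-moving AI telemetry
data class BusinessKpi(val date: LocalDate, val value: Double)        // slow-moving business KPI

// Bucket by calendar week (assumes all samples fall within one year).
private fun weekOf(d: LocalDate): Int =
    d.get(WeekFields.of(Locale.US).weekOfWeekBasedYear())

// Align both streams on a weekly grain so different timescales become comparable.
fun alignWeekly(events: List<AgentEvent>, kpis: List<BusinessKpi>): List<Pair<Double, Double>> {
    val activity = events.groupBy { weekOf(it.date) }
        .mapValues { (_, es) -> es.sumOf { it.tasksCompleted }.toDouble() }
    val outcomes = kpis.groupBy { weekOf(it.date) }
        .mapValues { (_, ks) -> ks.map { it.value }.average() }
    return activity.keys.intersect(outcomes.keys)
        .map { week -> activity.getValue(week) to outcomes.getValue(week) }
}

// Pearson correlation between weekly AI activity and the business KPI.
fun pearson(pairs: List<Pair<Double, Double>>): Double {
    val mx = pairs.map { it.first }.average()
    val my = pairs.map { it.second }.average()
    val cov = pairs.sumOf { (x, y) -> (x - mx) * (y - my) }
    val sx = sqrt(pairs.sumOf { (x, _) -> (x - mx) * (x - mx) })
    val sy = sqrt(pairs.sumOf { (_, y) -> (y - my) * (y - my) })
    return if (sx == 0.0 || sy == 0.0) 0.0 else cov / (sx * sy)
}
```

A correlation like this is evidence rather than proof of causation, but it is a step beyond proxies and assumptions.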

Closing the gap

The next generation of AI orchestration platforms will need to close this gap by correlating technical performance with operational and financial signals. When those systems mature, ROI will shift from being an abstract target to a measurable outcome grounded in data.

The impact of this gap is already visible in enterprise outcomes. The WRITER 2025 Enterprise AI Adoption Report found that organizations without a formal AI strategy report only 37% success when adopting AI, compared with 80% for those that tie performance to clear operational outcomes.

The data is unambiguous. Only when an organization measures technical and operational signals together does it finally gain a true picture of AI’s value.

Towards continuous benchmarking

What underlies enterprise AI is not static. Data drifts, workflows evolve, and compliance obligations expand. Measurement must therefore become a continuous feedback loop rather than an annual report. 

The same principle should apply across the enterprise: the definitions of performance metrics should stay stable, while the measurements themselves either remain robust to changing conditions or explicitly track those changes over time.
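A minimal version of such a feedback loop might look like the Kotlin sketch below; the weekly schedule, the 20% drift threshold, and the stubbed metric source are all illustrative assumptions:

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.math.abs

// Re-measure a metric on a schedule and flag drift against a rolling baseline.
class ContinuousBenchmark(
    private val measure: () -> Double,        // e.g. weekly throughput per unit of TCO
    private val driftThreshold: Double = 0.2  // flag deviations over 20% (arbitrary)
) {
    private val history = ArrayDeque<Double>()

    fun check() {
        val current = measure()
        val baseline = history.takeIf { it.isNotEmpty() }?.average()
        if (baseline != null && abs(current - baseline) / baseline > driftThreshold) {
            println("Drift detected: $current vs. rolling baseline $baseline")
        }
        history.addLast(current)
        if (history.size > 12) history.removeFirst() // keep roughly a quarter of weekly samples
    }
}

// Stubbed metric source; replace with a query against your real metrics store.
fun fetchWeeklyThroughputPerTco(): Double =
    0.0021 + kotlin.random.Random.nextDouble(-0.0005, 0.0005)

fun main() {
    val benchmark = ContinuousBenchmark(measure = ::fetchWeeklyThroughputPerTco)
    Executors.newSingleThreadScheduledExecutor()
        .scheduleAtFixedRate(benchmark::check, 0, 7, TimeUnit.DAYS)
}
```

The specific mechanism matters less than the principle: measurement runs on a schedule and compares against its own history, not a one-off baseline.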

Measuring what matters

Meaningful AI performance measurement is not about bigger numbers or more dashboards. It is about connecting operational signals with business truth. 

Enterprise leaders must weigh raw model performance alongside how well a system scales, how transparently it operates, and how clearly its impact can be proven.

Taking benchmark numbers at face value is like trusting a car manufacturer's fuel-efficiency claims without ever driving the car to see whether they hold up in real conditions.

Only when these questions can be answered with real data will AI become a truly accountable part of the enterprise stack.

The real question for leaders is simple: Are you measuring the numbers that prove AI is working in practice, or just parroting back the numbers on public benchmarks?
