Your AI Agent Keeps Missing The Real Bottleneck. JetBrains Rider Can Fix It Now.
Here’s a case worth pondering: your app freezes for ten seconds, and you ask an AI agent what’s wrong. What does it actually do? For a long time the honest answer was: it rummages through your code and takes a wild guess.
A snapshot taken by a profiler tool is runtime evidence. It knows exactly where the CPU went. But an agent with no access to profiling can’t read it. So it does the only thing it can: scans the project, finds some plausible-looking inefficiencies, and confidently presents them as the bottleneck. Sometimes it gets lucky. On a real freeze, it usually doesn’t.
We’ve been building something to fix that in Rider: a dotTrace-backed profiling skill for the agents inside AI Assistant, called dottrace-analyze. The idea is very straightforward. You hand the agent a .dtp snapshot you already captured with dotTrace – using the standalone profiler, the command-line tool, or dotTrace inside Rider – and instead of wandering through your source code, it reads the profile first. It finds where the time actually went, follows the hot path back into your code, and explains what’s slow and why, with recommendations for what to look at next.
We ran the evals. To keep the scoring from becoming a personal judgment call, each answer was evaluated against a reference root cause with a fixed LLM-as-judge rubric: did the agent identify the primary hotspot, explain the mechanism, avoid misleading detours, and propose a fix that followed from the evidence? The results were even better than we expected, so let’s start with the most dramatic one:
Case study: a UI freeze the agent couldn’t find without dotTrace
One of our test scenarios was an earlier version of Avalonia that used to hang when being shut down. This issue even crept into Rider itself two years ago, which illustrates how easily performance degradations from popular open source projects can permeate your applications.
To clarify our methodology: we intentionally tested a version of Avalonia from before the fix in AvaloniaUI/Avalonia#16633. It was crucial for us to use a known, since-resolved bug because it gave us a clean reference answer for the eval.
We ran the same agent against the issue ten times with the skill and ten times without, and had an LLM judge score each diagnosis against the known root cause on a scale of 0 to 10.
Without the skill, the agent averaged 1.6 out of 10. It went looking through rendering code, listed some general suggestions, and never landed on the real problem.
With the skill, it scored 10 out of 10 on every single one of the ten runs.
Ten runs with the same agent, judged against the known root cause.
This is the difference between “the code looks suspicious over here” and “the snapshot says the freeze is here.”
The results weren’t “somewhat better”: the agent went from reliably lost to reliably correct, with no variance.
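The protocol described above (ten runs per arm, each diagnosis scored from 0 to 10 by a judge) can be sketched in a few lines. This is only an illustration of the shape of such a harness, not our actual eval code: `evaluate_arm`, `diagnose`, and `judge` are hypothetical names, and the callables in the demo are stand-ins rather than real agent or judge calls.

```python
from statistics import mean, pvariance
from typing import Callable


def evaluate_arm(diagnose: Callable[[], str],
                 judge: Callable[[str], int],
                 runs: int = 10) -> dict:
    """Run one arm of the comparison and summarize the judge's scores.

    diagnose: hypothetical callable that returns the agent's diagnosis text.
    judge:    hypothetical callable that scores a diagnosis from 0 to 10
              against the reference root cause.
    """
    scores = []
    for _ in range(runs):
        score = judge(diagnose())
        if not 0 <= score <= 10:
            raise ValueError(f"judge score out of range: {score}")
        scores.append(score)
    return {
        "scores": scores,
        "mean": mean(scores),
        "variance": pvariance(scores),  # 0 means every run scored the same
    }


if __name__ == "__main__":
    # Placeholder callables so the sketch runs; these are not measurements.
    summary = evaluate_arm(lambda: "some diagnosis", lambda answer: 10)
    print(summary["mean"], summary["variance"])
```

Reporting the variance next to the mean is what lets a claim like "no variance" be checked rather than asserted.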
And what it found is the kind of thing that’s genuinely hard to spot by reading code (even if you have a couple of hours to spare). The freeze wasn’t in any one obvious place. It came from a single character-by-character operation deep in the text layout path, cheap on its own but run so many times that it swallowed most of the CPU. That cost only shows up when you can see where the time actually went. The agent followed the snapshot straight to it, explained why it was so expensive, and pointed at the change that would fix it.
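To see why an operation that is cheap on its own can still dominate a profile, it helps to look at how own time adds up per method. The sketch below uses invented records: the method names `ProcessChar` and `RenderFrame` and all timings are hypothetical and are not taken from the Avalonia snapshot.

```python
from collections import defaultdict


def own_time_by_method(samples):
    """Sum own time (in microseconds) per method from (method, own_us) records."""
    totals = defaultdict(int)
    for method, own_us in samples:
        totals[method] += own_us
    return dict(totals)


def hottest(samples):
    """Return the method with the largest summed own time and its share of the total."""
    totals = own_time_by_method(samples)
    grand_total = sum(totals.values())
    method = max(totals, key=totals.get)
    return method, totals[method] / grand_total


# Hypothetical data: one visibly expensive call and a tiny call repeated many times.
samples = [("RenderFrame", 50_000)] + [("ProcessChar", 10)] * 100_000

method, share = hottest(samples)
print(f"{method} owns {share:.0%} of the measured time")
```

Reading the source, the single 50 ms call looks like the suspect. Only the aggregated numbers show that the 10 µs call is where the time went.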
The full benchmark: eight scenarios, 80 runs
That Avalonia scenario was only one slice of our evaluation. After checking that case across repeated runs, we widened the batch to eight .NET performance-investigation scenarios from different projects and compared the same agent with and without profiler access. Here’s how the skill changed the average accuracy score on each scenario, out of 10:
Across 80 runs, the skill improves both average quality and consistency.
| Scenario | Without skill | With skill | Change |
|---|---|---|---|
| avalonia-long-line | 1.6 | 10.0 | +8.4 |
| avalonia-styles | 1.6 | 9.9 | +8.3 |
| cyclops | 2.4 | 8.8 | +6.4 |
| eShopOnWeb | 1.0 | 5.2 | +4.2 |
| stock_nemo | 3.0 | 4.0 | +1.0 |
| game-of-life | 10.0 | 10.0 | 0.0 |
| checkers-copy | 9.3 | 9.1 | -0.2 |
| checkers-update | 8.8 | 8.2 | -0.6 |
The biggest wins appear where the answer genuinely lives in the runtime evidence. The flat or slightly negative cases are useful too: they show where the product should avoid invoking a heavier profiler workflow.
Across all 80 runs, the average accuracy score went from 4.71 to 8.15. The number of runs scoring 8 or higher roughly doubled, from 29 to 59. Runs that nailed the root cause exactly (a perfect 10) more than doubled, from 20 to 48.
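The overall averages can be reproduced from the per-scenario table. The sketch below assumes every scenario contributed the same number of runs to each arm, so that the mean of the per-scenario means equals the overall mean; the variable names are ours for illustration.

```python
from statistics import mean

# (scenario, without skill, with skill), copied from the table above.
SCORES = [
    ("avalonia-long-line", 1.6, 10.0),
    ("avalonia-styles", 1.6, 9.9),
    ("cyclops", 2.4, 8.8),
    ("eShopOnWeb", 1.0, 5.2),
    ("stock_nemo", 3.0, 4.0),
    ("game-of-life", 10.0, 10.0),
    ("checkers-copy", 9.3, 9.1),
    ("checkers-update", 8.8, 8.2),
]

# Assumption: equal run counts per scenario, so averaging the rows is valid.
avg_without = mean(row[1] for row in SCORES)
avg_with = mean(row[2] for row in SCORES)

# Per-scenario change, rounded to one decimal like the table's last column.
changes = [round(with_skill - without, 1) for _, without, with_skill in SCORES]

print(f"without skill: {avg_without:.4f}, with skill: {avg_with:.4f}")
print(changes)
```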
Two things in that table we want to be honest about.
The first is where the skill earns its keep: the scenarios where the baseline was hopeless. Anywhere the answer genuinely lived in the runtime (the Avalonia freezes, the Cyclops workload), the baseline scored between 1.6 and 2.4 and the skill pulled it up to between 8.8 and 10. The agent stopped spreading its attention across general optimization ideas and stayed anchored to the thing that was actually slow.
The second is the bottom of the table, and we’re leaving it in on purpose. game-of-life, checkers-copy, and checkers-update were already handled well without any profiler tooling, and on the two checkers cases the skill nudged the score down by a few tenths. The lesson isn’t that the skill hurts the results. It’s that some tasks don’t need it. Sometimes, invoking a full profiler workflow when a quick look at the code would do is just wasted tokens.
What this actually costs
We tracked cost as carefully as accuracy, because a skill that produces better answers at an unreasonable price isn’t one we’d ship. The straightforward part first: reading a snapshot is real work. The agent loads the profiler data, walks the call trees, and connects the evidence back to your source before it starts reasoning. In the 80-run batch above, that showed up in the bill. Cost went from about USD 1.91 per run without the skill to about USD 2.61 with it. Total batch cost was about USD 153 without the skill and USD 209 with it. Given how much the diagnoses improved, we think that’s a good trade, but it is a real increase, and we’d rather you hear it from us.
There’s a second effect, though, and it runs the other way. In an additional Avalonia test case, an app that was slow to shut down, the agent without the skill never found the real cause. Across ten runs it kept building the same plausible but wrong theory, searching broadly and reading file after file along the way, and scored 0 out of 10 every time. With the skill, it measured first, followed the profiler straight to the responsible code path, and scored 10 out of 10. Skipping all that wandering also made the runs cheaper and faster: USD 2.58 per run instead of USD 3.74, and 206 seconds instead of 373.
So the fair summary is that the skill changes where the money goes. It spends more on reading evidence and less on exploring dead ends. Sometimes that nets out more expensive, sometimes cheaper, but in both cases you’re paying for an answer grounded in what your application actually did, and that’s the part we think is worth it.
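Both effects are easy to check with a little arithmetic. The sketch below uses only the figures quoted above; the count of 80 runs per arm is an assumption inferred from the batch totals, and `relative_change` is a helper name introduced here for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (negative means a reduction)."""
    return (after - before) / before


# Eight-scenario batch. Assumption: 80 runs in each arm.
RUNS_PER_ARM = 80
batch_without = RUNS_PER_ARM * 1.91   # about USD 153
batch_with = RUNS_PER_ARM * 2.61      # about USD 209
batch_cost_change = relative_change(1.91, 2.61)    # roughly +37% per run

# Avalonia shutdown case: the skill-backed arm was cheaper and faster.
shutdown_cost_change = relative_change(3.74, 2.58)  # roughly -31% per run
shutdown_time_change = relative_change(373, 206)    # roughly -45% wall time

print(f"batch: {batch_cost_change:+.0%} cost per run")
print(f"shutdown: {shutdown_cost_change:+.0%} cost, {shutdown_time_change:+.0%} time")
```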
Profiler analysis has a cost, but it can also reduce broad, unproductive code search.
Snapshot analysis increased run cost, while accuracy improved substantially.
Profiler evidence reduced wandering and sent the agent directly to the responsible code path.
The skill is not cheaper by default. It changes where the work happens: more evidence reading, less dead-end exploration.
What you’d see in Rider
The workflow is intentionally simple: capture a .dtp snapshot with dotTrace, whether from inside Rider, from dotTrace Standalone, or with the dotTrace command-line tool. Then ask your agent of choice in the AI Assistant tool window to investigate that snapshot by referencing its directory in the prompt.
Under the hood it loads the dottrace-analyze skill and uses the dotTrace SDK to read the profile; what comes back is a focused report:
- a short summary of the dominant bottlenecks;
- the methods, source locations, and call paths that own the runtime;
- the root cause in plain developer language, not just a method name;
- recommendations for what to look at next.
For deeper investigations it can render that into a concise HTML report you can open in a tab and share with the team. Performance work is often collaborative, and a clean artifact is very handy when more than one developer is working on the issue.
How the agent’s search trajectories changed
The strongest signal from that second eval is the trajectory: the agent’s actual path through tools and files. The no-skill runs did not fail because they were lazy or incoherent. They searched broadly, found a real-looking timer issue, and built a confident explanation around the wrong subsystem. The skill-backed runs started with measurement, so the search space collapsed around the hot path before source reading began.
Same shutdown task, same model, same codebase. The difference is the first useful piece of evidence.
- Explore shutdown paths: Fans out across dispose, shutdown, dispatcher, and timer code.
- Read rendering and dispatcher files: Moves through MediaContext, render timers, application lifetime, and dispatcher loops.
- Search for blocking patterns: Looks for waits, joins, sleeps, and render-thread patterns.
- Anchors on Win32DispatcherImpl.cs: Spots `Now - dueTime` in `UpdateTimer` and treats it as the decisive clue.
- Corroborates the timer theory: Reads dispatcher interfaces and queues to support a plausible but wrong diagnosis.

Wrong target: All 10 no-skill runs converged on the wrong area. The answer sounded grounded, but it never reached the teardown hotspot.

- Invoke dottrace-analyze: Enters the structured snapshot workflow.
- Read snapshot and timeline: Finds a 7.3 s capture, one core pinned for about 5.5 s, and negligible GC.
- Inspect running call tree: The largest own-time leaf is `List<T>.Remove`, reached via `Classes.RemoveListener`.
- Jump to implicated source: Greps `RemoveListener`, reads `Classes.cs`, then locates `SafeEnumerableList.cs`.
- Confirm the root cause: Drills down to recursive detach and identifies O(n^2) listener removal.

Right target: All 10 skill-backed runs named the exact root cause. Source reading confirmed measured evidence instead of inventing the suspect list.
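For readers who want to see why removing listeners one at a time from a list turns quadratic, here is a minimal Python analogue. It is not Avalonia’s code and not the actual fix: both functions are hypothetical, the removal order is deliberately the worst case, and the set-based variant only illustrates what a constant-time lookup changes.

```python
def detach_with_list(n: int) -> int:
    """Remove n listeners one by one from a list; return the comparison count.

    Each removal does a linear search, like a list-based Remove call.
    """
    listeners = list(range(n))
    comparisons = 0
    for target in reversed(range(n)):  # worst case: the target is always last
        for index, candidate in enumerate(listeners):
            comparisons += 1
            if candidate == target:
                del listeners[index]
                break
    return comparisons


def detach_with_set(n: int) -> int:
    """Same teardown with a hash set: one lookup per listener."""
    listeners = set(range(n))
    lookups = 0
    for target in range(n):
        lookups += 1
        listeners.discard(target)
    return lookups


for n in (1_000, 2_000):
    print(n, detach_with_list(n), detach_with_set(n))
```

Doubling the number of listeners roughly quadruples the work in the list version, while the set version only doubles. Each individual removal looks harmless in the source, which is why this pattern is hard to find without measured evidence.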
In this shutdown eval, the skill-backed arm also finished faster and cheaper on average: 206 s and USD 2.58 per run, versus 373 s and USD 3.74 without the skill.
Evidence beats guessing
You can summarize the whole result in one contrast. Without the snapshot, the best an agent can offer is “here are some code smells that might be slow.” With it, the agent can say “this snapshot says 88% of your time went here, and here’s why.” That second sentence is the entire point.
A side-by-side view of how a code-only agent and a profiler-backed agent decide where to look first.
Performance work has a harsh requirement that most coding tasks don’t: the answer has to match what the program actually did, not what it looks like it might do. The strong runs in our evals all had the same shape: they didn’t just name a method, they explained the call path, quantified the hot region, separated the root cause from its symptoms, and proposed a fix that followed directly from the profile. That’s the shape of an expert’s investigation, and it’s only possible when the agent starts from the same evidence an expert would.
dotTrace already knows what happened at runtime. AI Assistant can already turn evidence into an explanation and a next step. The skill is the bridge between the two, and on the cases that matter most, it’s the difference between an agent that guesses and one that knows.
Available now in Rider 2026.2 EAP 8
A note on licensing: During the Rider 2026.2 Early Access Program, you can try this workflow in the EAP build for free. For regular product licensing, the profiling part of this feature relies on dotTrace. The dotTrace and dotMemory plugin in Rider is available with dotUltimate or All Products Pack subscriptions; a Rider-only subscription does not include dotTrace profiling.
This is still an early, experimental implementation for the Early Access Program, and there’s still work for us to do, which is why your feedback is extremely important. Let us know how useful you find the reporting or where it might not be entirely reliable. Download the latest public build to try it out.
