Are We Having the Wrong AI Dreams?
(This opinion piece by JetBrains’ Team Lead in AI Development Experience reflects on key takeaways from NeurIPS 2025, a major AI research conference. It explains why these insights matter and considers related signals emerging from other recent research.)
Mass layoffs, robots taking control of the planet, a post-truth world. Which of these comes to mind first when we talk about the disruptive innovation AI brings?
In her NeurIPS talk, “Are We Having the Wrong Nightmares About AI?”, Zeynep Tufekci argues that societies systematically misread the impact of major technological transformations in their early stages. We prepare for risks we already understand, like generals organizing for the previous war, while missing the challenges that actually matter.
A growing theme in the research community is that AI’s intelligence is categorically different from human intelligence. That directly challenges the mental model of linear AI progress – the assumption that AI will “grow up” as a person does.
Not less capable, but differently capable
LLMs can beat humans on many benchmarks and tests, yet they still struggle with basic tasks beyond their generative capabilities. A simple physical task makes this gap tangible.
The image below shows a model's attempt at a simple task: illustrating how to open a glass beer bottle with a metal house key. The model fails, even though the task is familiar and straightforward for most people.

Prompt: You need to open a glass bottle of beer, but you don’t have a bottle opener handy. However, you have a metal house key. Illustrate how to use the key to open the bottle. Model: gpt-image-1.5.
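If you want to try this yourself, here is a minimal sketch of how the prompt could be sent to an image model through the OpenAI Python SDK. The model name is simply taken from the caption above, and the response handling (base64 image data) is an assumption that may differ by model and SDK version.

```python
# Minimal sketch: reproducing the bottle-opening prompt with an image model.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set in the environment.
# The model name is copied from the caption above; the base64 response field is an assumption.
import base64
from openai import OpenAI

client = OpenAI()

prompt = (
    "You need to open a glass bottle of beer, but you don't have a bottle opener handy. "
    "However, you have a metal house key. Illustrate how to use the key to open the bottle."
)

result = client.images.generate(
    model="gpt-image-1.5",  # as named in the caption; assumed to be accessible via this API
    prompt=prompt,
    size="1024x1024",
)

# Save the generated image so you can judge the result yourself.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("bottle_opening_attempt.png", "wb") as f:
    f.write(image_bytes)
```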
This contrast shows a recurring pattern. LLMs can perform well above a professional level on some structured, text-based problems, then fail on others that even children can handle with ease. The issue is not a question of capability; rather, it is a matter of frame of reference. These systems do not sit higher or lower on a human scale. They operate on a different scale altogether.
AI as another form of intelligence
In 2025, a widespread view was that AI in its current form represents another kind of intelligence, one that cannot be directly projected onto a human talent scale. Despite rapid progress, an old research goal remains unresolved: How do we help large language models to perform the kinds of practical tasks that every human can?
LLMs can match or outperform humans in structured, text-based evaluations, yet they continue to lag behind when it comes to working out genuinely novel solutions and adapting when faced with complex, non-stationary settings.
Several researchers argue that this gap reflects a deeper mismatch between human and model learning. Zeynep Tufekci stresses that generative AI is not a form of human intelligence, while Blake Lemoine puts it more bluntly: “Only one thing is clear: LLMs are not human.”
Studies comparing children and models show that young children can infer causal structures from only a handful of observations. In contrast, large language models struggle to identify the relevant causal relationships at all.
Other experiments demonstrate that in more complex, non-stationary environments, LLMs fail to match human adaptability, particularly when effective directed exploration matters.
Strong evaluations don’t translate into impact (yet?)
This disconnect may help explain what Ilya Sutskever described as one of the confusing aspects of current models. They perform extremely well on evaluations, yet the economic impact trails far behind.
Strong benchmark results do not translate directly into robust performance in open-ended, real-world settings.
In a software development context, this has direct implications. We should not treat LLMs as humans, either in the requirements we impose on development processes or in the outputs we expect them to produce.
As we involve LLMs more deeply, with their distinct strengths and limitations, the surrounding processes will need to change accordingly. Effective use will come less from forcing models into human-shaped roles and more from reshaping workflows to fit the kind of intelligence they actually provide.
LLMs will transform ecosystems
When we talk about technology ecosystems, we often focus on tools. Biological ecosystems remind us that this view is incomplete. An ecosystem includes not only the organisms, but also the environment they live in, and that environment is neither static nor passive.
Organisms actively shape it, and in doing so, they create conditions that favour their own survival and reproduction, while sometimes destroying environments that no longer serve them.
Software development has followed a similar pattern. Codebases, programming languages, build systems, and deployment practices have repeatedly reshaped not only the code itself, but also collaboration and development processes. These elements form the environment in which development tools operate, and they co-evolve with those tools.
Given the pace of LLM adoption, we should expect a comparable shift. LLMs are unlikely to remain passive inhabitants of today’s development environment. Instead, the ecosystem itself will change to better suit their strengths.
Languages, best practices, and workflows will emerge, evolve, or disappear based on how compatible they are with an LLM-dominated environment and how effectively they enable AI-driven work.
The sweetness of the bitter lesson
Richard Sutton, one of the pioneers of reinforcement learning, formulated what he called the “bitter lesson” after decades of AI research.
His observation was that many apparent breakthroughs come from injecting human knowledge into systems, for example, by hand-crafting rules, heuristics, or domain-specific structures.
These approaches often deliver quick wins. Over time, however, they tend to lose to more general methods that rely on learning and search capabilities, and that scale with increases in computation and data.
Sutton’s point was not that human knowledge is useless, but that it becomes a limiting factor. Systems built around general methods continue to improve as computing grows, while systems constrained by human-designed shortcuts eventually hit a ceiling.
Applied to software development, the implication is significant. If we treat development processes, tools, and workflows as methods, then approaches that maximize effective AI utilization are likely to win over time.
In contrast, approaches that restrict AI involvement or introduce friction, including heavy human-in-the-loop dependencies, risk becoming bottlenecks as models and infrastructure continue to scale.
A view towards the future
Making predictions in 2026 is hard. Still, the research points in a consistent direction. LLMs are a different beast, and we should stop treating them as junior humans who will replace us one by one.
They will reshape the software development environment to suit their particular kind of intelligence.
Alongside incremental improvements to today’s workflows, we should explore more radical shifts, deliberately reshaping codebases and processes to maximize effective AI utilization.