The Current AI Agent Hype Misses the Point. Here’s What We’ve Learned from the Last Two Years of Deployment.

What two years of genuine multi-agent collaboration in live commercial operations has taught us — and why most of what you’re reading about AI agents this week misses the point.

This has been quite a week for AI agents.

Crypto.com’s CEO spent $70 million on the AI.com domain and launched it during the Super Bowl. OpenClaw, an open-source personal assistant, crossed 150,000 GitHub stars and spawned Moltbook — a social network where AI agents talk to each other. OpenAI’s Frontier launched as an “enterprise operating system for AI agents” with Fortune 500 customers. And Google’s Co-Scientist — a multi-agent system that has been generating validated biomedical hypotheses since early 2025 — continues to demonstrate what’s possible when agents collaborate on complex problems.

The excitement is justified. Something genuinely important is happening. But almost everything generating headlines this week shares one characteristic: none of it has a production track record. Impressive demos, impressive ambitions, zero evidence of sustained commercial value.

What connects these stories is a shared trajectory. The market is rapidly discovering that AI becomes dramatically more powerful when agents stop working alone and start working together.

  • OpenClaw's value comes from connecting an agent to your apps and data.

  • Moltbook is fascinating precisely because it shows agents interacting with each other — debating, challenging, building on each other's ideas.

  • OpenAI Frontier's pitch is agents sharing institutional context across an enterprise.

Each is a step on the same path: from AI as a solo tool to AI as a collaborative workforce. But there is a further step that none of them have taken — agents that genuinely deliberate as a team, challenge each other's reasoning, capture human expertise, and compound organisational intelligence over time.

That is what Agentic Teams are. And that is what we have been running in production for over two years now.

Not in a lab. Not in a demo. In production, handling real decisions, working alongside real people, producing measurable commercial outcomes. What follows are three things that two years of genuine multi-agent collaboration have taught us that you won’t find in any launch announcement.

1. One brain arguing with itself is not a team

Almost every AI “multi-agent” system making headlines works the same way under the hood: one large language model is given multiple personas and asked to approach a problem from different angles. OpenClaw runs multiple instances of the same model. Even Google’s Co-Scientist, which has had nearly a year to mature, runs its Generation, Reflection, Ranking, and Evolution agents on Gemini 2.0: different roles, same underlying model. OpenAI Frontier’s agents divide tasks up, work on them in parallel, and synthesise the results through a single reasoning engine.

This is one brain arguing with itself. You can’t get genuine intellectual challenge from a single intelligence any more than you can play chess against yourself and expect a real contest. The perspectives may differ in content but never in cognitive approach. Same reasoning patterns, same training biases, same blind spots.

Our architecture is fundamentally different. Built on Artificial Life principles (research into how complex collective behaviours emerge from simple independent agents), our Agentic Teams use multiple independent, specialised models coordinating through swarm intelligence. Different training, different reasoning patterns, genuinely different perspectives. When our insurance claims specialist agent challenges our coverage analyst, that challenge comes from architecturally independent reasoning, not one model wearing two hats.

This isn’t a philosophical distinction. It produces measurably different outcomes.

Every experienced executive understands intuitively why a committee of genuine experts outperforms one brilliant person considering multiple angles. We’ve simply replicated this architecturally. And the results bear it out: our production deployment shows not just efficiency gains, but improved quality of decision-making — something single-brain systems structurally cannot deliver.
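
To make that architectural distinction concrete, here is a deliberately minimal sketch in Python. It is not our production system: the model names, the ask() helper, and the single round of challenge are illustrative assumptions. The point is purely structural. In the first panel every “perspective” is the same model wearing a different hat; in the second, each role is backed by a different underlying model, so a challenge comes from genuinely independent reasoning.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    role: str   # e.g. "claims specialist", "coverage analyst"
    model: str  # which underlying model produces this agent's reasoning


def ask(agent: Agent, prompt: str) -> str:
    # Placeholder for a call to whatever model backs this agent.
    return f"[{agent.role} via {agent.model}] view on: {prompt}"


# Pattern A: "one brain arguing with itself" -- different personas, same model.
single_brain_panel = [
    Agent("claims specialist", "model-x"),
    Agent("coverage analyst", "model-x"),   # same weights, same blind spots
]

# Pattern B: architecturally independent agents -- each role backed by a
# different underlying model, so each challenge reflects different training.
independent_panel = [
    Agent("claims specialist", "model-x"),
    Agent("coverage analyst", "model-y"),
    Agent("fraud reviewer", "model-z"),
]


def deliberate(panel, question):
    """One naive round: every agent states a position, then challenges the others'."""
    positions = {a.role: ask(a, question) for a in panel}
    challenges = [
        ask(a, f"Challenge this position from the {other}: {text}")
        for a in panel
        for other, text in positions.items()
        if other != a.role
    ]
    return list(positions.values()) + challenges


# The shape of the exchange is identical in both panels; what differs is
# whether the challenges come from one intelligence or several.
print(len(deliberate(independent_panel, "Is this claim covered?")))
```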

2. The results that matter

The metrics being celebrated this week are attention metrics: 150,000 GitHub stars, $70 million domain purchases, Super Bowl ads, Fortune 500 launch lists. These measure excitement, not value creation.

Here, by way of example, are the metrics from two years of production with one insurance client:

  • 70% reduction in required human staff for the operations we transformed. Not through replacement — through redesigning how work flows between humans and AI.

  • 25% higher customer satisfaction compared to industry benchmarks. This is the number that surprises people. Efficiency gains are expected from automation. Quality improvements are not. They happen because genuine deliberation catches nuances that single-agent processing misses.

  • 11-point improvement in loss ratio. In insurance, this is extraordinary. Marginal improvements in claims outcomes are worth multiples of equivalent cost savings because they directly affect 75–80% of outbound cash flow.

But perhaps the most telling metric is the one we didn’t predict: human employees reported higher job satisfaction, and several asked to be “cloned” into digital twins of themselves so they could collaborate with AI agents more effectively. That’s worth sitting with for a moment. The people working alongside these agents don’t feel threatened. They feel augmented. They’re spending less time on administrative processing and more time on the judgment calls that drew them to the profession in the first place.

This is what two years of production teaches you that no demo can: the human response to genuinely collaborative AI is not resistance. It’s relief. (Case study here).

3. Something happens when agents work with humans as colleagues, not tools

This is the insight that matters most, and it’s the one that requires production experience to discover.

In every other framework — OpenAI, Microsoft, CrewAI, LangGraph — the human role is supervisor. AI produces output, human reviews and approves. The learning signal is binary: right or wrong, approved or rejected. This is quality control. It is not collaboration.

In our production deployment, humans participate in deliberations as colleagues. They sit in the same Slack channels as the AI agents. They contribute reasoning, challenge agent analysis, and adjust recommendations — and the system captures not just what they decided but how they reasoned. Why did the experienced handler override the coverage analysis? What contextual knowledge did she apply that the agents missed? What pattern did she recognise from a similar dispute three years ago?

This continuous capture of human expertise produces what we call ‘Intelligence Capital’: organisational intelligence that compounds over time. Every deliberation enriches the next one. A marine cargo dispute in month six draws on how the previous ten marine disputes were analysed. By month twelve, the system isn’t just faster; it’s making qualitatively better decisions than it was in month one. And the rate of improvement accelerates.
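
As a rough illustration of what capturing how humans reason can look like in practice, here is a minimal sketch, again with hypothetical names. The record shape, the fields, and the naive case-type lookup are assumptions made for the example, not our production schema. The point is that the store keeps the handler’s rationale alongside the decision, and that each new deliberation starts from the accumulated precedents rather than from scratch.

```python
from dataclasses import dataclass, field


@dataclass
class DeliberationRecord:
    case_type: str          # e.g. "marine cargo dispute"
    agent_positions: list   # what each agent argued
    human_rationale: str    # how the handler reasoned, not just the verdict
    decision: str           # the outcome that was actually taken
    tags: set = field(default_factory=set)


class IntelligenceCapital:
    """Naive store: every deliberation is kept and surfaced to future ones."""

    def __init__(self):
        self.records = []

    def capture(self, record: DeliberationRecord) -> None:
        self.records.append(record)

    def precedents(self, case_type: str) -> list:
        # Prior deliberations of the same kind, so month twelve's analysis
        # starts from everything months one to eleven learned.
        return [r for r in self.records if r.case_type == case_type]


# Usage: a new marine cargo dispute draws on how earlier ones were reasoned about.
memory = IntelligenceCapital()
memory.capture(DeliberationRecord(
    case_type="marine cargo dispute",
    agent_positions=[
        "coverage analyst: the packaging exclusion applies",
        "claims specialist: the exclusion was amended mid-term",
    ],
    human_rationale="Handler overrode the exclusion, citing the mid-term endorsement.",
    decision="claim accepted",
))
prior = memory.precedents("marine cargo dispute")  # fed into the next deliberation
```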

This is the fundamental difference between AI that depreciates — performing the same task slightly cheaper each year as the technology commoditises — and AI that appreciates, generating compounding organisational value. When we look at what’s being celebrated this week, we see impressive technology that will produce efficiency. What we don’t see is architecture designed to produce compounding intelligence.

What this means for enterprise leaders

The market is sending two contradictory signals right now. The hype signal says: AI agents are the future, move fast, invest big. The production signal says: most of what’s being launched this week will produce efficiency — useful, but not transformative.

The transformative question is not whether to deploy AI agents. That's already settled. The question is whether you stop at agents that execute tasks — however impressively — or build toward agents that think together, learn from your people, and make your organisation permanently smarter.

Two years of production have taught us that the answer depends on three architectural choices most organisations haven’t thought to ask about:

  1. Do your agents genuinely challenge each other, or is it one intelligence wearing multiple hats? (If you can’t ask this question and get a clear technical answer, you’re buying efficiency, not intelligence.)

  2. Does your system learn from how humans reason, or just whether they approve the output? (The difference determines whether human expertise transfers to the organisation or walks out the door with each retirement.)

  3. Who owns the intelligence your AI accumulates? (If the answer is your platform vendor, you’re building someone else’s asset, not your own.)

The agents are here. That debate is over. The question that matters now is what kind of intelligence they’re building, and for whom.

More on how to think about Agentic AI here.

Simon Torrance

Expert on business model transformation through Agentic AI

https://ai-risk.co