
Opinion
OpenAI lit the spark, but real agents require more than hype
"If we continue blurring definitions, ignoring challenges, and chasing hype, the term “agent” will lose its meaning before the technology even matures," writes Prof. Yoav Shoham, Co-CEO and Co-Founder of AI21.
The launch of OpenAI’s Agent Mode this past weekend, Google’s announcement of “next-generation agent experiences,” and AWS’s unveiling of AgentCore are the latest indicators of the growing momentum behind the vision of intelligent agents. Digital assistants that reschedule meetings, handle integrations, and even “talk” to one another to complete tasks - it sounds like the future has arrived. But the excitement can be misleading. If we don’t pause to ask what lies behind the term “agent,” we risk mistaking basic automation for breakthrough innovation - and end up disappointed when the reality fails to live up to the promise.
Not Every Automated Process Is an “Agent”
Today, almost any tool that uses AI or strings together commands is labeled an “agent.” There’s no unified definition, which allows companies to market limited capabilities as if they were revolutionary. This phenomenon deserves a name: Agentwashing. To avoid the inevitable letdown, we must distinguish between simple scripts and agents with real autonomy. We need to assess how independent they truly are, what their limitations are, and what oversight and safeguards are in place.
The Real Challenge: Reliability
Most of today’s agents are built on large language models (LLMs) - models that can dazzle but also make mistakes. In some cases they even invent policies that don’t exist. Users of Cursor, an AI-based programming assistant, reported that its automated support claimed the software couldn’t be used on more than one device, despite no such policy ever existing. This wasn’t an infrastructure glitch but a fundamental behavioral issue: the model simply made up the restriction. That may sound minor, but when an automated system makes decisions on behalf of an organization, it can mislead customers, erode trust, and even lead to subscription cancellations and significant business losses.
In enterprise environments, such errors can have costly consequences. We must stop treating LLMs as finished products and start building complete systems around them: systems that can handle uncertainty, track responses, manage costs, and incorporate safety mechanisms to ensure accuracy. These systems must guarantee that answers meet user needs, comply with company policy, protect privacy, and, most importantly, can be trusted.
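What does “building a system around the model” look like in practice? Here is a minimal sketch. Every name in it - the llm_call placeholder, the confidence score, the policy check - is a hypothetical stand-in invented for illustration, not any vendor’s real API, but it shows the basic shape of handling uncertainty and enforcing policy before an answer reaches a customer:

```python
from dataclasses import dataclass

# Everything below is a hypothetical stand-in for illustration; it is not
# AI21's, OpenAI's, or any other vendor's actual API.

@dataclass
class Answer:
    text: str
    confidence: float  # verifier-estimated score in [0, 1]

def llm_call(prompt: str) -> Answer:
    """Placeholder for a real model invocation plus a verification pass."""
    return Answer(text="The software is limited to one device.", confidence=0.62)

def violates_policy(text: str) -> bool:
    """Crude policy check: flag claims that contradict documented policy."""
    return "one device" in text.lower()  # no such restriction actually exists

def answer_with_guardrails(prompt: str, confidence_floor: float = 0.75) -> str:
    answer = llm_call(prompt)
    # Handle uncertainty: below the floor, admit it and escalate to a human.
    if answer.confidence < confidence_floor:
        return "I'm not sure - routing this question to a human agent."
    # Enforce policy: block answers asserting rules the company never set.
    if violates_policy(answer.text):
        return "I can't confirm that - routing this question to a human agent."
    return answer.text

print(answer_with_guardrails("Can I use the software on two devices?"))
```

In this toy version, a Cursor-style failure - asserting a one-device policy that was never set - would be caught twice: once by the confidence floor and once by the policy check.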
At AI21, for example, we’ve developed Maestro, an enterprise agent system that combines language models, internal and external data, and task-specific tools. The goal is not to build just another “assistant,” but to create a system that understands context, can say “I’m not sure,” and gives organizations precise control.
AWS has also recently entered the space with “AgentCore” - a technical infrastructure for building custom enterprise agents. It includes long-term memory, identity management, observability tools, and the ability to link language capabilities with real-world actions. This is a meaningful step in the right direction, but it also highlights how far we still are from building trustworthy, autonomous solutions.
Protocols Alone Are Not Enough
Google is attempting to lead the way with the Agent-to-Agent protocol (A2A), a common language for agent communication. The idea is simple: if agents can “speak” the same language, they can divide tasks, share insights, and build complex solutions in a modular way. But the challenge isn’t just communication; it’s comprehension. Without a shared vocabulary, clear context, and aligned incentives, collaboration between agents can easily break down.
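The distinction is easy to demonstrate. In the sketch below - which uses a made-up message envelope, not the actual A2A schema - two agents exchange perfectly valid JSON, yet the exchange fails because they don’t share a vocabulary:

```python
import json
from dataclasses import dataclass, asdict

# Illustrative only: this envelope is invented for the example and is NOT
# the actual A2A message format.

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    intent: str       # what the sender wants done
    vocabulary: str   # which shared schema the payload follows
    payload: dict

def handle(raw: str) -> str:
    msg = AgentMessage(**json.loads(raw))  # communication: the JSON parses fine
    # Comprehension: the receiver must also recognize the sender's vocabulary.
    if msg.vocabulary != "travel-quotes/v1":
        return "REJECTED: we can parse each other, but we don't understand each other."
    return f"ACCEPTED: quoting flights to {msg.payload['destination']}"

request = AgentMessage(
    sender="trip-planner",
    recipient="travel-agency",
    intent="request_quote",
    vocabulary="travel-quotes/v2",  # schema mismatch: one version apart
    payload={"destination": "Lisbon"},
)
print(handle(json.dumps(asdict(request))))
```

The transport layer works perfectly here; what fails is the agreement on meaning - and that is the part no wire protocol can supply on its own.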
And that’s before we consider conflicting interests. If my agent plans a trip and requests quotes from your agent, and your agent happens to represent a travel agency biased toward a specific airline, I probably won’t get the best deal. Without contracts, incentive structures, or balancing mechanisms, it’s hard to imagine such collaboration working effectively in the real world.
The Potential Is Enormous - But Only If We Proceed Carefully
If we continue blurring definitions, ignoring challenges, and chasing hype, the term “agent” will lose its meaning before the technology even matures. Some tech leaders are already rolling their eyes when they hear it. That’s a warning sign. We don’t want excitement to obscure real obstacles, only to rediscover them later, when both users and developers are already disillusioned.
But these are not insurmountable problems. We can develop shared languages, improve protocols, and teach agents to negotiate and collaborate intelligently. The potential is real, but it will be realized only if we build wisely, define terms clearly, and maintain realistic expectations. If we do, agents won’t be a passing trend; they will be the foundational infrastructure of how we operate in the digital world.
Prof. Yoav Shoham is Co-CEO and Co-Founder of AI21, and Chair of the Scientific Advisory Board for Israel’s National AI Program.