
Many organizations rush to prototype with AI agents, experimenting with small tasks. But these experiments often stay stuck as demos, disconnected from real project flow and lacking the context, controls, and documentation needed to produce meaningful outcomes. The leap from prototype to production is a chasm, not a step. Crossing it doesn't require bravery; it requires an adoption playbook that turns risk into repeatable practice.
This is a subject we at frog recently explored in "AI Agents in Action: Foundations for Evaluation and Governance", a joint white paper produced by the World Economic Forum in collaboration with Capgemini.
AI agents are being pitched as productivity accelerators. They can plan, decide, and act across systems without waiting for a human to supervise every step. With their ability to take action, use tools, and operate under uncertainty, AI agents are unlike any preceding innovation. They work at a pace and scale that breaks traditional oversight, so governance can't be an afterthought.
That's why we advocate progressive governance: practical "speed bumps" embedded in workflows that include clear guardrails, verification points, escalation rules, and human approvals at the moments that matter. The goal is not to slow agents down. It's to let them run safely inside organizational boundaries.
Business leaders want to lead AI agent adoption and gain a competitive advantage. But leaders cannot treat deployment like a race, because agent failure modes can escalate fast: brand damage, financial loss, security exposure, and in some settings, physical impact. A new playbook is needed, one that is fundamentally different from a traditional software rollout. It requires an operating model of progressive governance (one part IT, one part risk, one part people leadership) that continuously improves performance while granting agents only the minimum authority required to do the job.
Real transformation won't come from clever agent demos. It comes from giving AI agents real context: an understanding of how work gets done, access to the right trusted data (and no more than necessary), and continuous feedback from human experts. Only then can agents move from tools to true collaborators embedded in the fabric of the organization.
Making that shift requires a new mindset around governance: progressive, shared, and operational. This isn't something a single team or function can "own." Every role has a stake in how agents are designed, deployed, and held accountable. The real question isn't "can we build agents?" It's "are we ready to run them?"
AI agents are not simply more capable chatbots. Using a combination of models, tools, memory, and systems access, AI agents can plan, decide, and act toward goals. But it is their ability to do all this with minimal supervision that sets them apart.
In customer support, an AI agent can autonomously resolve cases end-to-end. In marketing, an AI agent can turn data into hyper-personalized content. In hospitality, an AI agent can summarize staffing gaps and predict seasonal booking based on data. In scientific research, an AI agent can deconstruct a question, search multiple sources, and synthesize recommendations without step-by-step guidance. And that autonomy is exactly what makes the stakes higher.

Figure 1: Software architecture of an AI agent
Source: AI Agents in Action: Foundations for Evaluation and Governance | World Economic Forum
This shift from "model" to "agent" matters because it changes the risk profile. Agents can misuse tools, drift from intended goals, or interact unpredictably with other agents. They operate in environments that are often dynamic, incomplete, and uncertain.
That's the trade: agents move closer to real work, and real work is messy. You get speed and coverage, but also drift, edge cases, and compounding errors if oversight isn't designed in. And yet, adoption is accelerating. Many organizations plan to integrate AI agents within the next one to three years. Most efforts today are still pilots, which is precisely why now is the moment to set the rules, before the rules are set by accidents.
The most common failure mode is not technical; it is organizational. Many companies approach AI agents the way they approached previous automation: deploy quickly, restrict access, and assume governance can be solidified later. But agents are not static systems. They behave differently depending on context, autonomy, authority, and environment. Governance models designed for traditional software (e.g., focused on uptime and access control) are no longer sufficient.
A useful analogy is hiring. When a person joins an organization, they are not given full authority on day one. They are assigned a role, trained, observed, and gradually entrusted with more responsibility. Performance is evaluated continuously, not just at onboarding. AI agents deserve the same discipline: scope, training, evaluation, and progressive trust.
Before asking what an agent can do, leaders must decide what it should do. A practical starting point is to define the agent across a small set of dimensions:
These choices are not inherent to the technology; they are leadership and design decisions. Two organizations can deploy the same agent in radically different ways, with very different risk profiles.
Leaders who skip this step often discover too late that an agent's authority or autonomy exceeds what the organization is prepared to oversee. If you can't explain an agent's job and boundaries in one minute, it's not ready for production.
AI agents cannot be evaluated the same way as static models. Traditional benchmarks measure accuracy on predefined tasks. Agents, by contrast, must be assessed as systems operating over time. Effective evaluation focuses on questions such as:
Here, metrics matter: task success rate, completion time, error types, tool-call success, escalation frequency, and rollback rate. If you can't measure it, you can't manage it. Agents should be piloted in sandbox environments that mirror real workflows before being exposed to live systems.
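As a rough illustration, the metrics named above could be computed from sandbox pilot logs along the following lines. This is a minimal sketch, not a method from the white paper: the `AgentRun` record, its field names, and the `summarize` function are all hypothetical, and a real pilot would define its own log schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    """One sandbox run of an agent on one task (hypothetical log record)."""
    succeeded: bool
    seconds: float           # wall-clock completion time
    tool_calls: int          # tool calls attempted during the run
    tool_calls_ok: int       # tool calls that returned successfully
    escalated: bool          # run was handed to a human
    rolled_back: bool        # the agent's changes had to be undone
    error_type: Optional[str] = None


def summarize(runs):
    """Aggregate per-run records into the pilot metrics listed in the text."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    total_calls = sum(r.tool_calls for r in runs)
    ok_calls = sum(r.tool_calls_ok for r in runs)
    return {
        "task_success_rate": sum(r.succeeded for r in runs) / n,
        "mean_completion_seconds": sum(r.seconds for r in runs) / n,
        # Undefined when no tool was ever called, so report None.
        "tool_call_success_rate": ok_calls / total_calls if total_calls else None,
        "escalation_frequency": sum(r.escalated for r in runs) / n,
        "rollback_rate": sum(r.rolled_back for r in runs) / n,
        "error_types": dict(Counter(r.error_type for r in runs if r.error_type)),
    }


# Illustrative data only; these are not real pilot results.
runs = [
    AgentRun(True, 12.0, 4, 4, False, False),
    AgentRun(False, 30.0, 6, 3, True, True, "tool_timeout"),
    AgentRun(True, 18.0, 2, 2, False, False),
    AgentRun(True, 20.0, 4, 3, True, False),
]
report = summarize(runs)
print(report)
```

Tracking these numbers per release, rather than once, is what makes it possible to compare an agent's behavior before and after a change to its tools or data.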
Evaluation must continue beyond deployment. As data, tools, and workflows change, agent behavior changes too, so monitoring and re-validation are non-negotiable.
Risk does not disappear once an agent passes testing. Organizations need a living risk process that links:
A highly autonomous agent operating in a complex environment requires stronger safeguards than a deterministic agent performing a narrow task. This is not about avoiding risk altogether, but about making it explicit, measurable, and manageable.
Importantly, risk assessment must include organizational and ecosystem risks (e.g., data exposure, regulatory compliance, and interactions with other agents), not just technical failure modes. If agents can act across systems, risk can cascade across systems.
One of the most powerful insights for leaders is that governance should scale with impact. Low-risk agents may require basic safeguards:
As autonomy and authority increase, governance must evolve to include stronger verification, more frequent review, adaptive permissions, and explicit accountability.
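One way to make "governance scales with impact" concrete is a simple lookup from an agent's classification to the controls it must carry. The sketch below is an assumption-laden illustration: the 1-to-3 scales for autonomy and authority, the three tier names, and the assignment of controls to tiers are all hypothetical; only the control names in the upper tiers come from the text above.

```python
TIERS = ("low", "medium", "high")

# Hypothetical mapping; each organization would define its own.
CONTROLS = {
    "low": ["baseline safeguards"],
    "medium": [
        "baseline safeguards",
        "stronger verification",
        "more frequent review",
    ],
    "high": [
        "baseline safeguards",
        "stronger verification",
        "more frequent review",
        "adaptive permissions",
        "explicit accountability",
    ],
}


def governance_controls(autonomy: int, authority: int):
    """Return (tier, controls) for an agent rated 1-3 on each dimension.

    The higher of the two ratings drives the tier, so a low-autonomy agent
    with broad authority is still governed as a high-impact agent.
    """
    for name, level in (("autonomy", autonomy), ("authority", authority)):
        if level not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {level!r}")
    tier = TIERS[max(autonomy, authority) - 1]
    return tier, CONTROLS[tier]


tier, controls = governance_controls(autonomy=1, authority=3)
print(tier, controls)
```

Taking the maximum rather than the average is a deliberately conservative choice here: it prevents a narrow, well-behaved agent from inheriting light governance when it holds powerful permissions.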
Human oversight remains central. In high-impact settings, human-in-the-loop means agents propose actions but do not execute without approval. In more stable contexts, human-on-the-loop means agents act within boundaries while humans monitor and intervene when necessary.
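The difference between the two oversight modes can be sketched as a small dispatch gate. This is an illustrative sketch under stated assumptions, not a reference implementation: `dispatch`, `approve`, `within_bounds`, and the audit log format are hypothetical names, and actually executing the action is left out.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Oversight(Enum):
    IN_THE_LOOP = "human-in-the-loop"   # agent proposes, human approves
    ON_THE_LOOP = "human-on-the-loop"   # agent acts, human monitors


def dispatch(
    action: str,
    mode: Oversight,
    approve: Callable[[str], bool],        # hypothetical human approval hook
    within_bounds: Callable[[str], bool],  # hypothetical boundary check
    audit_log: List[Tuple[str, str]],
) -> str:
    """Decide what happens to a proposed action and record the decision."""
    if mode is Oversight.IN_THE_LOOP:
        # Nothing executes without an explicit human approval.
        outcome = "executed" if approve(action) else "rejected"
    elif within_bounds(action):
        # On the loop: act inside the boundary; the log lets humans monitor.
        outcome = "executed"
    else:
        # Outside the boundary: stop and hand the action to a human.
        outcome = "escalated"
    audit_log.append((outcome, action))
    return outcome


log: List[Tuple[str, str]] = []
small_only = lambda a: a.startswith("refund:small")

r1 = dispatch("refund:large", Oversight.IN_THE_LOOP, lambda a: False, small_only, log)
r2 = dispatch("refund:small", Oversight.ON_THE_LOOP, lambda a: False, small_only, log)
r3 = dispatch("refund:large", Oversight.ON_THE_LOOP, lambda a: False, small_only, log)
print(r1, r2, r3)
```

In both modes every decision lands in the audit log, which is what allows humans to review, intervene, and later adjust the boundary itself.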
The goal is to increase speed safely: so users trust it, leaders can defend it, and regulators don't become your operating model.
Much of what has been discussed concerns the implementation of individual agents for specific purposes: customer service, finance, research, etc. For many organizations, that is where exploration stops. But the real gains emerge when you move from the singular to the plural: multi-agent ecosystems that begin to connect work across the organization.
As agents begin to collaborate, transact, and negotiate across teams, vendors, and organizations, new possibilities open up alongside new considerations. These systems must navigate shared goals, dependencies, and handoffs in environments where no single team has full visibility or control.
Standardizing interaction doesn't guarantee safety, but it does create a foundation for scale. Without boundaries on authority, verification of intent, and human-level governance, ecosystems can propagate errors as efficiently as they propagate value. With those elements in place, they can also unlock speed, resilience, and entirely new ways of working.
Treat agent-to-agent connections as an organizational interface, not just a technical integration, by putting explicit constraints, monitoring, and accountable ownership in place. Leaders who thoughtfully approach classification, evaluation, and governance are far better positioned to harness the opportunity of agentic ecosystems while staying in control. In doing so, they create the conditions for complexity to become an advantage, not a liability.

Jason De Perro is a Human-AI Collaboration Director at frog and a World Economic Forum Fellow with the AI Governance Alliance. He is a researcher and designer of AI agent systems, focusing on adoption, governance, and trust at scale. With over 15 years of experience leading teams at Apple, Capital One, and frog, Jason helps organizations design human-centered AI systems that integrate responsibly into real-world workflows while preserving transparency, fairness, and human agency.