Rewriting the Playbook for Agentic AI

An exploration of why traditional automation playbooks fall short for agentic AI, and what it takes to adopt, govern, and scale it responsibly.

Many organizations rush to prototype with AI agents, experimenting with small tasks. But these experiments often stay stuck as disconnected demos, cut off from real project flow and lacking the context, controls, and documentation needed to produce meaningful outcomes. The leap from prototype to production is a chasm, not a step. Crossing it doesn’t require bravery; it requires an adoption playbook that turns risk into repeatable practice.

This is a subject we at frog recently explored in ‘AI Agents in Action: Foundations for Evaluation and Governance’, a joint white paper produced by the World Economic Forum in collaboration with Capgemini.

AI agents are being pitched as productivity accelerators. They can plan, decide, and act across systems without waiting for a human to supervise every step. With their ability to take action, use tools, and operate under uncertainty, AI agents are unlike any preceding innovation. They work at a pace and scale that breaks traditional oversight, so governance can’t be an afterthought.

That’s why we advocate progressive governance: practical “speedbumps” embedded in workflows that include clear guardrails, verification points, escalation rules, and human approvals at the moments that matter. The goal is not to slow agents down. It’s to let them run safely inside organizational boundaries.

Business leaders want to lead AI agent adoption and gain a competitive advantage. But leaders cannot treat deployment like a race, because agent failure modes can escalate fast: brand damage, financial loss, security exposure, and in some settings, physical impact. A new playbook is needed that is fundamentally different from traditional software rollout. It requires an operating model of progressive governance (one part IT, one part risk, one part people leadership) that continuously improves performance while limiting agents to the minimum authority required to do the job.

Real transformation won’t come from clever agent demos. It comes from giving AI agents real context: an understanding of how work gets done, access to the right trusted data (and no more than necessary), and continuous feedback from human experts. Only then can agents move from tools to true collaborators embedded in the fabric of the organization.

Making that shift requires a new mindset around governance: progressive, shared, and operational. This isn’t something a single team or function can “own.” Every role has a stake in how agents are designed, deployed, and held accountable. The real question isn’t “can we build agents?” It’s “are we ready to run them?”

 

Why AI agents change the risk equation

AI agents are not simply more capable chatbots. Using a combination of models, tools, memory, and systems access, AI agents can plan, decide, and act toward goals. But it is their ability to do all this with minimal supervision that sets them apart.

In customer support, an AI agent can autonomously resolve cases end‑to‑end. In marketing, an AI agent can turn data into hyper‑personalized content. In hospitality, an AI agent can summarize staffing gaps and predict seasonal booking based on data. In scientific research, an AI agent can deconstruct a question, search multiple sources, and synthesize recommendations without step‑by‑step guidance. And that autonomy is exactly what makes the stakes higher.


Figure 1: Software architecture of an AI agent
Source: AI Agents in Action: Foundations for Evaluation and Governance | World Economic Forum

This shift from “model” to “agent” matters because it changes the risk profile. Agents can misuse tools, drift from intended goals, or interact unpredictably with other agents. They operate in environments that are often dynamic, incomplete, and uncertain.

That’s the trade: agents move closer to real work, and real work is messy. You get speed and coverage, but also drift, edge cases, and compounding errors if oversight isn’t designed in. And yet, adoption is accelerating. Many organizations plan to integrate AI agents within the next one to three years. Most efforts today are still pilots, which is precisely why now is the moment to set the rules before the rules are set by accidents.

 

The mistake leaders risk making

The most common failure mode is not technical; it is organizational. Many companies approach AI agents the way they approached previous automation: deploy quickly, restrict access, and assume governance can be solidified later. But agents are not static systems. They behave differently depending on context, autonomy, authority, and environment. Governance models designed for traditional software (e.g., focused on uptime and access control) are no longer sufficient.

A useful analogy is hiring. When a person joins an organization, they are not given full authority on day one. They are assigned a role, trained, observed, and gradually entrusted with more responsibility. Performance is evaluated continuously, not just at onboarding. AI agents deserve the same discipline: scope, training, evaluation, and progressive trust.

A practical adoption framework (what leaders can do now)

Successful integration of AI agents follows a simple but rigorous progression. While the underlying technology is complex, the leadership questions are not. The key foundations for AI agent evaluation and governance are (1) define the job, (2) evaluate in real work, (3) manage risk continuously, and (4) scale governance with impact. Let’s examine them in greater detail.

1. Set guardrails before you hit deploy

Before asking what an agent can do, leaders must decide what it should do. A practical starting point is to define the agent across a small set of dimensions:

  • Function: What task does it perform?
  • Role: Is it narrowly specialized or broadly general?
  • Autonomy: Can it act independently, or only on request?
  • Authority: What systems, data, or actions is it allowed to access?
  • Predictability: Does it behave deterministically or probabilistically?
  • Operational context: Is the environment simple and stable, or complex and dynamic?

These choices are not inherent to the technology — they are leadership and design decisions. Two organizations can deploy the same agent in radically different ways, with very different risk profiles.

Leaders who skip this step often discover too late that an agent’s authority or autonomy exceeds what the organization is prepared to oversee. If you can’t explain an agent’s job and boundaries in one minute, it’s not ready for production.
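One way to make the six dimensions above concrete is to write them down as a structured profile that can be reviewed before deployment. The sketch below is illustrative only: the class name `AgentProfile`, the enum values, the permission strings, and the review rules in `review_flags` are all assumptions made for this example, not part of the white paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    ON_REQUEST = "on_request"      # acts only when a human asks
    INDEPENDENT = "independent"    # acts on its own initiative


class Context(Enum):
    STABLE = "stable"              # simple, predictable environment
    DYNAMIC = "dynamic"            # complex, changing environment


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical record of the six dimensions listed in the article."""
    function: str                  # what task it performs
    role: str                      # "specialist" or "generalist"
    autonomy: Autonomy
    authority: frozenset[str]      # systems, data, or actions it may access
    deterministic: bool            # predictability
    context: Context               # operational context


def review_flags(profile: AgentProfile) -> list[str]:
    """Return combinations worth a closer look (illustrative rules only)."""
    flags = []
    if not profile.function.strip():
        flags.append("function is undefined")
    if "*" in profile.authority:
        flags.append("authority is unbounded")
    if profile.autonomy is Autonomy.INDEPENDENT and profile.context is Context.DYNAMIC:
        flags.append("independent agent in a dynamic environment")
    return flags


# Example: a narrowly scoped agent that acts only on request.
refund_agent = AgentProfile(
    function="Issue refunds under a fixed limit",
    role="specialist",
    autonomy=Autonomy.ON_REQUEST,
    authority=frozenset({"orders:read", "refunds:create"}),
    deterministic=False,
    context=Context.STABLE,
)
print(review_flags(refund_agent))
```

A profile like this doubles as the "one-minute explanation" of the agent's job: if a field cannot be filled in, the boundary has not been decided yet.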

2. Evaluate agents in workflows, not demos

AI agents cannot be evaluated the same way as static models. Traditional benchmarks measure accuracy on predefined tasks. Agents, by contrast, must be assessed as systems operating over time. Effective evaluation focuses on questions such as:

  • Reliability: Does the agent reliably complete tasks end-to-end?
  • Rationality: How does it behave when inputs are ambiguous or incomplete?
  • Adaptability: Can it recover from errors or unexpected conditions?
  • Governance: How often does it escalate to humans, and why?
  • Trust: Do users trust its outputs enough to rely on them?

Here, metrics matter: task success rate, completion time, error types, tool‑call success, escalation frequency, and rollback rate. If you can’t measure it, you can’t manage it. Agents should be piloted in sandbox environments that mirror real workflows before being exposed to live systems.
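As a rough illustration of how the metrics named above could be computed from pilot data, the sketch below aggregates a list of run records. The `RunRecord` fields and the `summarize` function are hypothetical; a real pilot would draw these values from its own logging.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One end-to-end task attempt by an agent (hypothetical schema)."""
    succeeded: bool
    seconds: float
    tool_calls: int
    tool_failures: int
    escalated: bool
    rolled_back: bool
    error_type: str | None = None


def summarize(runs: list[RunRecord]) -> dict:
    """Aggregate the metrics named in the article over a batch of runs."""
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    total_calls = sum(r.tool_calls for r in runs)
    failed_calls = sum(r.tool_failures for r in runs)
    return {
        "task_success_rate": sum(r.succeeded for r in runs) / n,
        "mean_completion_seconds": sum(r.seconds for r in runs) / n,
        # None when no tool was called, to avoid dividing by zero.
        "tool_call_success_rate": (
            (total_calls - failed_calls) / total_calls if total_calls else None
        ),
        "escalation_rate": sum(r.escalated for r in runs) / n,
        "rollback_rate": sum(r.rolled_back for r in runs) / n,
        "error_types": dict(Counter(r.error_type for r in runs if r.error_type)),
    }


runs = [
    RunRecord(True, 10.0, 4, 0, False, False),
    RunRecord(True, 20.0, 3, 1, False, False),
    RunRecord(False, 30.0, 2, 1, True, False, "tool_timeout"),
    RunRecord(True, 20.0, 1, 0, False, True),
]
report = summarize(runs)
print(report)
```

Running the same summary on sandbox runs and, later, on production runs gives a like-for-like baseline for the re-validation the article calls for.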

Evaluation must continue beyond deployment. As data, tools, and workflows change, agent behavior changes too—so monitoring and re‑validation are non‑negotiable.

3. Treat risk as continuous work (not a gate)

Risk does not disappear once an agent passes testing. Organizations need a living risk process that links:

  • The agent’s design choices (e.g., autonomy, authority, and context)
  • Evidence from evaluation and monitoring
  • Clear thresholds for acceptable behavior
  • Clear ownership: who is accountable when it fails

A highly autonomous agent operating in a complex environment requires stronger safeguards than a deterministic agent performing a narrow task. This is not about avoiding risk altogether, but about making it explicit, measurable, and manageable.
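The link between design choices, monitoring evidence, thresholds, and ownership can be sketched as a small policy check. Everything here is an assumption for illustration: the `RiskPolicy` fields, the three tiers, and especially the numeric thresholds, which an organization would set for itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPolicy:
    """Thresholds for acceptable behavior plus an accountable owner."""
    owner: str
    min_task_success_rate: float
    max_rollback_rate: float


def policy_for(independent: bool, dynamic_context: bool, owner: str) -> RiskPolicy:
    """Pick stricter thresholds as autonomy and environmental complexity rise.

    The numbers are placeholders, not recommendations.
    """
    exposure = int(independent) + int(dynamic_context)
    tiers = {
        0: RiskPolicy(owner, min_task_success_rate=0.90, max_rollback_rate=0.05),
        1: RiskPolicy(owner, min_task_success_rate=0.95, max_rollback_rate=0.02),
        2: RiskPolicy(owner, min_task_success_rate=0.99, max_rollback_rate=0.01),
    }
    return tiers[exposure]


def breaches(policy: RiskPolicy, metrics: dict[str, float]) -> list[str]:
    """Compare monitoring evidence against the policy and name the owner."""
    found = []
    if metrics["task_success_rate"] < policy.min_task_success_rate:
        found.append(f"success rate below threshold; notify {policy.owner}")
    if metrics["rollback_rate"] > policy.max_rollback_rate:
        found.append(f"rollback rate above threshold; notify {policy.owner}")
    return found


observed = {"task_success_rate": 0.97, "rollback_rate": 0.0}
strict = policy_for(independent=True, dynamic_context=True, owner="support-ops")
print(breaches(strict, observed))
```

The same evidence passes or fails depending on the agent's design, which is the point: a 97% success rate may be acceptable for a narrow agent and a breach for a highly autonomous one.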

Importantly, risk assessment must include organizational and ecosystem risks (e.g., data exposure, regulatory compliance, and interactions with other agents), not just technical failure modes. If agents can act across systems, risk can cascade across systems.

4. Scale governance progressively with impact

One of the most powerful insights for leaders is that governance should scale with impact. Low‑risk agents may require only basic safeguards:

  • Least-privilege access: The AI agent is only permitted to perform its assigned task.
  • Audit logs: Tamper-proof records of what, when, and how the AI agent acted.
  • Human-override mechanisms: The crucial ability for human operators to pause, stop, and roll back AI agents.
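A minimal sketch of these three safeguards working together, under stated assumptions: `GuardedAgent` and `log_is_intact` are hypothetical names, the allow-list stands in for least-privilege access, and the hash-chained log is tamper-evident (edits can be detected) rather than truly tamper-proof. Rollback is omitted because it depends on the systems the agent touches.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


def _digest(entry: dict) -> str:
    """Hash an entry's contents in a stable key order."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class GuardedAgent:
    """Wraps agent actions with an allow-list, an audit log, and a pause switch."""

    def __init__(self, name: str, allowed_actions: set[str]):
        self.name = name
        self.allowed_actions = frozenset(allowed_actions)  # least privilege
        self.paused = False                                 # human override
        self.audit_log: list[dict] = []

    def _record(self, action: str, outcome: str) -> None:
        # Each entry stores the previous entry's hash, forming a chain.
        entry = {
            "agent": self.name,
            "action": action,
            "outcome": outcome,
            "at": datetime.now(timezone.utc).isoformat(),
            "prev": self.audit_log[-1]["hash"] if self.audit_log else "",
        }
        entry["hash"] = _digest(entry)
        self.audit_log.append(entry)

    def act(self, action: str) -> bool:
        """Attempt an action; every attempt is logged, allowed or not."""
        if self.paused:
            self._record(action, "blocked: paused by operator")
            return False
        if action not in self.allowed_actions:
            self._record(action, "blocked: not permitted")
            return False
        self._record(action, "executed")
        return True


def log_is_intact(log: list[dict]) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


agent = GuardedAgent("refund-agent", {"refunds:create"})
agent.act("refunds:create")   # permitted
agent.act("accounts:delete")  # outside its assigned task
agent.paused = True           # operator hits pause
agent.act("refunds:create")   # blocked while paused
```

Blocked attempts are logged as well as successful ones, since an agent repeatedly reaching beyond its permissions is itself useful evidence for review.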

As autonomy and authority increase, governance must evolve to include stronger verification, more frequent review, adaptive permissions, and explicit accountability.

Human oversight remains central. In high‑impact settings, human‑in‑the‑loop means agents propose actions but do not execute without approval. In more stable contexts, human‑on‑the‑loop means agents act within boundaries while humans monitor and intervene when necessary.
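The difference between the two oversight modes can be shown as a single dispatch function. This is a sketch under assumptions: `Oversight`, `run_action`, and the callback parameters (`approve`, `within_bounds`, `alert`) are hypothetical names standing in for whatever approval and monitoring channels an organization actually uses.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Oversight(Enum):
    IN_THE_LOOP = "in_the_loop"   # agent proposes, human approves
    ON_THE_LOOP = "on_the_loop"   # agent acts within bounds, human monitors


def run_action(
    action: str,
    mode: Oversight,
    execute: Callable[[str], None],
    approve: Callable[[str], bool],
    within_bounds: Callable[[str], bool],
    alert: Callable[[str], None],
) -> str:
    """Route a proposed action through the chosen oversight mode."""
    if mode is Oversight.IN_THE_LOOP:
        # Nothing executes without explicit human approval.
        if not approve(action):
            return "rejected"
        execute(action)
        return "executed"
    # On the loop: act autonomously inside the boundary, escalate outside it.
    if not within_bounds(action):
        alert(action)
        return "escalated"
    execute(action)
    return "executed"


executed: list[str] = []
alerts: list[str] = []
small_refunds_only = lambda a: a == "refund:small"

result = run_action(
    "refund:large",
    Oversight.ON_THE_LOOP,
    execute=executed.append,
    approve=lambda a: False,
    within_bounds=small_refunds_only,
    alert=alerts.append,
)
print(result, executed, alerts)
```

Moving an agent from one mode to the other then becomes a configuration change backed by evaluation evidence, rather than a rewrite.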

The goal is to increase speed safely, so that users trust the agent, leaders can defend its decisions, and regulators don’t become your operating model.

From isolated agents to agentic ecosystems

Much of what has been discussed concerns the implementation of individual agents for specific purposes: customer service, finance, research, etc. For many organizations, that is where exploration stops. But the real gains emerge when you move from the singular to the plural: multi‑agent ecosystems that begin to connect work across the organization.

As agents begin to collaborate, transact, and negotiate across teams, vendors, and organizations, new possibilities open up alongside new considerations. These systems must navigate shared goals, dependencies, and handoffs in environments where no single team has full visibility or control.

Standardizing interaction doesn’t guarantee safety, but it does create a foundation for scale. Without boundaries on authority, verification of intent, and human‑level governance, ecosystems can propagate errors as efficiently as they propagate value. With those elements in place, they can also unlock speed, resilience, and entirely new ways of working.

Treat agent‑to‑agent connections as an organizational interface, not just a technical integration, by putting explicit constraints, monitoring, and accountable ownership in place. Leaders who thoughtfully approach classification, evaluation, and governance are far better positioned to harness the opportunity of agentic ecosystems while staying in control. In doing so, they create the conditions for complexity to become an advantage, not a liability.

Authors
Jason De Perro
Director Human-AI Collaboration frog, NA

Jason De Perro is a Human–AI Collaboration Director at frog and a World Economic Forum Fellow with the AI Governance Alliance. He is a researcher and designer of AI agent systems, focusing on adoption, governance, and trust at scale. With over 15 years of experience leading teams at Apple, Capital One, and frog, Jason helps organizations design human‑centered AI systems that integrate responsibly into real‑world workflows while preserving transparency, fairness, and human agency.
