What Anthropic's Zero Trust for Agents eBook Gets Right (And What It Doesn't Say)

When a frontier AI company publishes a 36-page security framework for autonomous agent deployments, it is worth paying attention. Anthropic's “Zero Trust for AI Agents” eBook was released in May, and if you have been following this blog, a significant portion of its core philosophy will look incredibly familiar. Non-human short-lived credentials as the baseline for agent authentication, MCP as a critical attack surface that cannot self-secure, and the structural reason that prompt injection isn't patchable - we've covered all of these. So if sections one and two of the eBook feel like a recap, it's because you've been following along.

Rather than simply recapping the entire eBook, I want to pull out four concepts that are genuinely worth stealing or thinking harder about, and then let's talk about two areas where I think the conversation needs to go further.

The Cryptographic North Star

One of the most ambitious positions in the paper is the declaration that static API keys are no longer acceptable for any deployment. We know that agents should not operate using long-lived API keys or static IAM roles stored in environment variables, and plenty of work is being done with passing security tokens, ephemeral identities, and secure local identity. But Anthropic pushes for identities cryptographically tied to the underlying infrastructure layer as the "Foundational floor" for identity. The progression gets more interesting from there: certificate-based agent identity at the Enterprise tier, and hardware-bound credentials tied to a TPM or HSM at the Advanced tier, where credential material literally cannot be exfiltrated from a compromised host.

This is undoubtedly the right strategic direction. But we have to view it through the lens of enterprise reality. Most corporate security teams currently struggle to manage clean OAuth 2.0 On-Behalf-Of delegation tokens for standard, predictable microservices, never mind dynamic, non-deterministic agent workflows. More importantly, even the major cloud providers aren't there yet. Amazon Bedrock AgentCore, for example, introduces phenomenal infrastructure boundary isolation, but it doesn't natively provision certificate-based agent workloads or hardware-enforced cryptographic boundaries out of the box today. That is not a criticism of AWS; it is an indicator of how early-stage these advanced controls actually are. Think of this as your cryptographic North Star: it tells you which direction to move. Your immediate priority should be migrating away from master API keys toward scoped, short-lived tokens. The North Star doesn't need to be reachable this quarter to be worth orienting toward.
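As a rough illustration of that immediate priority, here is a minimal sketch of a scoped, short-lived token. It is not taken from the eBook: the HMAC shared-secret scheme, the names `mint_token` and `check_token`, the `SIGNING_KEY` value, and the scope strings are all assumptions made for the example. A real deployment would more likely lean on an identity provider or a cloud token service than on hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key for illustration only; a real deployment would
# pull key material from a secrets manager or delegate to an identity provider.
SIGNING_KEY = b"demo-only-key"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def mint_token(agent_id, scopes, ttl_seconds=300, now=None):
    """Return a signed token naming the agent, its scopes, and an expiry."""
    issued = time.time() if now is None else now
    claims = {"sub": agent_id, "scopes": sorted(scopes), "exp": issued + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def check_token(token, required_scope, now=None):
    """True only if the signature verifies, the token is unexpired, and the scope is granted."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False  # forged or tampered
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return current < claims["exp"] and required_scope in claims["scopes"]
```

The point of the sketch is the shape, not the crypto: a stolen token of this kind grants one narrow scope and stops working after five minutes, which is a very different loss from a leaked master API key.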

The Three-Tier Maturity Model - with a major caveat

The bulk of the paper is organized around a three-tier structure for agent security controls, with Foundation, Enterprise, and Advanced labels marking increasing security maturity. The framing is clean and the progression is logical; each tier builds on the one before it, and the paper is honest that the Advanced tier will likely become the Enterprise standard as tooling matures.

It's a genuinely useful implementation roadmap, and I'd recommend any security architect working on agentic deployments use it as a starting checklist. But there's a tension worth naming, one we looked at closely in our analysis of the A2AS framework and the limits of self-defending systems even with dual-model architectures. The paper leans heavily on controls that live inside the agent runtime: input validation, in-context security meta-instructions, and constitutional classifiers. The fundamental problem with asking a system to defend itself is the same one that makes segregation of duties a core principle of security: if the agent's reasoning loop is successfully compromised, the internal defenses are compromised alongside it. You cannot reliably ask a compromised runtime to report its own compromise. The tiered model addresses this partially through a combination of external sandboxing and network segmentation, but the tension between "controls embedded in the agent" and "controls independent of the agent" still exists.

The Advanced tier also calls for constitutional classifiers, AI-based systems that scan for manipulation attempts beyond strict technical attack patterns. This is what we've been building, but focused on social engineering attacks. Notably, the paper's own framing of Advanced input sanitization implicitly argues against doing all of this inside the model itself.

Also worth noting is the paper's output filtering section, which focuses on semantic analysis of what agents produce before it's delivered. This is exactly what we've been describing as outcome-based security - evaluating the consequence of an action, not just the syntax of the input that triggered it.

Least Agency - A Concept Worth Stealing

We're all familiar with Least Privilege. Anthropic introduces an adaptation specifically tailored for autonomous workflows: Least Agency (found in the OWASP Top 10 for Agentic Applications for 2026).

Traditional authorization models assign permissions to an identity based on its role. If an automated script needs to update customer records, it receives permanent update privileges to that database. "Least Agency" throws this approach out. It dictates that an agent's operational permissions must be bounded dynamically by the context of the specific task it has been asked to perform.

If an agent is summarizing an email thread, its active security context should restrict it to read-only access for those specific message IDs. This is the explicit implementation of Zero Trust with just-in-time authorization. Even if the agent has tools capable of deleting files or querying financial systems, those capabilities should be disabled for the duration of that execution window. When the summary is delivered, the task identity dissolves. No standing access, no cached credentials waiting to be harvested.
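To make the email-summary scenario concrete, here is a minimal sketch of a task-scoped grant enforced by a broker that sits outside the agent. The class names `TaskGrant` and `LeastAgencyBroker`, the action strings, and the message IDs are hypothetical and chosen for illustration; the paper does not prescribe this design.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskGrant:
    """Hypothetical grant: the actions and resources one task may use."""
    actions: frozenset
    resources: frozenset


class LeastAgencyBroker:
    """Authorizes each tool call against the grant of the task currently running."""

    def __init__(self):
        self._active = None  # no standing access between tasks

    @contextmanager
    def task(self, actions, resources):
        self._active = TaskGrant(frozenset(actions), frozenset(resources))
        try:
            yield self
        finally:
            self._active = None  # the task identity dissolves here

    def authorize(self, action, resource):
        grant = self._active
        return (
            grant is not None
            and action in grant.actions
            and resource in grant.resources
        )


broker = LeastAgencyBroker()
with broker.task({"read"}, {"msg-101", "msg-102"}):
    can_read = broker.authorize("read", "msg-101")      # inside the grant
    can_delete = broker.authorize("delete", "msg-101")  # tool exists, but not granted
    can_wander = broker.authorize("read", "msg-999")    # outside the thread
after_task = broker.authorize("read", "msg-101")        # grant is gone
```

The design choice worth noting is that the check lives in the broker, not in the agent's prompt or reasoning loop, so a manipulated agent can ask for a delete but cannot grant itself one.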

This turns security into a dynamic variable that scales with active intent, dramatically shrinking the blast radius of any compromised session. The combination with Just-in-Time access patterns makes this one of the more practically powerful ideas in the paper. But let's be honest, for most enterprise teams this is years away.

Regardless, Least Agency is a concept you can build into agent design from day one, before you write a line of production code.

The "Impossible vs. Tedious" Design Test

This is probably the most immediately useful single idea in the paper, and the one I'd put on a whiteboard in every architecture review. The test is simple: when evaluating any security control, ask whether it makes the attack impossible or merely tedious. Security through friction is the new security through obscurity. Controls whose value comes from friction, such as rate limits, non-standard ports, extra pivot hops, and SMS-based MFA, degrade sharply against an adversary that can run at machine speed with near-zero per-attempt cost. Tedious or time-consuming is no longer a barrier; it's a delay overcome at machine speed.

The controls that survive the test share a pattern: hardware-bound credentials that can't be exfiltrated, expiring tokens that are worthless after minutes, and cryptographic identity that makes forgery computationally impossible rather than just socially unlikely. This is the same logic behind our argument that proxies and DLP appliances are insufficient for agentic workloads - they add friction, not impossibility. It's a useful test precisely because it's fast: you can apply it in a five-minute architecture conversation without needing a threat model in front of you. If the honest answer is "this would slow an attacker down," that's useful context but, depending on your risk appetite, not a sufficient control.
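A back-of-the-envelope calculation shows why friction and impossibility are different categories. The attempt rates below are assumptions picked for illustration, not measurements from the paper: one guess per second stands in for a human-paced attacker and ten thousand per second for an automated one.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_to_exhaust(keyspace, attempts_per_second):
    """Worst-case time to try every value in the keyspace at a fixed rate."""
    return keyspace / attempts_per_second


six_digit_codes = 10 ** 6  # e.g. a one-time code with six decimal digits

# Assumed rates, for illustration only.
human_pace = seconds_to_exhaust(six_digit_codes, 1)          # about 11.6 days
machine_pace = seconds_to_exhaust(six_digit_codes, 10_000)   # 100 seconds

# The same automated attacker against a 128-bit secret.
key_128_years = seconds_to_exhaust(2 ** 128, 10_000) / SECONDS_PER_YEAR
```

Speeding the attacker up by four orders of magnitude turns the six-digit code from days into seconds, while the 128-bit secret stays far out of reach at either rate. That is the gap the test is probing for.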

What the Paper Doesn't Address

Anthropic’s Zero Trust paper provides a well-thought-out, forward-looking architectural blueprint for greenfield engineering projects. It describes an ideal world of clean cryptographic handshakes, multi-tiered container isolation, and dynamic runtime circuit breakers. However, if you are an enterprise security leader tasked with securing agent deployments this morning, you likely cannot afford to re-architect your entire infrastructure stack from scratch. This is where a more pragmatic approach bridges the gap.

The Patching Reality Gap

First, the paper's prescription assumes continuous patching, immutable infrastructure, and hardware-backed credentials as baseline capabilities. For the majority of enterprises, those aren't current capabilities, and you're lucky if they're even on your multi-year roadmap. We wrote recently about why "just patch better" isn't a strategy, and the same logic applies here. Attack Path Analysis is more immediately implementable for most organizations than immutable infrastructure. Instead of trying to build a perfect cryptographic bubble around a new agent, enterprises should focus on what their agents can already touch. Prioritizing security investments based on whether a vulnerability actually has a clear, exploitable path to a high-value target buys meaningful risk reduction today while the longer-term architectural engineering happens over the coming years.
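One way to picture that prioritization is as a reachability question over a graph of what can talk to what. The sketch below is a simplified stand-in for Attack Path Analysis, not a description of any particular product: the asset names, the findings, and the helper names `reachable` and `prioritize` are all hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search: every node reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def prioritize(graph, agent, findings, high_value):
    """Keep findings on assets the agent can reach that also lead to a high-value target."""
    from_agent = reachable(graph, agent)
    urgent = []
    for asset, finding in findings:
        if asset in from_agent and reachable(graph, asset) & high_value:
            urgent.append((asset, finding))
    return urgent


# Hypothetical environment: edges mean "can call or access".
graph = {
    "support-agent": ["ticket-api", "mcp-files"],
    "ticket-api": ["crm-db"],
    "mcp-files": [],
    "build-server": ["artifact-store"],
}
findings = [
    ("ticket-api", "outdated auth library"),
    ("mcp-files", "verbose error pages"),
    ("build-server", "unpatched kernel"),
]
urgent = prioritize(graph, "support-agent", findings, high_value={"crm-db"})
```

In this toy environment only the `ticket-api` finding is flagged: the agent can reach it and it leads to the CRM database. The build server's unpatched kernel still matters, but it is not on a path the agent opens up, so it does not jump the queue for agent-related risk.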

The Accountability Gap

Second is accountability. The paper has a governance section, and it's good to see. But it treats security as a technical and governance problem internal to the deploying organization. It doesn't engage with the broader Shared Responsibility question: when an agent exceeds its authority and causes a breach, who actually owns the outcome? The platform vendor, the model provider, the deploying organization? This isn't an abstract legal question; the answer determines where organizations should invest in controls they can't delegate. Cloud governance took years to develop a working model for that accountability split, and we're repeating the same early mistakes here. Until the industry resolves the accountability model, every tier in the framework is yours to implement alone. This is critical to factor into how ambitiously you scope your initial deployment.

Anthropic’s paper is well worth reading in full. Use its tier system as a maturity roadmap; build Least Agency into your tool design from day one; and run every proposed control through the "Impossible vs. Tedious" test. Just don't mistake an exceptional engineering blueprint for a complete solution to the organizational and semantic risks of the agentic era.

Maybe you think we missed the point here though, or have questions about how to get your roadmap aligned to this guidance. In either case, we'd love to hear from you at questions@generativesecurity.ai.

About the author

Michael Wasielewski is the founder and lead of Generative Security. With 20+ years of experience in networking, security, cloud, and enterprise architecture, Michael brings a unique perspective to new technologies. Working on generative AI security for the past 3 years, Michael connects the dots between the organizational, the technical, and the business impacts of generative AI security. Michael looks forward to spending more time golfing, swimming in the ocean, and skydiving... someday.

June 2, 2026
