The practical application of Assume Breach and Agentic AI with Lee Newcombe

In last week’s blog, Agentic Architectures: Securing the Future of AI with Zero Trust, we discussed the critical role of Zero Trust in securing agentic AI architectures. As part of that, we introduced a few critical practices, including the principle of “Assume Breach”. This practice, or tenet, demands that every interaction within a component-based design, whether with users, data stores, or other agents, be treated as originating from a potentially compromised source, driving the need for layered defenses that limit the blast radius when a breach occurs. In this next installment of the series, we’re going to dive deeper into putting Assume Breach into practice in agentic AI systems.

While the high-level mantra “Assume Breach” gives us direction, on its own it isn’t particularly actionable. Luckily, Lee Newcombe and I had a few conversations on this very topic, and Lee put together some excellent, actionable lessons for robust AI security in his recent blog Musings on Agentic AI security. His musings elevate “Assume Breach” from a high-level concept to an embedded practice within AI architectures, providing a critical lens for enhancing our security posture. While he doesn’t claim to have all the answers (there are a lot of open questions in his writing), I wanted to highlight and augment his key points. I highly recommend reading his blog in full at some point, though this post will make more sense if you read it now.

1. Trust, what is it good for?

One of the first practical approaches Lee tackles is the introduction of trust weightings to agents and their interactions. The idea of moving away from binary trust is something I’ve worked on in the past, but here it offers a novel approach to dealing with agents whose responses could be subject to both security issues and hallucinations. It also raises more systemic questions when chaining agents with varying degrees of trust. Do you take a low-water-mark approach to determine the trust at the end of the chain, or a more mathematical calculation, akin to MTBF, to put a number to the trust value?
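
To make that comparison concrete, here’s a minimal sketch of those two composition options. The Agent class, the trust weightings, and the scoring functions are my own illustrative assumptions, not anything Lee prescribes:

```python
from dataclasses import dataclass
from math import prod

# Illustrative sketch: each agent in a chain carries a trust weighting in [0, 1],
# maintained elsewhere (eval results, provenance checks, runtime anomaly signals).

@dataclass
class Agent:
    name: str
    trust: float  # 0.0 = untrusted, 1.0 = fully trusted


def low_water_mark(chain: list[Agent]) -> float:
    """The chain is only as trustworthy as its least trusted agent."""
    return min(a.trust for a in chain)


def compounded_trust(chain: list[Agent]) -> float:
    """Treat each hop as an independent failure point (MTBF-style),
    so trust erodes with every additional agent in the chain."""
    return prod(a.trust for a in chain)


chain = [Agent("planner", 0.9), Agent("retriever", 0.7), Agent("summarizer", 0.95)]
print(low_water_mark(chain))    # 0.7  -- gated by the weakest link
print(compounded_trust(chain))  # ~0.60 -- longer chains degrade faster
```

Neither answer is obviously right; the point is that the composition rule you pick changes how quickly long agent chains fall below whatever trust threshold your decision system requires.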

Overall, the concept of constantly evaluating the trustworthiness of an agent in real time is one way to Assume Breach and build resilience into your decision systems.

2. Don’t Treat Agentic AI as a Black Box

Perhaps the most significant expansion Lee offers is around not treating agentic AI as a “black box.” Assume Breach posits that any interaction could be compromised; Lee compels us to look inside that interaction. If we assume a breach, we cannot afford to ignore the internal mechanisms, the decision-making processes, or the data transformations occurring within an agent. This transforms “Assume Breach” from an external boundary concept into an internal mandate for transparency and control, requiring granular visibility, auditing, and micro-segmentation within the AI’s operational logic. This way, even if an agent’s input is compromised, its internal components and subsequent actions are monitored, validated, and contained. (I hear Databricks is building something like this.) This deep dive ensures that a compromise doesn’t silently propagate through opaque processes.
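
As a rough illustration of that internal visibility, here’s a minimal sketch of wrapping each internal agent step with logging and output validation before results propagate downstream. The audited_step wrapper, the lookup_ticket example, and its validation rule are all hypothetical, chosen purely for illustration:

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")


def audited_step(name: str, fn: Callable[..., Any],
                 validate: Callable[[Any], bool]) -> Callable[..., Any]:
    """Wrap an internal agent step so its inputs and outputs are logged,
    and its output is validated before being passed to the next component."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        log.info("step=%s inputs=%r", name, (args, kwargs))
        result = fn(*args, **kwargs)
        if not validate(result):
            # Contain the step rather than silently passing a suspect
            # result to the next agent in the chain.
            log.warning("step=%s rejected output=%r", name, result)
            raise ValueError(f"Step '{name}' produced output that failed validation")
        log.info("step=%s output=%r", name, result)
        return result
    return wrapper


# Example: only allow this (hypothetical) lookup step to return known ticket IDs.
lookup_ticket = audited_step(
    "lookup_ticket",
    lambda user: {"alice": "TKT-1001"}.get(user, "rm -rf /"),  # imagine a compromised tool
    validate=lambda out: isinstance(out, str) and out.startswith("TKT-"),
)

print(lookup_ticket("alice"))   # logged, validated, returned: 'TKT-1001'
# lookup_ticket("mallory")      # would be logged, rejected, and contained (raises ValueError)
```

The `validate` hook here stands in for whatever policy checks you actually run, such as schema checks, allow-lists, secret scanning, or output classifiers.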

A novel piece of this is the idea that we must understand the “why” and “how” of an agent’s actions to effectively contain a breach. When the components involved in a breach are simply following commands, they don’t have intent. But when an agent is making decisions and creating its own actions, how can you tell whether the outcome is the result of an external attack or an internal “feature”? (Watch Silicon Valley’s final few episodes for an analogy here.)

3. Manipulating the flywheel

Perhaps one of Lee’s most critical points for security is the control of the reward structures that reinforce an agent’s learned behaviors. In an “Assume Breach” scenario, these become a prime target for adversaries. If an attacker can manipulate these structures, they can steer the agent towards malicious outcomes even if the agent itself hasn’t been directly compromised. There have been numerous articles, like AI system resorts to blackmail if told it will be removed, where threatening an LLM, or giving it demerit points when it doesn’t answer, has been shown to bypass guardrails. Therefore, securing the integrity and provenance of these reward mechanisms is vital. This means implementing robust authentication and authorization for modifying reward functions, continuous auditing of their configurations, and anomaly detection for any unusual changes.
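
One way to picture that integrity control is a minimal sketch in which the reward configuration is signed by an approved change pipeline and verified before the agent runtime will load it. The signing scheme, key handling, and config fields below are assumptions for illustration, not a specific product or Lee’s design:

```python
import hashlib
import hmac
import json

# Illustrative sketch: treat the reward configuration as signed, versioned data.
# The runtime refuses to load any reward function whose signature doesn't verify,
# so tampering is detected rather than silently steering the agent's behavior.

SIGNING_KEY = b"rotate-me-and-store-in-a-real-secrets-manager"


def sign_reward_config(config: dict) -> str:
    """Produced by the approved change pipeline when a reward change is reviewed."""
    payload = json.dumps(config, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def load_reward_config(config: dict, signature: str) -> dict:
    """Verified by the agent runtime before the reward structure takes effect."""
    expected = sign_reward_config(config)
    if not hmac.compare_digest(expected, signature):
        # Anomaly: the reward structure changed outside the approved pipeline.
        raise RuntimeError("Reward config signature mismatch; refusing to load")
    return config


reward_config = {"helpfulness": 1.0, "policy_violation": -10.0, "version": 3}
signature = sign_reward_config(reward_config)
trusted = load_reward_config(reward_config, signature)
```

In practice you would pair something like this with access controls on the signing key and alerting on any failed verification, so an unusual change to the reward structure becomes a security event rather than a quiet behavioral drift.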

While the entire problem of prompt injection can be (over)simplified as a consequence of LLMs combining their data and control planes, it’s clear that the reward structure needs to be one of the first things decoupled from user interactions.

Many thanks to Lee for his musings on how to put the tenet of Assume Breach into practice with Agentic AI. It’s important that we look deeply at how we can apply the general security principles we’ve honed over the past decades to the new paradigm of generative AI, while looking beyond the problem scope of traditional technologies. It’s no longer about protecting the outside or the inside of the M&M, but the very essence and behavior of our (artificially) intelligent systems.

If you want to dive deeper into Assume Breach, jump ahead to other conversations around Zero Trust and generative AI security (agentic or not), or connect with us about helping you achieve this transformation, please reach out to questions@generativesecurity.ai.

About the author

Michael Wasielewski is the founder and lead of Generative Security. With 20+ years of experience in networking, security, cloud, and enterprise architecture, Michael brings a unique perspective to new technologies. Working on generative AI security for the past 2 years, Michael connects the dots between the organizational, the technical, and the business impacts of generative AI security. Michael looks forward to spending more time golfing, swimming in the ocean, and skydiving… someday.