Trust, or the lack thereof, in the age of agentic tools

Last week, in our blog “Accelerating generative AI security with lessons from Containers,” we talked a bit about how modern agentic architectures can change the security dynamic. A future where a core agent can call and orchestrate agent-powered “tools” is already well researched and deployable today using the Model Context Protocol. But as we build this future, it should be fairly obvious that no single organization will develop the complete set of tools its agents require; much like every other software delivery model, we will rely on a combination of in-house, open-source, and purchased 3rd party tools and services. We find ourselves echoing past challenges once again, asking, “How do we ensure our data remains secure and our processes reliable when interacting with a universe of externally written or hosted tools?”

Well, the news here is hit and miss. In some ways, this question is being addressed by the emerging interoperability standards, namely Anthropic’s Model Context Protocol (MCP) and Google’s Agent2Agent (A2A) protocol. MCP is working to standardize how models interact with the tools and data sources they need to function, while A2A is designed to create a common language for agents to communicate, negotiate, and collaborate. These standards will help us build adequate tests and validations for these types of interactions. However, neither is doing a particularly good job on security at the moment. For example, GitHub had an MCP server exploited, and while some would argue that’s not MCP’s fault, it’s hard not to see the immaturity of the technology as more than half the problem. Additionally, a threat modeling exercise against A2A demonstrated a number of interesting attack vectors that I would argue most organizations won’t solve on their first try.

This doesn’t mean we have to give up, though. Let’s bifurcate the problem space for a moment: internally hosted tools (agents) and externally hosted tools (agents). We don’t really care whether they’re home grown, open source, or procured from a 3rd party, because we want to treat them all the same for both security and simplicity’s sake. For internally hosted tools, we have an opportunity to look not just at the input prompts and the prompt responses, but also at external network calls and communication. Unsurprisingly, prompt security validation isn’t a “solved” problem. Getting concrete, actionable guidance on things like SQL injection took forever, so it makes sense that we’re still learning here. But guidance from Anthropic, OWASP, and others offers a good start. Less studied is how to validate prompt responses. In an agentic model where agents exchange information to trigger further action, it is equally important to treat the response from one agent as an input to the system and validate its appropriateness and security, as sketched below. (Hint: we’ll return to this concept in a minute.)
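To make that concrete, here is a minimal sketch of treating a tool’s response as untrusted input before the core agent acts on it. The `validate_tool_response` helper, the expected JSON shape, the size cap, and the deny-list patterns are all illustrative assumptions, not a production guardrail:

```python
import json
import re

# Hypothetical deny-list of phrases that often signal prompt injection hidden
# inside a tool's output. A real deployment would use a tuned classifier or a
# vendor guardrail, not a handful of regexes.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]

MAX_RESPONSE_BYTES = 32_768  # arbitrary cap for illustration


def validate_tool_response(raw: str) -> dict:
    """Treat a tool's response as untrusted input before any agent acts on it."""
    if len(raw.encode("utf-8")) > MAX_RESPONSE_BYTES:
        raise ValueError("tool response exceeds size limit")

    # Require structured output so free-form instructions can't ride along unnoticed.
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("tool response is not valid JSON") from exc

    if not isinstance(payload, dict) or "result" not in payload:
        raise ValueError("tool response missing expected 'result' field")

    # Flag injection-style phrasing embedded anywhere in the result.
    result_text = json.dumps(payload["result"])
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(result_text):
            raise ValueError(f"tool response matched suspicious pattern: {pattern.pattern}")

    return payload
```

The same check can sit in front of every agent-to-agent hop, so a compromised tool can’t quietly steer the agents downstream of it.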

On top of evaluating the prompts themselves, how can we ensure the black boxes of externally developed agents aren’t behaving in ways we don’t like? We can use existing networking approaches to validate that our data is being kept local to the model and not shared with 3rd parties (not with 100% certainty, but it’s something). For example, capturing packets from outbound calls at the time of tool invocation and execution can show what types of communications are occurring, helping us understand whether the agent is checking for updates or sending the entire payload to a remote server. Mapping the network communications an agent creates helps build a better understanding of the risks associated with that agent, even if all the details aren’t visible. You can then block, monitor, or otherwise restrict traffic to reach an acceptable risk level. For example, if you see heavy communications traffic on instantiation, the agent could be downloading updates and sending back relevant telemetry, or it could be shipping logs of all your traffic back home. So you could block all outbound traffic during startup and manually update your tool agent periodically.
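As one lightweight illustration of this idea, the sketch below snapshots the host’s established outbound connections before and after a tool call and reports anything new. It assumes the third-party psutil package, sufficient OS permissions to read the connection table, and a hypothetical `invoke_tool` callable; a real deployment would lean on packet captures or network policy rather than a point-in-time diff:

```python
import time
import psutil  # third-party; assumed installed (pip install psutil)


def snapshot_outbound(label: str) -> set[tuple[str, int]]:
    """Record remote (address, port) pairs for currently established connections."""
    remotes = set()
    for conn in psutil.net_connections(kind="inet"):
        if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
            remotes.add((conn.raddr.ip, conn.raddr.port))
    print(f"[{label}] {len(remotes)} established outbound endpoints")
    return remotes


def run_tool_with_network_diff(invoke_tool):
    """Invoke a tool and report which outbound endpoints appeared during the call."""
    before = snapshot_outbound("before invocation")
    result = invoke_tool()   # hypothetical callable wrapping the agent tool
    time.sleep(1)            # give slower connections a moment to show up
    after = snapshot_outbound("after invocation")

    for ip, port in sorted(after - before):
        print(f"new outbound connection during tool call: {ip}:{port}")
    return result
```

Even a crude diff like this gives you a list of endpoints to allow, block, or investigate, which is exactly the kind of map you want before granting an agent broader access.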

For externally hosted tools, trusting the provider matters even more, not just from a security standpoint but also from a resiliency perspective. If they go down or become inaccessible, will they bring down your entire application, or do you have fail-open behavior or timeouts implemented? On the security side, the same prompt validation practices apply: look at both the inputs and the responses going to and from the tool. It’s especially important, when communication with the tool is encrypted (as it should be), to be able to evaluate responses before they are acted upon by the receiving agents. Otherwise, you are exposed to attacks much like software supply chain and provider compromises. But here you don’t have access to the network to validate the communication, so you must also rely on non-technical methods, like contractual obligations and penalties, to mandate that your data not be shared or used outside your context.
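Here is a minimal sketch of the resiliency half of that: wrapping a hosted tool call with a hard timeout and a fail-open fallback so a slow or unreachable provider degrades your application instead of taking it down. The endpoint URL, request shape, and fallback payload are illustrative assumptions, and the requests package is assumed to be installed:

```python
import requests  # third-party; assumed installed (pip install requests)

TOOL_ENDPOINT = "https://tools.example.com/v1/invoke"  # hypothetical external tool URL
TIMEOUT_SECONDS = 3.0


def call_external_tool(prompt: str, fallback: dict | None = None) -> dict:
    """Call a hosted tool with a hard timeout and a fail-open fallback.

    If the provider is slow or unreachable, return a degraded-but-safe default
    instead of letting the whole agent pipeline hang or crash.
    """
    try:
        resp = requests.post(
            TOOL_ENDPOINT,
            json={"prompt": prompt},
            timeout=TIMEOUT_SECONDS,
        )
        resp.raise_for_status()
        # Responses still need validation before any downstream agent acts on
        # them (see the response-validation sketch earlier in this post).
        return resp.json()
    except (requests.Timeout, requests.ConnectionError, requests.HTTPError):
        return fallback if fallback is not None else {"result": None, "degraded": True}
```

Whether fail-open or fail-closed is the right default depends on what the tool does; the point is to decide it deliberately rather than inherit whatever the provider’s outage hands you.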

Some of this should sound familiar, though. Not trusting the other end of your connection, validating input and output from partner systems, and actively protecting against changes in security posture are all foundational principles of Zero Trust. In an agentic world, Zero Trust starts with some basic efforts: explicitly verify all partner tools in real time, verify all prompts and responses in real time, and know how to respond when a tool goes “bad” without stopping your ability to operate. This means every agent, tool, prompt, and response should be authenticated, authorized, and evaluated for appropriateness, regardless of where it originates. Imagine that a tool you’ve been using starts sending malformed or inappropriate responses; there’s fair reason to think agentic architectures will be responsible for a new Log4Shell. Last, when something does go wrong, assume breach of those discrete components and have a resilient system that can survive it. To be clear, though, what Zero Trust for AI doesn’t mean is a firewall between each and every agent. I’m sure some vendors will be selling that soon enough, but that’s not the right path.
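One way to express “respond when a tool goes bad without stopping your ability to operate” in code is a per-tool circuit breaker: keep validating responses, and quarantine a tool after repeated failures while the rest of the system keeps running. The class below is a hypothetical sketch (the threshold and the `invoke` and `validate` callables are assumptions), not a reference design:

```python
class ToolCircuitBreaker:
    """Quarantine a misbehaving tool instead of halting the whole agent system.

    Counts validation failures per tool and stops routing calls to a tool once
    it trips, while every other tool keeps operating normally.
    """

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.failures: dict[str, int] = {}
        self.quarantined: set[str] = set()

    def call(self, tool_name: str, invoke, validate):
        if tool_name in self.quarantined:
            raise RuntimeError(f"tool '{tool_name}' is quarantined")
        try:
            response = invoke()
            validate(response)        # e.g. validate_tool_response from earlier
        except Exception:
            self.failures[tool_name] = self.failures.get(tool_name, 0) + 1
            if self.failures[tool_name] >= self.failure_threshold:
                self.quarantined.add(tool_name)
            raise
        self.failures[tool_name] = 0  # reset on a clean, validated response
        return response
```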

We are looking at an agentic future, there’s no doubt. But we need to be aware of the challenges that tool dependency and interoperability create, and learn from past examples. By considering the implications of this new, tool-based agentic framework and embracing Zero Trust principles, we can make sure its collaborative possibilities are achieved safely. Keep following the blog if you want to know more about this intersection, as I hope to write more on the topic soon. As always, if you want to discuss this further or connect with us about helping you achieve this transformation, please reach out to questions@generativesecurity.ai.

About the author

Michael Wasielewski is the founder and lead of Generative Security. With 20+ years of experience in networking, security, cloud, and enterprise architecture, Michael brings a unique perspective to new technologies. Having worked on generative AI security for the past 2 years, Michael connects the dots between the organizational, technical, and business impacts of generative AI security. Michael looks forward to spending more time golfing, swimming in the ocean, and skydiving… someday.