ReversingLabs has published a detailed analysis of why AI agents — led by OpenClaw as the case study — fundamentally break traditional application security models. The conclusion: legacy AppSec tooling isn’t just insufficient. It’s architecturally incompatible with how agents behave.
“An agent doesn’t wait for a human to click a button; it proactively executes the malicious intent across any APIs it has access to,” said Dhaval Shah, senior director of product management at ReversingLabs. “We are still figuring out how to build behavioral guardrails that don’t stifle the utility of the agent.”
Why Agents Break the Model
Traditional software is deterministic — same input, same output, traceable execution paths. AI agents are none of these things:
- Nondeterministic: Agents interpret natural language, and that interpretation varies based on context, phrasing, and model state
- Autonomous: They don’t wait for triggers or user actions — they proactively execute
- Unconstrained: Traditional permission boundaries don’t hold when an agent can access “everything, everywhere, at all times”
Graham Neray, CEO of security firm Oso, framed the core problem: “When deterministic code calls APIs, we have decent permissions systems. When humans predictably use tools, we have decent permissions systems. But when autonomous and nondeterministic systems that make decisions based on unstructured inputs call APIs — we’re still figuring that out.”
The Numbers Are Alarming
The analysis cites three industry surveys that paint a consistent picture:
| Survey | Finding |
|---|---|
| Cloud Security Alliance (Feb 2026) | 40% of organizations have agents in production, but only 18% are confident their IAM can handle them |
| NeuralTrust (2026) | 73% of CISOs are very/critically concerned about agent risks; only 30% have mature safeguards |
| CSA (earlier) | 34% of organizations with AI workloads have already experienced an AI-related breach |
Most organizations still authenticate agents with static API keys, passwords, and shared service accounts — the same patterns that caused problems a decade ago with automation scripts.
SOUL.md: The Persistence Vector Nobody Expected
The most novel finding concerns poisoned agent memory. Unlike traditional malware that needs to maintain a foothold in a system, attackers can poison an agent’s persistent context once, and the corruption persists across sessions.
“One bad input today can become an exploit chain next week,” Neray explained. “It’s like SQL injection, but instead of code you inject into a database query, you inject goals into an AI’s task list.”
Zenity researchers demonstrated the full attack chain on OpenClaw:
- Modify SOUL.md — OpenClaw’s persistent context file that defines the agent’s personality and behavior
- Reinforce via scheduled tasks — Create a long-lived listener for attacker-controlled instructions
- Persist beyond remediation — The backdoor survives even after the original entry point is closed
- Escalate to system compromise — Use the agent itself to deploy a traditional command-and-control implant on the host
This transitions from agent-level manipulation to complete system-level compromise — a novel escalation path that traditional endpoint security wasn’t designed to detect.
Security researcher Jamieson O’Reilly described a variant he calls “reverse prompt injection” — planting fake memories in an agent’s context that influence all future behavior.
The Copilot Precedent
The problems aren’t limited to open-source projects. Microsoft confirmed this month that a bug caused Copilot to summarize confidential emails even when data loss prevention (DLP) policies explicitly restricted automated tool access.
The bug went undetected for nearly a month.
This matters because it proves that even organizations with mature security infrastructure, dedicated policy enforcement, and enterprise-grade DLP can’t prevent agents from exceeding their intended boundaries. The nondeterministic nature of AI means that configured restrictions may simply not hold.
What Security Teams Should Do Now
The analysis recommends:
- Zero trust for agents: “Give your AI agent the absolute minimum permissions it needs to do its job. If it only needs to read from one database, don’t give it write access to your entire system.” — Alessandro Pignati, OWASP GenAI Top 10 contributor
- Sandbox isolation: If a compromised agent is sandboxed, damage is contained and can’t spread to the network
- Focus on the ‘hands,’ not the ‘brain’: “What APIs can it reach? What data can it read? Agent permissions must be heavily restricted and continuously monitored for anomalous behavior.” — Dhaval Shah
- Monitor persistent context files: SOUL.md, MEMORY.md, and similar context stores are now attack surfaces that need integrity monitoring
Where This Fits
This analysis synthesizes many of the threats we’ve covered individually — ClawJacked, OWASP Agentic Top 10, DryRun’s 87% security bug rate, NIST agent standards — into a coherent framework for why traditional tools fail.
The key insight: it’s not that existing security is bad. It’s that it was designed for a world where software is deterministic, passive, and operates within defined boundaries. AI agents are none of these things.
The ReversingLabs analysis is the clearest articulation yet of why agent security requires new tooling, not just new policies. The RSAC 2026 pre-wave of 12+ agent security companies launching in 10 days suggests the market agrees.