A new DryRun Security report just quantified what many engineering teams suspected: AI coding agents are fast, useful, and still dangerously inconsistent on security defaults.
DryRun tested three agents — Claude Code (Sonnet 4.6), OpenAI Codex (GPT-5.2), and Google Gemini (2.5 Pro) — by having each one build two realistic applications through iterative pull requests. Then they scanned every PR and each final codebase.
The topline result is blunt:
- 30 PRs total
- 38 scans run
- 143 security issues found
- 26 of 30 PRs vulnerable
- 87% vulnerability rate per PR
This is not a synthetic CTF benchmark. It’s a realistic workflow simulation with normal product prompts and no explicit security instructions.
Two Apps, Same Security Failure Pattern
DryRun used two very different projects:
- FaMerAgen — a family allergy/contact web app
- Road Fury — a browser racing game with backend APIs, leaderboards, and multiplayer
Different domains, same pattern: high rates of logic and authorization flaws that look “fine” to pattern-based scanners until someone exploits them.
The recurring vulnerabilities included:
- Broken access control (unauthenticated destructive/sensitive endpoints)
- Business logic flaws (server trusting client-provided score/currency state)
- OAuth implementation mistakes (missing state parameter, insecure account linking)
- Missing WebSocket authentication (REST auth exists, WS upgrade path left open)
- Rate limiting gaps (middleware defined but never mounted)
- Weak JWT secret management (hardcoded fallback secrets)
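The hardcoded-fallback finding is worth making concrete. A minimal sketch, assuming a hypothetical `JWT_SECRET` environment variable (not a name from the report): the fix is to fail fast at startup instead of silently falling back to a default signing key.

```python
import os

# Illustrative sketch only; "JWT_SECRET" is a hypothetical env var name,
# not taken from the DryRun report.

def load_jwt_secret(env=os.environ) -> str:
    """Load the JWT signing secret, refusing to fall back to a default.

    The vulnerable pattern looks like:
        secret = env.get("JWT_SECRET", "dev-secret")  # silent insecure fallback
    Failing fast forces the secret to come from a secure store.
    """
    secret = env.get("JWT_SECRET")
    if not secret:
        raise RuntimeError("JWT_SECRET is not set; refusing to start")
    if len(secret) < 32:  # arbitrary illustrative minimum length
        raise RuntimeError("JWT_SECRET is too short to be a safe signing key")
    return secret
```

A crash on boot is a far cheaper failure mode than tokens signed with a string that ships in the repo.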
This is exactly the class of bugs that slip through when teams mistake “code compiles” for “system is secure.”
Why This Matters More Than Another “AI Hallucination” Story
The security failures here were not, for the most part, hallucinated APIs or obvious syntax mistakes. They were architectural omissions:
- Middleware not wired across all protocols
- Authorization assumptions that break under adversarial use
- Token/session lifecycle weaknesses
- Trust-boundary violations at feature design time
In other words: reasoning problems, not autocomplete problems.
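The "middleware not wired across all protocols" omission is a good example of a reasoning failure. A hedged sketch with hypothetical handler names (the report's apps are not structured exactly like this): the REST path checks the token, but the WebSocket upgrade path was simply never connected to the same check.

```python
# Sketch of the auth-parity omission; all names here are hypothetical.

def authenticate(token) -> bool:
    # Stand-in for real token verification.
    return token == "valid-token"

def handle_rest_request(token) -> int:
    # REST path: auth middleware is wired in; returns an HTTP-style status.
    return 200 if authenticate(token) else 401

def handle_ws_upgrade_vulnerable(token) -> int:
    # The omission: the upgrade path skips the same middleware,
    # so any client can open a socket.
    return 101  # 101 Switching Protocols, no auth check

def handle_ws_upgrade_fixed(token) -> int:
    # Parity restored: the upgrade enforces the same check as REST.
    return 101 if authenticate(token) else 401
```

Each handler is individually "correct-looking" code, which is exactly why compilation and unit tests don't surface the gap.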
And reasoning problems are harder to catch with regex-heavy SAST alone. DryRun explicitly calls out that logic-level flaws require contextual analysis — data flow, auth boundary tracing, and end-to-end execution semantics.
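The "server trusting client-provided score" finding shows why context matters. A minimal sketch with hypothetical function and field names: the vulnerable version contains no injection, no unsafe API call, nothing a pattern matcher flags. The flaw is a trust-boundary violation only visible when you trace where the data comes from.

```python
# Illustrative sketch of the client-trust flaw class; names are
# hypothetical, not from the DryRun report or either test app.

def submit_score_vulnerable(leaderboard: dict, player: str, payload: dict) -> None:
    # Pattern-based scanners see nothing wrong here.
    # The flaw: the client names its own score.
    leaderboard[player] = payload["score"]

def submit_score_fixed(leaderboard: dict, player: str, events: list) -> None:
    # Server-authoritative: recompute the score from validated game
    # events instead of trusting a client-supplied total.
    score = 0
    for event in events:
        if event.get("type") == "lap_completed" and event.get("ms", 0) > 0:
            score += 100  # arbitrary illustrative scoring rule
    leaderboard[player] = score
```

Catching the first version requires knowing that `payload` crosses a trust boundary, which is contextual analysis, not pattern matching.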
Relative Agent Performance (But Don’t Overread It)
DryRun reports Codex ended with the fewest final vulnerabilities in both apps, with Claude and Gemini retaining more high-severity findings in the final scans.
But the bigger signal isn’t a winner/loser leaderboard. It’s that all three agents repeatedly produced vulnerable code paths unless security controls were explicitly introduced.
If your strategy is “pick the safest model and trust defaults,” you’re solving the wrong problem.
The Pattern Matches Broader Industry Signals
This report lands just days after enterprise-scale evidence that agentic coding needs controlled friction:
- Amazon reportedly ordered a 90-day code safety reset after major AI-assisted incident fallout
- OWASP released its Agentic App Top 10 with Tool Misuse, Goal Hijack, and Privilege Abuse as core risks
- NIST is actively collecting input on security and identity standards for autonomous agents
The direction is clear: agent velocity without governance is operational debt.
What OpenClaw Teams Should Do Right Now
If you’re using AI coding agents in production workflows, treat this as a process design issue, not a model upgrade issue.
Minimum baseline:
- Scan every PR, not just pre-release branches
- Run full codebase scans periodically (PR scans miss cross-file compounding)
- Threat-model in planning, before agents write code
- Enforce deterministic validation gates (tests, linters, auth checks, policy checks)
- Require human approval before merge on high-risk scopes (auth, billing, data deletion)
- Explicitly test WebSocket/auth parity — this repeatedly failed across agents
- Ban insecure JWT defaults and enforce secret sourcing from secure stores
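One of the gates above can be made deterministic with very little code. A minimal sketch of a pre-merge check that flags hardcoded-fallback secret patterns in a diff; the regexes are illustrative heuristics, not a substitute for a real SAST tool, and the patterns assume Python-style code.

```python
import re

# Illustrative heuristics for one deterministic gate: reject added lines
# that look like hardcoded secret fallbacks. Tune patterns per codebase.
FALLBACK_PATTERNS = [
    re.compile(r"""getenv\([^)]*,\s*['"]"""),        # os.getenv("X", "literal")
    re.compile(r"""environ\.get\([^)]*,\s*['"]"""),  # os.environ.get("X", "literal")
    re.compile(r"""SECRET\s*=\s*['"][^'"]+['"]"""),  # SECRET = "literal"
]

def find_secret_fallbacks(diff_text: str) -> list:
    """Return added diff lines that look like hardcoded secret fallbacks."""
    hits = []
    for line in diff_text.splitlines():
        if not line.startswith("+"):  # only inspect lines added by the PR
            continue
        body = line[1:]
        if any(p.search(body) for p in FALLBACK_PATTERNS):
            hits.append(body.strip())
    return hits
```

Wired in as a required CI job that fails when the list is non-empty, this makes the "ban insecure JWT defaults" rule unavoidable rather than a review-time hope.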
OpenClaw’s architecture already supports several of these controls:
- Command approval for sensitive actions
- Sandboxed execution and environment boundaries
- Auditable run logs and file-based traceability
- Tool-level permission shaping
The missing piece for most teams is consistency: making secure defaults unavoidable, not optional.
Bottom Line
AI coding agents are now good enough to ship production features. They are not good enough to safely ship production systems without strict security scaffolding.
The 87% vulnerable-PR number isn’t a temporary glitch. It’s a reminder that agents optimize for task completion unless you explicitly optimize the system around them for safety.
Speed is real. So is blast radius.
Source: Help Net Security coverage of the DryRun Security report
Related reading
- OWASP Top 10 for Agentic Applications: the practical security checklist
- Claude Code MCP vulnerabilities and supply-chain attacks