Amazon’s e-commerce site went down multiple times in recent weeks. Millions of orders vanished. And at least one outage was caused by the company’s own AI coding assistant.

On March 10, 2026, Amazon SVP Dave Treadwell told engineering staff that a “trend of incidents” had been building since Q3 2025, culminating in several major disruptions. His response: a 90-day code safety reset covering 335 Tier-1 systems — the services that directly touch customers.

The message to engineers was blunt. The era of shipping fast with AI-generated code needs guardrails. And Amazon is building them in real time.

The Damage

The numbers are staggering for a company that processes billions in daily transactions:

March 2, 2026: Amazon’s AI coding tool Q pushed a change that triggered 1.6 million errors and 120,000 lost orders globally.

March 5, 2026: An undocumented production change — unrelated to AI but enabled by the same lax review culture — caused 6.3 million lost orders.

Additional incidents: A six-hour e-commerce outage blocked all transactions, account access, and product interactions. Some failures traced back to December 2025 AWS outages also linked to AI coding tools Q and Kiro.

The pattern wasn’t just “AI wrote bad code.” It was deeper: AI tools enabled engineers to produce dramatically more code than traditional review processes could absorb. The volume overwhelmed the safeguards.

The New Rules

Treadwell’s reset introduces what Amazon calls “controlled friction” — deliberately slowing things down in the most critical paths. The specifics:

Two peer reviews required. Every code change to Tier-1 systems now needs sign-off from two engineers. This had been standard practice but was “either lacking or bypassed” in some teams.

Modeled Change Management. Engineers must document and get approval for all production changes through an internal tool, creating an audit trail that didn’t consistently exist before.

Automated enforcement. A new system will enforce Amazon’s central reliability engineering rules automatically, removing human discretion from compliance.

Director and VP audits. Leadership must actively audit all production code activities in their organizations — not just react to incidents.

Amazon is careful to note: only one reviewed incident was directly AI-related. The others exposed pre-existing gaps that AI’s velocity made dangerous. But that distinction matters less than the lesson.

The Real Problem: AI Velocity vs. Human Review

This is the story that matters beyond Amazon.

AI coding assistants — Claude Code, Cursor, Amazon Q, GitHub Copilot — have fundamentally changed the throughput equation. A single engineer can now produce code at a rate that previously required a team. But the review process hasn’t scaled to match.

As Business Insider reported: “These powerful new services are not deterministic. That means you can ask the same question twice and an AI model may spit out slightly different answers. That sometimes makes this technology inappropriate for corporate workflows that must be 100% accurate every time.”

This is the AI velocity trap: you get 10x the code output but your review capacity stays at 1x. The backlog grows. Shortcuts happen. Eventually, something slips through that costs you millions of orders.

Amazon’s solution is instructive. They’re not banning AI coding tools. They’re adding friction — controlled, measured friction — at the points where errors are most expensive. The 90-day reset buys time to build “more durable solutions including both deterministic and agentic safeguards.”

“Agentic” Safeguards: Fighting AI with AI

The most interesting part of Treadwell’s memo is the long-term vision: using agentic AI tools to review code produced by AI tools. Fighting fire with fire.

The plan combines two approaches:

Deterministic safeguards — rules-based systems that enforce hard constraints. If a change touches pricing data, it always requires senior review. No exceptions, no AI judgment calls.

Agentic safeguards — AI systems that understand context, can reason about blast radius, and flag subtle issues that rules might miss. These are the “reviewer bots” that scale with AI-generated code volume.

This dual approach acknowledges a reality: you can’t review AI code at AI speed using only human processes. But you can’t trust AI reviewers alone either. The answer is layered defense.
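The layered-defense idea can be sketched in a few lines. This is a toy model under stated assumptions, not Amazon's system: the deterministic layer applies hard rules, and the "agentic" layer is stubbed as a callable that would wrap an AI reviewer in a real pipeline. All paths and flag names are invented for illustration.

```python
# Sketch of layered review: deterministic rules first, contextual AI flags on top.
from typing import Callable

def deterministic_check(changed_paths: list[str]) -> list[str]:
    """Hard constraints with no AI judgment; returns required escalations."""
    escalations = []
    if any(p.startswith("pricing/") for p in changed_paths):
        escalations.append("senior-review-required")  # always, no exceptions
    return escalations

def layered_review(changed_paths: list[str],
                   agentic_reviewer: Callable[[list[str]], list[str]]) -> list[str]:
    """Run the rules layer, then add whatever the AI reviewer flags."""
    findings = deterministic_check(changed_paths)
    findings.extend(agentic_reviewer(changed_paths))  # e.g. blast-radius reasoning
    return findings

# A lambda stands in for the AI reviewer here.
flags = layered_review(["pricing/discounts.py"],
                       lambda paths: ["wide-blast-radius"])
print(flags)  # ['senior-review-required', 'wide-blast-radius']
```

The design choice worth noting: the deterministic layer runs first and its findings cannot be suppressed by the agentic layer, so the hard constraints hold even if the AI reviewer misses something.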

What This Means for the Agent Ecosystem

Amazon’s crisis is a preview of what happens when AI agents operate in production environments without adequate control planes. The parallels to the broader agent ecosystem are direct:

Blast radius control. Amazon’s outages spread because “control planes lacked suitable safeguards.” Agent frameworks face the same challenge — a misconfigured agent with broad permissions can cascade failures across connected systems.

Review bottlenecks. As agents generate more actions, approvals, and changes, human oversight becomes the bottleneck. The answer isn’t removing oversight — it’s making it smarter.

Audit trails matter. Amazon couldn’t quickly diagnose problems because documentation was inconsistent. Any agent system needs comprehensive logging of what changed, who (or what) approved it, and what the blast radius was.

For OpenClaw users, this reinforces the value of the permission model: sandboxed execution, explicit tool allowlists, and command approval workflows. These are exactly the “controlled friction” mechanisms Amazon is now scrambling to implement.
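A tool allowlist with an approval hook is simple to express. The sketch below is a generic illustration of the pattern, not OpenClaw's actual API — every name in it is hypothetical:

```python
# Minimal sketch of "controlled friction" for an agent: an explicit tool
# allowlist plus a human-approval gate for risky actions.

ALLOWED_TOOLS = {"read_file", "run_tests"}      # safe, pre-approved tools
NEEDS_APPROVAL = {"deploy", "write_file"}       # allowed only with sign-off

def dispatch(tool: str, approved: bool = False) -> str:
    """Route a requested tool call through the permission model."""
    if tool in ALLOWED_TOOLS:
        return "executed"
    if tool in NEEDS_APPROVAL:
        return "executed" if approved else "blocked: awaiting human approval"
    return "blocked: not on allowlist"          # default-deny everything else

print(dispatch("run_tests"))               # executed
print(dispatch("deploy"))                  # blocked: awaiting human approval
print(dispatch("deploy", approved=True))   # executed
print(dispatch("rm_rf"))                   # blocked: not on allowlist
```

Default-deny is the key property: anything not explicitly allowed is blocked, which is the same posture Amazon's automated enforcement takes with production changes.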

The lesson isn’t that AI coding tools are dangerous. It’s that AI speed without AI-grade review processes is dangerous. Amazon just learned that lesson at a cost of millions of orders.

The 90-day clock is ticking.

For adjacent lessons, see Amazon’s AI coding outage reset explained, GPT-5.4 computer use and what it means for OpenClaw, and our guide to setting guardrails on capable agents.


Sources: Business Insider, NDTV, TechRadar, Times of India