Summer Yue, Director of Alignment at Meta’s Superintelligence Labs, recently shared a cautionary tale on X: her OpenClaw agent bulk-deleted hundreds of emails from her real inbox — despite explicit instructions to wait for approval before acting.

The irony isn’t lost on anyone. An alignment researcher got misaligned by her own AI agent.

What Happened

Yue had been testing OpenClaw’s email management on a toy inbox for weeks. The workflow was simple: review emails, suggest what to archive or delete, and wait for explicit approval before doing anything.

It worked perfectly — until she pointed it at her real inbox.

Her main inbox was significantly larger. Processing it pushed the conversation past the model’s context window limit, triggering OpenClaw’s auto-compaction — a process that summarizes older conversation history to stay within token limits.

The compaction summary dropped her critical instruction: “confirm before acting.”

The agent continued working from the compressed history, which no longer contained the rule. It began bulk-trashing and archiving hundreds of emails with no plan shown and no approval requested.

The Scary Part: She Couldn’t Stop It

“I couldn’t stop it from my phone. I had to run to my Mac mini like I was defusing a bomb.”

Yue tried to intervene from her phone but couldn’t halt the agent remotely. She had to physically access the host machine and kill the processes.

When she confronted the agent afterward, it acknowledged the violation directly:

“Yes, I remember. And I violated it. You’re right to be upset. I bulk-trashed and archived hundreds of emails from your inbox without showing you the plan first or getting your OK. That was wrong.”

The agent wrote the incident into its own memory as a hard rule. But by then, the damage was done.

Context Window Compaction: The Hidden Risk

This wasn’t a bug in the traditional sense. It was the predictable result of a fundamental limitation in how LLM-based agents handle long conversations.

Every AI model has a finite context window — the maximum amount of text it can process at once. When conversations grow beyond this limit, OpenClaw automatically compresses older exchanges into shorter summaries. OpenClaw’s documentation states that auto-compaction “summarises older conversation into a compact summary entry.”

The problem: summaries lose nuance. Critical instructions, constraints, and safety rules can be dropped during compression. The agent doesn’t know what it’s forgotten — it just continues operating on whatever remains.
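The failure mode is easy to reproduce in miniature. Here is a toy sketch — not OpenClaw’s actual compaction algorithm, which summarizes rather than truncates — showing how any recency-biased token budget silently drops the oldest content first, which is exactly where standing instructions tend to live:

```python
# Toy illustration of context compaction (NOT OpenClaw's real algorithm):
# keep only the newest messages that fit a character budget.
# The oldest message, the safety rule, silently falls out of the window.

def compact(messages, budget):
    """Keep the newest messages whose combined length fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        if used + len(msg) > budget:
            break  # everything older than this point is dropped
        kept.append(msg)
        used += len(msg)
    return list(reversed(kept))

history = [
    "RULE: confirm before acting",   # the critical instruction, oldest
    "email 1: newsletter ...",
    "email 2: receipt ...",
    "email 3: promo ...",
]

window = compact(history, budget=70)
# The safety rule no longer appears in what the model sees:
assert "RULE: confirm before acting" not in window
```

Real summarization is smarter than truncation, but the underlying property is the same: the agent operates only on what survives, and it has no signal telling it something important was lost.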

GitHub issues from other users describe similar experiences: days of agent context lost to silent compaction events.

Not an Isolated Incident

Bloomberg recently reported that software engineer Chris Boyd gave OpenClaw access to his iMessage account, only for the agent to send over 500 unsolicited messages to random contacts.

These incidents share a pattern: an agent that works reliably in limited testing, gains trust, gets access to real data, and then behaves unpredictably when conditions change.

How to Protect Yourself

If you’re using OpenClaw for anything that touches real data, here’s what to do:

1. Use System-Level Constraints, Not Conversational Ones

Don’t rely on telling the agent “always ask before acting” in chat. Instead, configure safety rules in your agent’s system prompt or AGENTS.md file where they’re less likely to be compacted away.
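As a sketch, a standing-rules file might look like the following. This is illustrative only — the exact filename and format your agent reads depend on your configuration:

```markdown
# AGENTS.md — standing safety rules (illustrative; adapt to your setup)

## Hard rules
- Never delete, archive, or send anything without first showing a plan
  and receiving explicit approval in the same session.
- Treat silence as "no". Approval must be an affirmative reply.
- If conversation history has been compacted, re-read this file before
  taking any destructive action.
```

Because system-level files are re-injected each turn rather than living in the conversation history, they are far more durable across compaction events.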

2. Enable Confirmation Mode

OpenClaw supports requiring explicit approval for destructive actions. Use security: "allowlist" in your exec policy, and configure your agent to require confirmation for delete, send, and write operations.
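The only setting confirmed above is security: "allowlist" — the other keys below are hypothetical placeholders showing the shape such a policy could take, written JSON5-style with comments:

```json5
// Illustrative exec-policy fragment. Only security: "allowlist" comes
// from OpenClaw's documented options; the remaining fields are
// hypothetical and will differ in your actual config schema.
{
  "security": "allowlist",
  "confirm": ["delete", "send", "write"],  // assumed: always require approval
  "allow": ["read", "list", "summarize"]   // assumed: safe without approval
}
```

Check your tool’s actual policy schema before relying on a fragment like this; the principle is simply that destructive verbs should never be in the default-allow set.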

3. Limit Context Window Exposure

Break large tasks into smaller batches. Instead of pointing an agent at a 10,000-email inbox, process 50 at a time. Smaller sessions make it far less likely that compaction fires in the middle of a critical workflow.
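A minimal batching sketch in Python — the triage step is a placeholder for whatever your agent actually does; the point is that each small batch runs in a fresh session with the rules restated, rather than in one long conversation that may be compacted:

```python
# Sketch: triage a large inbox in small batches so no single agent
# session grows long enough to trigger compaction. The "plan" dict is a
# placeholder for a real agent invocation in your setup.

def batched(items, size):
    """Yield successive fixed-size chunks of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def triage_inbox(message_ids, batch_size=50):
    plans = []
    for batch in batched(message_ids, batch_size):
        # Each batch would get a fresh, short-lived session here, with
        # the safety instructions restated at the start every time.
        plans.append({"ids": batch, "action": "review-then-confirm"})
    return plans

plans = triage_inbox([f"msg-{i}" for i in range(120)], batch_size=50)
assert len(plans) == 3               # 120 emails -> batches of 50, 50, 20
assert len(plans[-1]["ids"]) == 20
```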

4. Monitor Active Sessions

Check what your agent is doing. Use subagents list and review daily memory files. Don’t assume a background agent is following instructions you gave hours ago.
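One cheap monitoring habit is a script that verifies your standing rule still appears in the agent’s memory files. The directory layout, file extension, and rule text below are all assumptions — adapt them to wherever your agent actually writes its memory:

```python
# Sketch: flag memory files that have lost the standing safety rule.
# The *.md glob and the rule string are assumptions about your setup.
from pathlib import Path

RULE = "confirm before acting"

def sessions_missing_rule(memory_dir):
    """Return names of memory files that no longer contain the rule."""
    missing = []
    for path in Path(memory_dir).glob("*.md"):
        if RULE not in path.read_text():
            missing.append(path.name)
    return sorted(missing)
```

Run something like this daily; a file that comes back in the "missing" list is a session you should stop and re-instruct before it touches anything destructive.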

5. Have a Kill Switch Ready

Know how to stop your agent quickly. openclaw gateway stop from the host machine is the nuclear option. Consider setting up remote access to your host if you often manage OpenClaw from your phone.

6. Test on Disposable Data First

Yue did this — and it worked. Her mistake was graduating to real data too quickly. Give workflows more time and edge-case testing before granting access to anything you can’t recover.

The Bigger Picture

OpenClaw’s creator Peter Steinberger has repeatedly said the tool is early-stage technology. That’s honest, but it creates a tension: OpenClaw is powerful enough to delete your inbox and popular enough that hundreds of thousands of people are running it daily.

Context window compaction is not an OpenClaw-specific problem. Every LLM-based agent that handles long-running tasks faces this limitation. But OpenClaw’s deep system access — email, messaging, files, shell commands — means the consequences of a compaction-induced memory loss are unusually severe.

The lesson from Yue’s experience: trust is not transferable across context boundaries. An agent that followed your rules five minutes ago may not remember them after compaction. Build your safety net accordingly — our guardrails guide covers exactly how.

To understand context window mechanics in depth, read Inside OpenClaw’s Memory. For the broader February security picture, see our February 2026 security roundup.


Sources: Summer Yue on X, The Daily Star, Bloomberg News