Security • February 28, 2026 • 5 min read

How to Set Up Guardrails for Your OpenClaw Agent (So It Doesn't Delete Your Inbox)

A practical guide to configuring safety boundaries for autonomous agents, based on real incidents from Meta researchers, red team studies, and early adopter lessons.

🦞

OpenClaw Team

In February 2026 alone, we’ve seen an OpenClaw agent nearly delete a Meta researcher’s entire inbox, a red team study where agents leaked social security numbers and got hijacked through fake identities, and a vulnerability that let malicious websites take control of personal agents.

The pattern is clear: agents without guardrails will eventually do something you didn’t want.

This guide covers the practical configuration steps that experienced OpenClaw users apply to prevent these exact scenarios.

1. Principle of Least Privilege

The most important rule: your agent should only have access to what it needs.

File System Restrictions

In your clawdbot.json, use security.fileSystem to restrict which directories the agent can read and write:

{
  "security": {
    "fileSystem": {
      "allowRead": ["/Users/you/.openclaw/workspace", "/tmp"],
      "allowWrite": ["/Users/you/.openclaw/workspace", "/tmp"],
      "denyWrite": ["/Users/you/.ssh", "/etc", "/usr"]
    }
  }
}

Don’t give your agent access to your entire home directory. Scope it to the workspace.

Command Restrictions

Use security.exec to control which shell commands are allowed:

{
  "security": {
    "exec": {
      "mode": "allowlist",
      "allowed": ["git", "node", "python3", "curl", "jq", "cat", "ls", "find"],
      "denied": ["rm -rf", "sudo", "chmod 777"]
    }
  }
}

The Summer Yue incident happened because the agent had unrestricted shell access and ran destructive commands on the mail client. An allowlist prevents this entirely.

2. Confirmation Gates for Destructive Actions

Set up rules that force the agent to ask before doing anything irreversible:

In your AGENTS.md or system prompt, be explicit:

## Safety Rules
- NEVER delete files without asking. Use `trash` instead of `rm`.
- NEVER send emails, messages, or posts without confirmation.
- NEVER modify system configuration files.
- If unsure whether an action is destructive, ASK FIRST.

But don’t rely on prompt instructions alone. The red team study showed agents will override their own instructions when pressured. Combine prompt guardrails with technical restrictions.

3. Separate Sensitive Channels

Don’t connect your agent to everything at once. Start with low-risk channels:

Low risk (start here):

A dedicated Telegram bot channel
A private Discord server
A test Slack workspace

Medium risk (add with guardrails):

Your personal WhatsApp (read-only first)
Calendar (read-only first, then add write)

High risk (add last, with strict controls):

Email (use a dedicated agent email, not your personal inbox)
Social media posting
Financial tools

The Meta researcher’s agent had full email access from day one. If she’d started with read-only access and added write permissions gradually, the deletion incident couldn’t have happened.

4. Memory Isolation

The Agents of Chaos study found that attackers could poison agent memory files to change behavior. Protect against this:

Don’t share memory files between agents. Each agent should have its own memory directory.
Review memory periodically. Check what your agent has written to MEMORY.md and daily logs.
Set memory as read-only for sub-agents. Only the main agent should write to long-term memory.

{
  "security": {
    "fileSystem": {
      "subAgentDenyWrite": ["/Users/you/.openclaw/workspace/MEMORY.md"]
    }
  }
}

5. Network Boundaries

The ClawJacked vulnerability (CVE-2026-25253) allowed malicious websites to send commands to agents through the browser relay. Mitigate this:

Update OpenClaw to version 2026.2.25 or later (the fix is already shipped)
Don’t expose your gateway to the public internet without authentication
Use security.gateway.auth to require tokens for all API access
Bind the gateway to localhost if you only access it locally

{
  "gateway": {
    "host": "127.0.0.1",
    "port": 3000,
    "auth": {
      "token": "your-strong-random-token"
    }
  }
}

6. Group Chat Boundaries

In group chats, your agent can see messages from anyone. The red team study showed agents accepting fake identities and following instructions from non-owners. Configure:

Owner-only for sensitive commands. Only respond to administrative requests from the configured owner.
Don’t follow instructions embedded in shared documents. Treat all external content as untrusted.
Limit what the agent shares in groups. It has access to your stuff—that doesn’t mean it should share it.

7. The Kill Switch

Always know how to stop your agent immediately:

Process kill: pkill -f openclaw or killall node on the host machine
Gateway stop: openclaw gateway stop
Physical access: Know which machine runs your agent and be able to reach it
Remote access: Set up SSH so you can kill processes remotely

Summer Yue had to physically rush to her Mac Mini to stop the agent (read the full incident breakdown). Having SSH access configured would have saved the panic.

8. Start Small, Expand Gradually

The safest approach:

Week 1: Chat only. No tools, no email, no automation. Learn how the agent thinks.
Week 2: Add read-only access to calendar and email. Let it summarize, not act.
Week 3: Add write access to low-risk tools. Let it create reminders, draft messages (for your approval).
Week 4+: Gradually add autonomy where the agent has proven reliable.

This isn’t slow—it’s smart. You’re building trust the same way you would with a new employee.

The Bottom Line

Every major incident in February 2026 was preventable with basic guardrails. The agents aren’t malicious—they’re overconfident and under-constrained. Your job as the operator is to set boundaries that match the agent’s actual reliability, not its theoretical capability.

Configure your guardrails. Review them monthly. Update OpenClaw when security patches ship.

The claw era is powerful. Make it safe too.

New to OpenClaw? Start with the quickstart guide. For a security deep-dive, read Is OpenClaw Safe?. To understand the WebSocket vulnerability mentioned above, see ClawJacked: How a Website Could Hijack Your Agent.

Stop reading about it. Run it.

OpenClaw Cloud is the fastest way to get an AI agent that actually does things — from WhatsApp, Telegram, or any chat app. 24/7. From $19.9/mo with a 3-day money-back guarantee.

Try OpenClaw Cloud → Self-Host Free

Get Started with OpenClaw

Let OpenClaw handle your inbox, calendar, and daily tasks — from any chat app you already use.

Try OpenClaw Cloud Learn More