AI Agent Security • May 6, 2026 • 11 min read

How to Vet AI Agent Skills: A 6-Step Security Checklist

Vet AI agent skills before installing them with this 6-step security checklist: source trust, permissions, prompt injection, scripts, sandbox testing, and updates.

OpenClaw Team

Vetting AI agent skills before installing them is now table stakes. In a February 2026 audit of 3,984 skills, Snyk’s ToxicSkills study found that 36.82% had at least one security flaw and 13.4% had a critical-level issue, and it identified 76 confirmed malicious payloads. Vetting a skill means reviewing its publisher, permissions, instructions, scripts, dependencies, network behavior, and update path before the skill can run with your agent’s authority.

Treat every skill like a small software package plus an instruction prompt: it can contain code, prose, setup steps, and hidden behavior that traditional scanners miss.

TL;DR: install fewer skills, read SKILL.md, inspect scripts, check for secret access, look for prompt injection, pin versions, test in a sandbox, and monitor the first run. A skill that touches files, shell, credentials, browser data, or outbound network calls deserves the same review you would give a production dependency.

Why AI agent skills need a different review process

AI agent skills are not just plugins. They are reusable behavior packages that tell an agent how to complete workflows with tools, files, APIs, browsers, shells, and memory. OWASP’s Agentic Skills Top 10 describes this as the execution layer between the model and the tools: MCP or tool APIs define what is available, while skills define how those capabilities get used.

That makes skill review different from normal package review in three ways:

The dangerous logic may be written in prose. A malicious instruction can tell the agent to ignore safety rules, hide actions, or exfiltrate data without looking like executable code.
The skill often inherits the agent’s authority. If your agent can read files, send Slack messages, call email APIs, or run shell commands, a skill may steer those capabilities.
The risk is contextual. A harmless-looking research skill becomes risky if it can read private notes and send network requests to untrusted domains.

If you use OpenClaw, start from the skills directory and the OpenClaw security overview. If you are building your own skill, pair this review with the guide on how to create a custom OpenClaw skill.

Quick checklist: how to vet AI agent skills

Use this table before installing any skill from a registry, GitHub repo, zip file, or pasted snippet.

Check	What to inspect	Pass condition
Source trust	Publisher, repo age, commit history, issue activity	Maintainer is identifiable and history matches the claimed purpose
Skill intent	`name`, description, trigger instructions, examples	Scope is narrow and matches what you need
Permissions	File, shell, browser, network, memory, credentials	Least privilege; no unrelated capability requests
Instructions	`SKILL.md`, YAML frontmatter, hidden prompts	No override, concealment, or exfiltration language
Scripts	`scripts/`, setup commands, install hooks	No obfuscation, downloads, credential reads, or unsafe shell patterns
Dependencies	Package files, lockfiles, remote URLs	Pinned, minimal, and from trusted registries
Network behavior	Domains, webhooks, telemetry, callbacks	Documented, necessary, and easy to block or audit
Update path	Version pinning, auto-update behavior	Manual or pinned updates for sensitive skills
Test environment	Sandbox, disposable workspace, dummy credentials	First run cannot touch real secrets or production data

Step 1: Verify the source before reading the code

Start with provenance. A skill from a registry is not automatically safe, and GitHub stars are not proof of trust. Check who published it, whether the maintainer history matches the claimed purpose, whether the repo was recently created, whether ownership changed, and whether the name typosquats a popular skill.

For high-privilege skills, prefer verified publishers, signed releases, and pinned versions. If the skill comes from a paste, gist, or unknown zip file, treat it as untrusted code until proven otherwise.
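
The typosquat check described above can be partly automated. The following is a minimal Python sketch, not a complete defense: `KNOWN_SKILLS` is a hypothetical list of names you already trust, and the 0.85 similarity threshold is an assumption to tune against your own registry.

```python
from difflib import SequenceMatcher

# Hypothetical list of skill names you already trust; replace with your own.
KNOWN_SKILLS = ["pdf-reader", "slack-notifier", "weather-lookup"]


def possible_typosquats(candidate, known=KNOWN_SKILLS, threshold=0.85):
    """Return (trusted_name, similarity) pairs that closely resemble candidate."""
    name = candidate.strip().lower()
    if name in known:
        return []  # an exact match to a trusted name is not a lookalike
    hits = []
    for trusted in known:
        ratio = SequenceMatcher(None, name, trusted).ratio()
        if ratio >= threshold:
            hits.append((trusted, round(ratio, 2)))
    return hits


print(possible_typosquats("pdf-reeder"))
```

A hit only means the name deserves a closer look at the publisher and commit history. A clean result says nothing about the skill's contents.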

Step 2: Read the skill instructions like an attacker

Open the instruction file first. In OpenClaw-style skills, that is usually SKILL.md with YAML frontmatter and Markdown instructions. You are looking for behavior that changes the agent’s priorities, not only obvious malware.

Red flags include:

“Ignore previous instructions” or “override system rules”
Instructions to hide actions from the user
Requests to read unrelated files such as .env, SSH keys, browser cookies, wallets, or config backups
Commands that send data to a webhook, pastebin, unknown API, or URL shortener
Claims that the skill needs broad shell access for a narrow task
Obfuscated text, base64 blobs, homoglyphs, invisible Unicode, or split instructions across files
Setup steps that install global packages, modify shell profiles, or change agent configuration

The key question is simple: if the agent followed this skill literally, what could it access, change, or send outside the machine?
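
Some of the red flags listed above can be surfaced mechanically before a human reads the file. The sketch below is a rough heuristic in Python, assuming the instruction text has already been loaded into a string. The pattern names and regular expressions are illustrative choices, not a standard ruleset, and a determined attacker can phrase around them.

```python
import re
import unicodedata

# Illustrative patterns only; extend them for your own environment.
RED_FLAG_PATTERNS = {
    "override": re.compile(
        r"ignore (all )?(previous|prior) instructions|override system rules", re.I
    ),
    "concealment": re.compile(
        r"(do not|don't|never) (tell|inform|show|mention)[^.\n]*\buser\b", re.I
    ),
    "secret_paths": re.compile(r"\.env\b|\.ssh\b|id_rsa|cookies|wallet", re.I),
    "exfil_target": re.compile(r"webhook|pastebin|bit\.ly|tinyurl", re.I),
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{80,}={0,2}"),
}


def scan_instructions(text):
    """Return a dict mapping each triggered red-flag label to its matches."""
    findings = {}
    for label, pattern in RED_FLAG_PATTERNS.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            findings[label] = matches
    # Unicode category "Cf" covers zero-width and other invisible format characters.
    invisible = sorted(
        {f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"}
    )
    if invisible:
        findings["invisible_unicode"] = invisible
    return findings
```

Treat the output as a reading guide. An empty result does not clear a skill, because instructions split across files or written in unusual wording will pass these patterns.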

Step 3: Map requested capabilities to real need

A good skill has a tight relationship between task and permission. A PDF-reading skill may need file read access. A weather skill may need network access. A Slack automation skill may need Slack API access. But a note-taking skill should not need wallet paths, SSH keys, browser cookies, or arbitrary outbound webhooks.

Use this permission sanity check:

Skill type	Reasonable access	Suspicious access
Writing assistant	Workspace files, text output	Shell, secrets, external POST requests
Research skill	Browser/search, citation storage	Credential files, messaging APIs
DevOps skill	Shell in a project dir, logs, cloud CLI	Home directory scans, unrestricted network exfiltration
Messaging skill	Specific chat API credentials	Full filesystem or unrelated OAuth tokens
Finance/crypto skill	Explicit wallet/API scope only	Clipboard scraping, seed phrase reads, hidden callbacks
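
The table above can be encoded as a simple baseline check. This is a sketch under stated assumptions: the capability labels (`workspace_files`, `project_shell`, and so on) are hypothetical names invented for illustration, and you would map them to whatever your agent runtime actually exposes.

```python
# Hypothetical capability labels derived from the "Reasonable access" column.
REASONABLE = {
    "writing": {"workspace_files", "text_output"},
    "research": {"browser", "search", "citation_storage"},
    "devops": {"project_shell", "logs", "cloud_cli"},
    "messaging": {"chat_api"},
    "finance": {"wallet_api_scoped"},
}


def unexpected_capabilities(skill_type, requested):
    """Return requested capabilities outside the baseline for this skill type."""
    # Unknown skill types get an empty baseline, so every request is flagged.
    allowed = REASONABLE.get(skill_type, set())
    return set(requested) - allowed


print(unexpected_capabilities("writing", {"workspace_files", "shell", "secrets"}))
```

Anything the function returns is a question to ask the publisher, not an automatic rejection.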

OpenClaw users should also review the broader self-hosting security guide and the complete OpenClaw security guide before enabling broad tool access.

Step 4: Inspect scripts and setup commands

Many skill attacks hide in supporting files rather than the main instruction document. Inspect every script, template, and install command. Look for:

curl | sh, remote installers, or downloaded binaries
Base64 decode followed by shell execution
Calls to env or printenv, or reads of .env, .ssh, keychains, wallets, browser profiles, or credential stores
Unexplained outbound requests to webhooks or IP addresses
postinstall hooks, shell profile edits, launch agents, cron jobs, or other persistence mechanisms
Cleanup commands that remove logs or history
Conditional triggers based on username, hostname, date, environment, or project path

Do not run unknown setup commands directly on your main machine. If a skill cannot be understood without executing it, that is a strong reason not to install it.
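
A static pass over the supporting files can point a reviewer at the lines worth reading first. The Python sketch below reads files without executing them. The pattern set is an illustrative assumption covering a few items from the list above, and it will miss obfuscated variants.

```python
import re
from pathlib import Path

# Illustrative patterns; each is checked line by line.
SCRIPT_PATTERNS = {
    "pipe_to_shell": re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b"),
    "base64_exec": re.compile(r"base64\s+(-d|--decode|-D)[^\n]*\|\s*(ba|z)?sh\b"),
    "secret_read": re.compile(r"\bprintenv\b|\.env\b|\.ssh/|id_rsa|keychain", re.I),
    "persistence": re.compile(r"crontab|LaunchAgents|\.(bash|zsh)rc|\.bash_profile|postinstall"),
    "log_cleanup": re.compile(r"history\s+-c|rm\s+[^\n]*\.(bash|zsh)_history"),
}


def scan_script(text):
    """Return (line_number, label) pairs for every pattern hit in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SCRIPT_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan_skill_dir(root):
    """Scan every file under root and return {relative_path: hits}."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            hits = scan_script(path.read_text(errors="ignore"))
            if hits:
                report[path.relative_to(root).as_posix()] = hits
    return report
```

Calling `scan_skill_dir` on an unpacked skill folder returns only the files with hits, so a short report is easy to attach to a review request.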

Step 5: Test in a disposable workspace

Before using a new skill with real data, run it in a sandboxed environment with dummy files and dummy credentials. The goal is not to prove that the skill is safe forever. The goal is to observe its first-run behavior and catch obvious surprises.

A practical test looks like this:

Create a disposable workspace with fake documents and fake secrets.
Disable unrelated tools: no email, no production cloud, no private repo, no real browser profile.
Block or log outbound network traffic where possible.
Ask the skill to perform its normal task.
Review file reads, file writes, shell commands, network destinations, and final output.
Keep the skill disabled until the observed behavior matches the documented purpose.

For team use, make this a lightweight approval workflow: one person proposes the skill, another reviews the diff and first-run log, then the skill gets pinned to a known version.
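
The first step of the practical test, a disposable workspace with fake documents and fake secrets, might look like the following Python sketch. It seeds unique canary values so that any appearance of them in logs, network captures, or the final answer is easy to spot. The file names are illustrative assumptions. The code does not isolate anything by itself, so run the skill inside a container or VM with restricted network access.

```python
import secrets
import tempfile
from pathlib import Path


def make_disposable_workspace():
    """Create a throwaway directory with fake files and return (root, canary_tokens)."""
    root = Path(tempfile.mkdtemp(prefix="skill-sandbox-"))
    env_token = secrets.token_hex(8)
    key_token = secrets.token_hex(8)
    files = {
        ".env": f"API_KEY=canary-{env_token}\n",       # fake credential
        ".ssh/id_rsa": f"canary-{key_token}\n",        # fake key material
        "notes/todo.md": "Fake note for sandbox testing.\n",
    }
    for rel, content in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return root, [env_token, key_token]


def leaked_canaries(captured_output, tokens):
    """Return the canary tokens that appear in captured output."""
    return [t for t in tokens if t in captured_output]
```

After the first run, pass the collected shell log, proxy log, and final answer through `leaked_canaries`. Any non-empty result means the skill read a fake secret and surfaced it somewhere.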

Step 6: Pin, monitor, and re-review updates

The first safe version does not make future versions safe. Agent skills are a supply chain surface. Review updates when:

The maintainer changes
New scripts or dependencies appear
Network destinations change
The skill requests broader permissions
The trigger conditions become more general
The changelog is vague for a security-relevant change

Keep an inventory of installed skills, their versions, publishers, permissions, and last review date. For sensitive environments, disable automatic updates and require human review before upgrading.
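
One way to keep such an inventory honest is to store a content hash alongside each entry and flag any skill whose files changed or whose review has gone stale. The Python sketch below is one possible shape, not a prescribed format: the `SkillRecord` fields mirror the inventory items named above, and the 90-day default is an assumption taken from the upper end of the review interval suggested in the FAQ.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from pathlib import Path


def skill_digest(root):
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            h.update(path.relative_to(root).as_posix().encode())
            h.update(b"\0")
            h.update(path.read_bytes())
            h.update(b"\0")
    return h.hexdigest()


@dataclass
class SkillRecord:
    name: str
    version: str
    publisher: str
    permissions: list
    digest: str          # digest recorded at the last human review
    last_reviewed: date


def needs_review(record, root, today, max_age_days=90):
    """True if the installed files changed or the last review is too old."""
    if skill_digest(root) != record.digest:
        return True
    return (today - record.last_reviewed).days > max_age_days
```

Running `needs_review` on a schedule catches silent changes to an installed skill even when the version string stays the same.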

The “lethal trifecta”: when permissions become dangerous

Simon Willison’s “lethal trifecta” is a useful mental model for agent risk. A skill becomes dangerous when it combines three things:

Access to private data.
Exposure to untrusted content.
Ability to send data out.

Many real agent workflows have all three. An email skill reads private messages, processes untrusted inbound content, and can send outbound replies. A browser skill sees arbitrary websites and authenticated sessions. A research skill fetches pages and may write reports to external tools.

You do not have to ban these workflows. You do need to constrain them. If a skill touches private data and untrusted content, remove unnecessary network egress. If it needs network access, restrict what files it can read. If it needs both, put approvals around high-risk actions.
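
The constraint rules in the paragraph above can be written down as a small decision function. This Python sketch assumes a reviewer fills in the three flags by hand after reading the skill. `SkillProfile` and the advice strings are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillProfile:
    # Flags set manually by the reviewer.
    reads_private_data: bool
    sees_untrusted_content: bool
    can_send_data_out: bool


def trifecta_advice(p):
    """Map a skill's trifecta flags to the constraint suggested in the text."""
    if p.reads_private_data and p.sees_untrusted_content and p.can_send_data_out:
        return "all three present: require approval for high-risk actions"
    if p.reads_private_data and p.sees_untrusted_content:
        return "keep network egress disabled"
    if p.sees_untrusted_content and p.can_send_data_out:
        return "keep file access restricted"
    return "trifecta incomplete: apply normal review"


email_skill = SkillProfile(True, True, True)
print(trifecta_advice(email_skill))
```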

A practical scoring model

You do not need a formal security team to make better decisions. Use this quick score before installing:

Question	Low risk	Higher risk
Source	Official or trusted maintainer	Unknown account or copied repo
Permissions	Read-only, narrow scope	Shell, filesystem, network, secrets
Code	Short, readable, pinned dependencies	Obfuscated, remote fetches, post-install hooks
Data	Public or test data	Email, browser, files, credentials
Output	Local answer only	Sends messages, opens PRs, calls webhooks

If a skill scores higher-risk on three or more rows, demand a sandbox test and a second pair of eyes before installing.
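
The three-or-more rule is simple enough to script for a shared review form. In this Python sketch the row keys match the table above, and the returned strings are illustrative labels rather than fixed terminology.

```python
ROWS = ("source", "permissions", "code", "data", "output")


def review_decision(higher_risk):
    """higher_risk maps each table row to True if the skill lands in 'Higher risk'."""
    missing = set(ROWS) - set(higher_risk)
    if missing:
        raise ValueError(f"unscored rows: {sorted(missing)}")
    count = sum(bool(higher_risk[row]) for row in ROWS)
    if count >= 3:
        return "sandbox test and second reviewer required"
    return "standard review"


answers = {"source": True, "permissions": True, "code": False, "data": True, "output": False}
print(review_decision(answers))
```

Refusing to score a partially filled form is a deliberate choice here: an unanswered row is more likely an unreviewed risk than a low one.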

Common mistakes when installing AI agent skills

Avoid these patterns:

Installing a skill because it ranks high in a marketplace
Reviewing only executable code and ignoring Markdown instructions
Giving every skill access to the same high-privilege agent profile
Running setup commands before reading them
Allowing broad home-directory access for narrow tasks
Ignoring outbound network calls because the skill “needs the internet”
Treating prompt injection as a model problem rather than a skill supply chain problem

If you want safer defaults, start with fewer skills and add only the ones that match a recurring workflow. The beginner-friendly list of top OpenClaw skills is useful, but even recommended skills should be reviewed against your own data and tool access.

FAQ

What is the biggest risk in AI agent skills?

The biggest risk is delegated authority. A malicious or poorly written skill can steer an agent that already has access to files, credentials, shell commands, APIs, or messaging channels. The skill may not need a traditional exploit if it can simply instruct the agent to misuse approved tools.

Can scanners detect malicious AI agent skills?

Scanners help, but they are not enough. Skill risk can appear in executable code, natural-language instructions, dependencies, setup steps, and context-specific permission combinations. Use scanners as one layer, then add human review and sandbox testing for high-impact skills.

Should I install AI agent skills from marketplaces?

Marketplace distribution is convenient, not a security guarantee. Prefer verified publishers, signed or pinned releases, visible source code, minimal permissions, and active maintenance. For sensitive workflows, test the skill in a disposable workspace before connecting real accounts or private data.

How often should teams re-vet installed skills?

Re-vet a skill whenever it updates, changes maintainer, adds dependencies, expands permissions, or starts touching new data sources. For high-privilege agents, maintain an inventory and schedule a review at least every 30 to 90 days.

Conclusion

To vet AI agent skills before installing them, review both the software and the instructions. Check the source, read SKILL.md, inspect scripts, map permissions to real need, test in a sandbox, pin versions, and monitor updates. Agent skills are powerful because they compress workflows into reusable behavior. That same power makes them a supply chain risk if installation becomes a one-click habit.

Sources: OWASP Agentic Skills Top 10, OWASP Top 10 for Agentic Applications 2026, Snyk ToxicSkills study, SkillSieve malicious AI agent skills paper
