Topic: claude skill security audit

Claude skill security audit — process, threat surface, what gets checked

A Claude skill security audit narrows the broader auditor lens to the threats that get a skill rejected, exploited, or rolled back. This page names the eight finding classes that matter for the Claude / MCP shape, maps each to the test type that catches it, walks through what a real audit looks like end to end, and shows where the SkillAudit engine sits in the picture.

TL;DR

The eight finding classes that drive the grade across the SkillAudit 101-server corpus: SSRF (50%), credential exposure (38%), prompt-injection susceptibility (a band, not a yes/no), command exec (10%), unsafe deserialization (~6%), scope-vs-handler drift (manual review; ~25% of OAuth-using servers), HTML/DOM sink leaks (~4%), and unsafe file-system access (~7%). A competent audit covers all eight: a static AST/taint pass and a live LLM-probe pass against the running tool surface, with a manual review for scope-vs-handler drift. A run on SkillAudit's engine takes 30–90 seconds per server. The output is an A–F band for the security axis, with each finding pinned to a file:line and a remediation hint. Free for public repos; $19/mo Pro adds private repos and the GitHub Action gate.

Why "security audit" is its own lens, separate from a general audit

The broader Claude skill auditor role covers six axes. A "security audit" is a request to narrow that lens: the buyer or author wants the threat-surface answer, not the full report. The narrowing is useful because the security axis is where most rejection criteria live, both for listing in Anthropic's Skills Directory and for team-level adoption decisions. Maintenance, documentation, scope hygiene, and client compatibility matter, but they rarely block adoption on their own; the security axis can.

Narrowing also lets the audit go deeper on what's there. A general audit gives one paragraph per axis; a security audit can give one paragraph per finding class, with the corpus prevalence, the canonical fix, and the F-grade example called out. That's the shape of this page and the shape of the security-only mode of the SkillAudit report.

The eight finding classes

1. SSRF — server-side request forgery (50% prevalence)

Shape: a tool that takes a URL or hostname argument and dispatches an HTTP request without resolving and allowlisting the destination first. The classic case is a "fetch web page" tool with no IP-block check; the more subtle case is a webhook caller that filters 127.0.0.1 but misses IPv6 loopback, redirect chains, decimal-encoded IPs, and DNS rebinding.

What catches it: static taint analysis from tool argument → fetch call site, plus a redirect-following dynamic probe that confirms exploitability. Pure deny-list filtering is never sufficient and is the most common F-grade pattern.
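A minimal Python sketch of the "resolve and allowlist" shape, using a hypothetical `check_destination` helper that a fetch tool would call before dispatching. It checks the scheme, requires the hostname to be on an explicit allowlist (which also rejects decimal-encoded and other alternate IP spellings, since they never match a listed name), resolves the name, and refuses if any resolved address is not globally routable. The resolver is injectable so the check can be exercised offline. It is not a complete defense by itself: the caller still has to connect to the returned address instead of resolving again (the DNS-rebinding gap), and has to turn off automatic redirect following and re-run the check on every hop.

```python
import ipaddress
import socket
from typing import Callable
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def resolve_all(host: str) -> list[str]:
    """Return every address the system resolver gives for host."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_destination(
    url: str,
    allowed_hosts: set[str],
    resolver: Callable[[str], list[str]] = resolve_all,
) -> list[str]:
    """Raise ValueError unless url targets an allowlisted, globally routable host.

    allowed_hosts must be lower-case, because urlsplit().hostname is.
    Returns the vetted addresses so the caller can connect to one of them
    directly instead of resolving the name a second time.
    """
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    host = parts.hostname
    if host is None or host not in allowed_hosts:
        raise ValueError(f"host not on allowlist: {host!r}")
    addresses = resolver(host)
    if not addresses:
        raise ValueError(f"no addresses for {host!r}")
    for raw in addresses:
        ip = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 zone id
        # Judge IPv4-mapped IPv6 (::ffff:127.0.0.1) as the IPv4 address it wraps.
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        # is_global is False for loopback, private, link-local (which includes
        # 169.254.169.254, the common cloud metadata address) and reserved ranges.
        if not ip.is_global:
            raise ValueError(f"{host!r} resolves to non-public address {ip}")
    return addresses
```

The allowlist does most of the work here; the address check exists for the case where an allowlisted name is later pointed at an internal address.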

2. Credential exposure (38% prevalence)

Shape: three sub-shapes appear in our corpus. (a) A process.env.* read inside a tool handler whose return value reaches the model. (b) Logger lines that write the env-var value to production logs. (c) Fetch calls with a dynamic base URL that send a static Authorization header to an attacker-controlled redirect destination.

What catches it: static analysis on env-var read locations relative to handler boundaries; LLM probe on "echo your environment"-shaped prompts; static check on fetch redirect-handling. The common author defense — "the model wouldn't do that" — fails reliably under probe.
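For shapes (b) and (c), a minimal Python sketch of two mitigations. The helper names `redact` and `auth_headers_for` and the list of secret variable names are hypothetical. `redact` scrubs known secret values from any string before it reaches a log line or a tool result; `auth_headers_for` attaches the Authorization header only when the request targets the exact origin the token belongs to, so a dynamic base URL or a cross-host redirect never receives it. Redaction is a backstop, not the fix for shape (a): the fix there is not reading the env var on the handler's return path at all.

```python
import os
from collections.abc import Mapping
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical names: list the env vars your server actually treats as secrets.
SECRET_ENV_NAMES = ("GITHUB_TOKEN", "SERVICE_API_KEY")
DEFAULT_PORTS = {"http": 80, "https": 443}


def redact(text: str, env: Mapping[str, str] = os.environ,
           names: tuple[str, ...] = SECRET_ENV_NAMES) -> str:
    """Replace every literal secret value found in text with a named placeholder."""
    for name in names:
        value = env.get(name)
        # Skip unset or very short values so common substrings aren't blanked out.
        if value and len(value) >= 8:
            text = text.replace(value, f"[REDACTED:{name}]")
    return text


def _origin(url: str) -> tuple[str, Optional[str], Optional[int]]:
    """(scheme, host, port) with the scheme's default port filled in."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme)


def auth_headers_for(url: str, api_base: str, token: str) -> dict[str, str]:
    """Attach the token only to requests for api_base's exact origin."""
    if _origin(url) != _origin(api_base):
        return {}
    return {"Authorization": f"Bearer {token}"}
```

For (c) this only helps if headers are computed per request: turn automatic redirect following off and call `auth_headers_for` again for each hop, rather than building headers once from the first URL.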

3. Prompt-injection susceptibility (a band, not a binary)

Shape: a tool that fetches external content (URLs, files, RSS, Slack messages) and returns the body to the model, where attacker-controlled instructions can hijack subsequent agent steps. The class doesn't have a static signal — the surface is just "tool returns external content unmodified" — so the audit has to actually probe.

What catches it: a fixed bank of LLM probes (14 patterns in engine v0.3) run against the live server in a sandboxed process. Each probe outputs refused / partial / honored; the rate determines the susceptibility band on the security axis. Authors who sanitize content before returning it (strip HTML, cap length, wrap in an <external_content> tag) move bands; authors who don't, don't.
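The three sanitization moves named above (strip HTML, cap length, wrap in a tag) fit in a few lines of standard-library Python. This is a minimal sketch with a hypothetical `sanitize_external` helper and an assumed 4,000-character cap; it lowers exposure but does not make injected instructions inert, which is why the probe layer still measures it. Escaping the text after extraction matters: without it, fetched content could contain its own `</external_content>` and close the wrapper early.

```python
import html
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, dropping tags and <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def sanitize_external(body: str, source: str, max_chars: int = 4000) -> str:
    """Strip HTML, collapse whitespace, cap length, and wrap in a delimiter tag."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())[:max_chars]
    # Escape &, < and > so fetched text cannot close the wrapper tag itself.
    text = html.escape(text, quote=False)
    return (f'<external_content source="{html.escape(source, quote=True)}">\n'
            f"{text}\n</external_content>")
```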

4. Command exec (10% prevalence)

Shape: tool input flows into exec, spawn(..., {shell: true}), os.system, or subprocess.Popen(..., shell=True). Less common than SSRF or credential exposure, but the most consequential class: every finding here is a remote-code-execution path. Often it is a single template-string line that survived from prototyping into production.

What catches it: static taint from tool argument → shell-invoking call site. The fix is invariant: spawn / execFile with an argv array, never a shell.
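The Python side of the same fix, as a minimal sketch: `subprocess.run` with a list and the default `shell=False` passes each element to the program as a separate argument, so shell metacharacters in tool input are never interpreted. `build_grep_argv` is a hypothetical stand-in for any tool that wraps a CLI. The `--` separator is an extra guard that argv arrays alone don't provide: it stops a value like `-f/etc/passwd` from being read as an option.

```python
import subprocess


def build_grep_argv(pattern: str, path: str) -> list[str]:
    """Hypothetical tool helper: argv for a grep over one file."""
    # "--" ends option parsing, so a pattern that starts with "-" stays data.
    return ["grep", "-n", "--", pattern, path]


def run_argv(argv: list[str], timeout: float = 10.0) -> str:
    """Run argv directly (no shell) and return its stdout."""
    result = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # e.g. grep exits 1 on "no match"; let the caller decide
        shell=False,  # the default, spelled out because it is the whole point
    )
    return result.stdout
```

As a quick check, `run_argv([sys.executable, "-c", "import sys; print(sys.argv[1])", "x; echo pwned"])` prints `x; echo pwned` verbatim: the semicolon reaches the child as data instead of starting a second command.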

5. Unsafe deserialization (~6% prevalence)

Shape: pickle.loads, yaml.load (without SafeLoader), node-serialize, or any framework that resolves classes from serialized bytes, when those bytes come from a tool argument or an attacker-reachable file. Lower prevalence than SSRF, but underdetected because the call sites look innocuous.

What catches it: static rule on the call shapes plus a dependency-tree check for known-unsafe deserialization libraries.

6. Scope-vs-handler drift (~25% prevalence in OAuth-using servers)

Shape: the server requests an OAuth scope broader than what the tool handlers actually use. A GitHub-MCP server asking for repo when it only reads issue titles. A Google-Workspace server asking for https://www.googleapis.com/auth/drive when it only reads file metadata. This is the class a human catches and a generic SAST misses — there's no AST signal for "this scope is broader than the surface."

What catches it: manual review of declared scope vs handler implementation. SkillAudit surfaces the scope set and the handler set side-by-side in the report so the human pass is fast; full automation is on the roadmap but not deployable today.
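Once a reviewer has written down what each handler actually needs, the comparison itself is a set difference. A minimal sketch: the scope strings, handler names, and per-handler table are all hypothetical placeholders, and filling in that table correctly is precisely the human judgment this class depends on.

```python
# Hypothetical: scopes the server's OAuth config requests.
DECLARED_SCOPES = {"issues.read", "repo.write", "org.admin"}

# Hypothetical: the reviewer's note of what each handler really needs.
HANDLER_SCOPES = {
    "list_issues": {"issues.read"},
    "get_issue_title": {"issues.read"},
}


def scope_drift(declared: set[str], handler_scopes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Compare requested scopes with the union of what the handlers use."""
    used = set().union(*handler_scopes.values())
    return {
        "excess": declared - used,   # requested but never exercised: the finding
        "missing": used - declared,  # needed but not requested: a bug, not a risk
    }
```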

7. HTML / DOM sink leaks (~4% prevalence)

Shape: servers that interpolate tool input into HTML without escaping and return it to clients that render HTML (some chat clients do), producing XSS-shaped issues at the client boundary. Prevalence is low because most MCP responses are plain text or markdown, but the risk is real for browser-rendering clients.

What catches it: static check on HTML-shaped return values that include unescaped tool input.
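The fix is to escape every piece of tool input at the point it is interpolated into markup. A minimal sketch with a hypothetical `render_result_card` helper, using the standard library's `html.escape` (its default `quote=True` also escapes quote characters, so values are safe inside quoted attributes). Escaping covers text and quoted-attribute contexts only; values placed in URLs, `<script>` blocks, or unquoted attributes need context-specific handling.

```python
import html


def render_result_card(title: str, snippet: str) -> str:
    """Build an HTML fragment from tool input, escaping every interpolated value."""
    return (
        '<div class="result">'
        f"<h3>{html.escape(title)}</h3>"
        f"<p>{html.escape(snippet)}</p>"
        "</div>"
    )
```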

8. Unsafe file-system access (~7% prevalence)

Shape: tools that take a path argument and read/write without allowlisting the directory tree. Path traversal (../../../etc/passwd), symlink traversal, write-anywhere shapes. Common in file-management MCP servers that "make it easy" by accepting an arbitrary path.

What catches it: static taint from tool argument → fs call with no path.resolve-and-prefix-check pattern.
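The Python analogue of the resolve-and-prefix-check pattern, as a minimal sketch with a hypothetical `safe_path` helper (Python 3.9+ for `is_relative_to`). Resolving first collapses `..` segments and follows symlinks, so both traversal shapes are caught; comparing path components with `is_relative_to` avoids the classic string-prefix bug where `/srv/data-evil` passes a `startswith("/srv/data")` test. A symlink created between this check and the later open can still escape, so treat this as the application-level guard, not a replacement for OS-level confinement.

```python
from pathlib import Path


def safe_path(root: str, user_path: str) -> Path:
    """Resolve user_path under root and refuse anything that escapes it."""
    base = Path(root).resolve()
    # An absolute user_path replaces base entirely; resolve() then exposes it.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise PermissionError(f"path escapes sandbox root: {user_path!r}")
    return candidate
```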

The two-layer test plan

Finding class	Static	LLM probe	Manual
SSRF	Yes — finds	Confirms exploitability	—
Credential exposure	Yes — finds env-read shape	Confirms echo path	—
Prompt-injection susceptibility	No — no static signal	Yes — primary signal	—
Command exec	Yes — finds	Confirms reachability	—
Unsafe deserialization	Yes — finds call shape	Sometimes	—
Scope-vs-handler drift	Surfaces both sides	—	Yes — primary signal
HTML/DOM sink leaks	Yes — finds escape gap	Sometimes	—
Unsafe file-system access	Yes — finds taint	Confirms exploitability	—

Cells are intentionally not yes/no. "Confirms exploitability" means the finding becomes a confirmed-finding rather than a suspect-finding when the dynamic layer reproduces it. The honest shape of automated security tests is layered, not single-pass.
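To make "suspect" versus "confirmed" concrete, here is a minimal sketch of how two layers' outputs might be merged. The `Finding` shape, the (class, location) key, and the status names are hypothetical illustrations, not SkillAudit's report schema. A static finding starts as a suspect and is promoted only when the dynamic layer reproduces it at the same location; a dynamic-only observation (the prompt-injection case, which has no static counterpart) is recorded as confirmed on its own.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    finding_class: str  # e.g. "ssrf", "command_exec"
    location: str       # "file:line", or a tool name for dynamic-only findings
    status: str = "suspect"


def merge_layers(static: list[Finding], reproduced: set[tuple[str, str]],
                 dynamic_only: list[Finding]) -> list[Finding]:
    """Promote static suspects the dynamic layer reproduced; append dynamic-only ones."""
    merged = []
    for f in static:
        key = (f.finding_class, f.location)
        merged.append(replace(f, status="confirmed") if key in reproduced else f)
    merged.extend(replace(f, status="confirmed") for f in dynamic_only)
    return merged
```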

A Claude skill security audit that runs only the static layer covers six of the eight classes adequately, surfaces the raw material for a seventh (scope drift) without judging it, and misses the eighth (prompt injection) entirely. That's why generic SAST tools aren't the right yardstick on their own: the highest-leverage class for buyers is the one without a static signal.

What a real audit run looks like, end to end

Input: GitHub URL, npm package name, or uploaded ZIP. SkillAudit accepts all three. The free tier is GitHub-URL-on-public-repo only; private repos and uploads are Pro.
Static pass (10–30s). Tree-sitter parses the source in the three covered languages (TypeScript, JavaScript, Python), and each rule in the v0.3 rule pack runs against the AST. The rule pack is published and is identical to the one run for the public corpus pages.
Sandbox boot (5–15s). The engine spins up the server in a sandboxed Node or Python process with the env vars stubbed and outbound network restricted to a record-and-replay layer. Tool surface is enumerated from the registered handlers.
LLM-probe pass (20–60s). The 14-probe bank runs against tools that match the susceptible-shape predicate (anything that fetches, executes, or reads external content). Each probe is recorded as refused, partial, or honored; that rate determines the prompt-injection band.
Grader (5s). Per-axis sub-scores feed a calibrated grader. Output: A–F per axis, overall grade, file:line for every finding, remediation hint per finding.
Output: public badge URL (the embeddable green/yellow/red signal), public report page (free tier), full report with remediation prose (Pro). Re-runnable on demand; cached for 24 hours.
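Stubbing env vars in the sandbox also gives the dynamic layer a cheap way to confirm the credential-echo path. A minimal sketch of the general canary-token technique, not a description of SkillAudit's sandbox internals; the name-matching heuristic and helper names are hypothetical. Each secret-looking variable is replaced with a unique canary before the server process starts, and any canary that later appears in a tool result identifies exactly which variable leaked.

```python
import re
import secrets
from collections.abc import Mapping

# Hypothetical heuristic for which variable names hold credentials.
SECRET_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|API_?KEY)", re.IGNORECASE)


def stub_env(real_env: Mapping[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (env for the sandboxed server, canary -> variable name)."""
    env, canaries = {}, {}
    for name, value in real_env.items():
        if SECRET_NAME.search(name):
            canary = f"canary_{secrets.token_hex(8)}"
            env[name] = canary
            canaries[canary] = name
        else:
            env[name] = value
    return env, canaries


def leaked_vars(tool_output: str, canaries: Mapping[str, str]) -> set[str]:
    """Names of stubbed variables whose canary appears in a tool result."""
    return {name for canary, name in canaries.items() if canary in tool_output}
```

The stubbed mapping would be passed as `env=` when launching the server process (for example via `subprocess.Popen`), and `leaked_vars` run over every probe response.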


Pre-publish vs continuous: when to run which

The single most common mistake is treating an audit as a one-off pre-publish event. The class that drifts most is prompt-injection susceptibility: the underlying model is updated, your sanitization becomes less effective against it, and a server that scored an A in March can score a B in June without a single line of code changing. The continuous (CI) mode catches that drift; the pre-publish mode can't.

Recommended cadence: run pre-publish to catch the static-class findings you can fix before shipping; then wire the GitHub Action gate at grade ≥ B so CI catches regressions per PR; then run a periodic re-audit (we trigger this monthly on the Pro plan and weekly on the Team plan) to catch the model-shift class.
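A grade gate is a small comparison once the grade is available to the CI job. A minimal sketch, assuming letter grades A, B, C, D, F and an audit result saved to a JSON file with a top-level `"grade"` field; both the file and its shape are hypothetical, not the GitHub Action's documented output. `gate` returns a process exit code, and a non-zero exit is what fails the check on the PR.

```python
import json

# Assumes letter grades A, B, C, D, F, best to worst.
RANK = {grade: i for i, grade in enumerate("ABCDF")}


def meets_threshold(grade: str, minimum: str = "B") -> bool:
    """True if grade is at least as good as minimum."""
    g, m = grade.strip().upper(), minimum.strip().upper()
    if g not in RANK or m not in RANK:
        raise ValueError(f"unknown grade: {grade!r} / {minimum!r}")
    return RANK[g] <= RANK[m]


def gate(report_path: str, minimum: str = "B") -> int:
    """Return 0 if the report's grade passes the threshold, 1 if not."""
    with open(report_path, encoding="utf-8") as fh:
        grade = json.load(fh)["grade"]  # hypothetical report field
    ok = meets_threshold(grade, minimum)
    print(f"security grade {grade}: {'pass' if ok else 'FAIL'} (minimum {minimum})")
    return 0 if ok else 1
```

In a CI step this would be invoked as something like `sys.exit(gate("audit.json"))`, with `audit.json` standing in for wherever your pipeline writes the result. Using a dict for the ranking (rather than `str.index`) keeps an empty or multi-letter grade from slipping through as a pass.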

DIY: running parts of the audit yourself

If you want to validate what's in the audit before committing, the static layer is reproducible without SkillAudit:

SSRF: a Semgrep rule matching fetch($URL), with $URL tainted from args.* and no allowlist check between the argument and the call, catches the dominant shape. At 50% corpus prevalence, expect it to fire on roughly every other server you scan.
Credential exposure: a Semgrep rule on process.env.* reads inside any function whose name matches tool* or that's passed to server.tool(...) covers shape (a). Shapes (b) and (c) need a more careful pass.
Command exec: a Semgrep rule on exec(`...${...}...`) and spawn(..., {shell: true}) catches the dominant shape. The fix is mechanical (use argv arrays).
Prompt injection: there is no DIY static substitute. You have to actually probe the running surface. The probe bank is published in our methodology if you want to write your own runner.

Authors who run DIY for the static classes typically still come to SkillAudit for the LLM-probe layer plus the public badge. The badge is the social-proof piece that DIY can't generate.

Where this page sits in the cluster

This page is the security-only lens. Two siblings cover adjacent intents:

Claude skill auditor — the broader role, six axes, what an auditor isn't. Start here if you don't know whether you want only the security lens.
Claude Code audit skill — the format-specific take: auditing a Claude Code Skill (manifest + handler + capabilities) as distinct from a remote MCP server.