Claude skill security audit — process, threat surface, what gets checked

A Claude skill security audit narrows the broader auditor lens to the threats that get a skill rejected, exploited, or rolled back. This page names the eight finding classes that matter for the Claude / MCP shape, maps each to the test type that catches it, walks through what a real audit looks like end to end, and shows where the SkillAudit engine sits in the picture.

TL;DR

Eight finding classes drive the security grade across the SkillAudit 101-server corpus: SSRF (50%), credential exposure (38%), prompt-injection susceptibility (a band, not a yes/no), command exec (10%), unsafe deserialization (~6%), scope-vs-handler drift (manual review; ~25% of OAuth-using servers), HTML/DOM sink leaks (~4%), and unsafe file-system access (~7%). A competent audit covers all eight via two layers: a static AST/taint pass plus a live LLM-probe pass against the running tool surface. Run time on SkillAudit's engine is 30–90 seconds per server. The output is an A–F band for the security axis, with each finding pinned to a file:line and a remediation hint. Free for public repos; $19/mo Pro adds private repos and the GitHub Action gate.

Why "security audit" is its own lens, separate from a general audit

The broader Claude skill auditor role covers six axes. A "security audit" is the request to focus the lens — the buyer or author wants the threat-surface answer, not the full report. That's a useful narrowing because the security axis is where most of the rejection criteria for both Anthropic's Skills Directory listing and team-level adoption decisions live. Maintenance, documentation, scope-hygiene, and client compatibility matter, but they rarely block adoption alone; the security axis can.

Narrowing also lets the audit go deeper on what's there. A general audit gives one paragraph per axis; a security audit can give one paragraph per finding class, with the corpus prevalence, the canonical fix, and the F-grade example called out. That's the shape of this page and the shape of the security-only mode of the SkillAudit report.

The eight finding classes

1. SSRF — server-side request forgery (50% prevalence)

Shape: a tool that takes a URL or hostname argument and dispatches an HTTP request without resolving and allowlisting the destination first. The classic case is a "fetch web page" tool with no IP-block check; the more subtle case is a webhook caller that filters 127.0.0.1 but misses IPv6 loopback, redirect chains, decimal-encoded IPs, and DNS rebinding.

What catches it: static taint analysis from tool argument → fetch call site, plus a redirect-following dynamic probe that confirms exploitability. Pure deny-list filtering is never sufficient and is the most common F-grade pattern.
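
As a minimal sketch of the resolve-and-allowlist check described above (not SkillAudit's own rule or a complete SSRF defense), the Python below allowlists the hostname first, resolves it, and then rejects any non-public address. The names is_public_ip and destination_allowed, and the injectable resolve parameter, are hypothetical and exist only for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(raw: str) -> bool:
    """True only for globally routable addresses (no loopback, private, link-local)."""
    ip = ipaddress.ip_address(raw)
    # Unwrap IPv4-mapped IPv6 (::ffff:127.0.0.1) so it is judged as IPv4.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return ip.is_global


def destination_allowed(url, allowed_hosts, resolve=socket.getaddrinfo):
    """Allowlist the hostname, then require every resolved address to be public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    if host not in allowed_hosts:  # allowlist, not deny-list
        return False
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = resolve(host, port, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_public_ip(a) for a in addresses)
```

A check like this has to be repeated on every redirect hop, and the connection should go to the address that was validated; otherwise a second DNS lookup can still be rebound to an internal host.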

2. Credential exposure (38% prevalence)

Shape: three sub-shapes appear in our corpus. (a) process.env.* read inside a tool handler whose return value reaches the model. (b) Logger lines that include the env-var value in production logs. (c) Dynamic-base fetch calls that send a static Authorization header to an attacker-controlled redirect destination.

What catches it: static analysis on env-var read locations relative to handler boundaries; LLM probe on "echo your environment"-shaped prompts; static check on fetch redirect-handling. The common author defense — "the model wouldn't do that" — fails reliably under probe.
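
One defense-in-depth option for sub-shape (a) is to scrub known secret values from a handler's return value before it reaches the model. The sketch below is an assumption about how that could look, not something the audit prescribes; redact_env_values, the SENSITIVE_MARKERS name list, and the min_len threshold are all hypothetical.

```python
import os

# Hypothetical heuristic: env-var names containing these markers are treated as secrets.
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def redact_env_values(text, environ=None, min_len=8):
    """Replace any sensitive env-var value found in tool output with a placeholder."""
    environ = os.environ if environ is None else environ
    for name, value in environ.items():
        looks_sensitive = any(marker in name.upper() for marker in SENSITIVE_MARKERS)
        # Skip very short values so ordinary words are not redacted by accident.
        if looks_sensitive and len(value) >= min_len:
            text = text.replace(value, f"[redacted:{name}]")
    return text
```

This only limits the damage of an echo path. The structural fix is still to keep env reads out of code whose return value flows to the model.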

3. Prompt-injection susceptibility (a band, not a binary)

Shape: a tool that fetches external content (URLs, files, RSS, Slack messages) and returns the body to the model, where attacker-controlled instructions can hijack subsequent agent steps. The class doesn't have a static signal — the surface is just "tool returns external content unmodified" — so the audit has to actually probe.

What catches it: a fixed bank of LLM probes (14 patterns in engine v0.3) run against the live server in a sandboxed process. Each probe outputs refused / partial / honored; the rate determines the susceptibility band on the security axis. Authors who sanitize content before returning it (strip HTML, cap length, wrap in an <external_content> tag) move bands; authors who don't, don't.
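
The three sanitization steps named above (strip HTML, cap length, wrap in an <external_content> tag) could be sketched as follows. This is an illustrative assumption, not the engine's reference implementation; wrap_external, the source attribute, and the 4000-character default are hypothetical.

```python
import html
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and drops tags plus script/style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def wrap_external(body, source, max_chars=4000):
    """Strip markup, cap the length, and wrap the text in an explicit boundary tag."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())[:max_chars]
    # Escape the payload so it cannot close the wrapper tag itself.
    safe_text = html.escape(text, quote=False)
    safe_source = html.escape(source, quote=True)
    return f'<external_content source="{safe_source}">{safe_text}</external_content>'
```

Wrapping does not make injected instructions harmless on its own; it gives the model and any downstream policy a clear boundary between tool output and trusted instructions.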

4. Command exec (10% prevalence)

Shape: tool input flows into exec / spawn(..., {shell: true}) / os.system / subprocess.Popen(..., shell=True). The lowest-prevalence dangerous class but the most consequential — every finding here is a remote-code-execution path. Often a single template-string line that survived from prototyping into production.

What catches it: static taint from tool argument → shell-invoking call site. The fix is invariant: spawn / execFile with an argv array, never a shell.
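
A small Python illustration of the argv-array fix: the user-controlled string is passed as one argument to a child process, so shell metacharacters in it are never interpreted. The helper name echo_via_child is hypothetical, and the child here is just the current Python interpreter echoing its argument back.

```python
import subprocess
import sys


def echo_via_child(user_text: str) -> str:
    """Pass untrusted text to a child process as a single argv element, with no shell."""
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_text]
    # No shell=True: ';', '&&', and backticks in user_text stay literal characters.
    result = subprocess.run(argv, capture_output=True, text=True, check=True, timeout=10)
    return result.stdout
```

Calling echo_via_child("hello; echo pwned") returns the whole string unchanged, where the equivalent shell=True template string would have run a second command.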

5. Unsafe deserialization (~6% prevalence)

Shape: pickle.loads, yaml.load (without SafeLoader), node-serialize, or any framework that resolves classes from serialized bytes when those bytes come from a tool argument or an attacker-reachable file. Lower prevalence than SSRF but underdetected because the call sites look innocuous.

What catches it: static rule on the call shapes plus a dependency-tree check for known-unsafe deserialization libraries.
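
To show what a call-shape rule can look like, here is a sketch that uses Python's built-in ast module as a stand-in for the Tree-sitter pass; it is not a rule from the published rule pack. The function name find_unsafe_deserialization and the tuple format of the findings are assumptions for illustration, and it only covers Python sources.

```python
import ast

UNSAFE_CALLS = {("pickle", "loads"), ("pickle", "load")}
SAFE_YAML_LOADERS = ("SafeLoader", "CSafeLoader")


def find_unsafe_deserialization(source, filename="<tool>"):
    """Return (filename, line, message) for risky deserialization call shapes."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            continue
        target = (node.func.value.id, node.func.attr)
        if target in UNSAFE_CALLS:
            findings.append((filename, node.lineno, ".".join(target)))
        elif target == ("yaml", "load"):
            # The loader may be passed as Loader=... or as the second positional argument.
            loader = next((kw.value for kw in node.keywords if kw.arg == "Loader"), None)
            if loader is None and len(node.args) >= 2:
                loader = node.args[1]
            name = getattr(loader, "attr", getattr(loader, "id", None))
            if name not in SAFE_YAML_LOADERS:
                findings.append((filename, node.lineno, "yaml.load without SafeLoader"))
    return sorted(findings)
```

A rule like this only matches direct module.attribute calls; aliased imports and the taint question (do the bytes come from a tool argument?) need the fuller analysis the static layer performs.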

6. Scope-vs-handler drift (~25% prevalence in OAuth-using servers)

Shape: the server requests an OAuth scope broader than what the tool handlers actually use. A GitHub-MCP server asking for repo when it only reads issue titles. A Google-Workspace server asking for https://www.googleapis.com/auth/drive when it only reads file metadata. This is the class a human catches and a generic SAST misses — there's no AST signal for "this scope is broader than the surface."

What catches it: manual review of declared scope vs handler implementation. SkillAudit surfaces the scope set and the handler set side-by-side in the report so the human pass is fast; full automation is on the roadmap but not deployable today.
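
The side-by-side comparison can be approximated with plain set arithmetic once a reviewer has written down which scope each handler actually needs. The sketch below assumes that mapping is supplied by hand; scope_drift and its output keys are hypothetical names, not part of the SkillAudit report format.

```python
def scope_drift(declared, handler_needs):
    """Compare declared OAuth scopes with the scopes a reviewer mapped to each handler."""
    needed = set().union(*handler_needs.values())
    return {
        # Declared but not directly required by any handler: candidates for narrowing.
        "declared_not_needed": sorted(set(declared) - needed),
        # Required by a handler but never declared: either a bug or a broader scope covers it.
        "needed_not_declared": sorted(needed - set(declared)),
    }


report = scope_drift(
    declared={"https://www.googleapis.com/auth/drive"},
    handler_needs={
        "list_file_metadata": {"https://www.googleapis.com/auth/drive.metadata.readonly"},
    },
)
```

The diff does not know that a broad scope subsumes a narrow one, so a human still interprets the two lists. In the example, both lists are non-empty, which is the signal that the narrow scope would suffice.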

7. HTML / DOM sink leaks (~4% prevalence)

Shape: servers that return HTML to clients that render it (some chat clients) without escaping, leading to XSS-shaped issues at the client boundary. Lower prevalence because most MCP responses are plain text or markdown, but real for browser-rendering clients.

What catches it: static check on HTML-shaped return values that include unescaped tool input.
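
For servers that do return HTML, the escape gap closes by escaping every piece of tool input before it is interpolated. A minimal Python sketch, with render_result_card as a hypothetical helper and the markup chosen only for illustration:

```python
import html


def render_result_card(title: str, body: str) -> str:
    """Build an HTML fragment with all tool-supplied text escaped."""
    safe_title = html.escape(title)  # escapes &, <, > and both quote characters
    safe_body = html.escape(body)
    return f'<div class="result"><h3>{safe_title}</h3><p>{safe_body}</p></div>'
```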

8. Unsafe file-system access (~7% prevalence)

Shape: tools that take a path argument and read/write without allowlisting the directory tree. Path traversal (../../../etc/passwd), symlink traversal, write-anywhere shapes. Common in file-management MCP servers that "make it easy" by accepting an arbitrary path.

What catches it: static taint from tool argument → fs call with no path.resolve-and-prefix-check pattern.
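
The resolve-and-prefix-check pattern that the static rule looks for can be sketched in Python with pathlib (Python 3.9 or later for is_relative_to). The function name resolve_inside is hypothetical.

```python
from pathlib import Path


def resolve_inside(root, user_path):
    """Resolve user_path under root and refuse anything that escapes the tree."""
    root_resolved = Path(root).resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    candidate = (root_resolved / user_path).resolve()
    if not candidate.is_relative_to(root_resolved):
        raise PermissionError(f"path escapes allowed root: {user_path!r}")
    return candidate
```

An absolute user_path replaces the root when joined, so it fails the prefix check as well. A symlink created between the check and the later open can still race it, so high-risk tools may also want to open files with no-follow semantics.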

The two-layer test plan

Finding class                     Static                        LLM probe                  Manual
SSRF                              Yes, finds                    Confirms exploitability    -
Credential exposure               Yes, finds env-read shape     Confirms echo path         -
Prompt-injection susceptibility   No, no static signal          Yes, primary signal        -
Command exec                      Yes, finds                    Confirms reachability      -
Unsafe deserialization            Yes, finds call shape         Sometimes                  -
Scope-vs-handler drift            Surfaces both sides           -                          Yes, primary signal
HTML/DOM sink leaks               Yes, finds escape gap         Sometimes                  -
Unsafe file-system access         Yes, finds taint              Confirms exploitability    -

Cells are intentionally not yes/no. "Confirms exploitability" means the finding becomes a confirmed-finding rather than a suspect-finding when the dynamic layer reproduces it. The honest shape of automated security tests is layered, not single-pass.

A Claude skill security audit that runs only the static layer covers six of the eight classes adequately, can only surface the inputs for one (scope-vs-handler drift, which needs the manual pass), and misses one (prompt injection) entirely. That is why generic SAST tools are not the right yardstick on their own: the highest-leverage class for buyers is the one without a static signal.

What a real audit run looks like, end to end

  1. Input: GitHub URL, npm package name, or uploaded ZIP. SkillAudit accepts all three. The free tier is GitHub-URL-on-public-repo only; private repos and uploads are Pro.
  2. Static pass (10–30s). Tree-sitter parses the source for the three covered languages (TypeScript, JavaScript, Python). Each rule in the v0.3 rule pack runs against the AST. Rule pack is published; identical to what's run on the public corpus pages.
  3. Sandbox boot (5–15s). The engine spins up the server in a sandboxed Node or Python process with the env vars stubbed and outbound network restricted to a record-and-replay layer. Tool surface is enumerated from the registered handlers.
  4. LLM-probe pass (20–60s). The 14-probe bank runs against tools that match the susceptible-shape predicate (anything that fetches, executes, or reads external content). Each probe is recorded as refused, partial, or honored; the resulting rate determines the prompt-injection band.
  5. Grader (5s). Per-axis sub-scores feed a calibrated grader. Output: A–F per axis, overall grade, file:line for every finding, remediation hint per finding.
  6. Output: public badge URL (the embeddable green/yellow/red signal), public report page (free tier), full report with remediation prose (Pro). Re-runnable on demand; cached for 24 hours.
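
Step 4 turns per-probe outcomes into a band. The source does not publish the weights or cutoffs, so the sketch below uses placeholder numbers purely to show the shape of the mapping; WEIGHTS, CUTOFFS, and susceptibility_band are all hypothetical and are not the calibrated grader.

```python
# Placeholder values, NOT SkillAudit's calibration: a partial counts as half an honored probe.
WEIGHTS = {"refused": 0.0, "partial": 0.5, "honored": 1.0}
# (band, highest weighted rate that still earns it); anything above the last cutoff is F.
CUTOFFS = (("A", 0.0), ("B", 0.10), ("C", 0.25), ("D", 0.50))


def susceptibility_band(outcomes):
    """Map a list of 'refused' / 'partial' / 'honored' outcomes to a letter band."""
    if not outcomes:
        raise ValueError("no probe outcomes recorded")
    rate = sum(WEIGHTS[outcome] for outcome in outcomes) / len(outcomes)
    for band, ceiling in CUTOFFS:
        if rate <= ceiling:
            return band
    return "F"
```

With these placeholder cutoffs, 14 refusals map to A and a single partial among 14 probes already drops the band to B.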

Pre-publish vs continuous: when to run which

The single most common mistake is treating an audit as a one-off pre-publish event. The class that drifts most is prompt-injection susceptibility — the model retrains, your sanitization weakens, and a server that scored an A in March can score a B in June without a single line of code change. The continuous (CI) mode catches that drift; the pre-publish mode can't.

Recommended cadence: run pre-publish to catch all the static-class findings you can fix before shipping; then wire the GitHub Action gate at grade ≥ B and let CI catch regressions per PR; then run a periodic re-audit (we trigger this monthly on the Pro plan and weekly on the Team plan) to catch the model-shift class.
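
The gate itself reduces to a comparison between the reported grade and a minimum. The sketch below is a generic illustration of that comparison and is not the GitHub Action's actual configuration or code; passes_gate and the letter ordering are assumptions.

```python
# Assumed ordering, best to worst; covers scales with or without an E band.
GRADE_ORDER = "ABCDEF"


def passes_gate(grade: str, minimum: str = "B") -> bool:
    """True when grade is at least as good as the minimum required grade."""
    return GRADE_ORDER.index(grade.upper()) <= GRADE_ORDER.index(minimum.upper())
```

In a CI script, a failing result would typically be turned into a non-zero exit code so the pull request check fails.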

DIY: running parts of the audit yourself

If you want to validate what's in the audit before committing, the static layer is reproducible without SkillAudit.

Authors who run DIY for the static classes typically still come to SkillAudit for the LLM-probe layer plus the public badge. The badge is the social-proof piece that DIY can't generate.
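
As one example of a DIY static check, the sketch below flags the shell-invoking call shapes from the command exec class in Python sources. It uses Python's ast module rather than Tree-sitter, does no taint tracking, and is an assumption about how such a check might be written, not a copy of a published rule; find_shell_calls is a hypothetical name.

```python
import ast


def find_shell_calls(source, filename="<tool>"):
    """Return sorted (line, call) pairs for os.system and subprocess.*(shell=True)."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)):
            continue
        dotted = f"{func.value.id}.{func.attr}"
        # Only a literal shell=True is matched; a variable value would need data flow.
        shell_true = any(
            kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True
            for kw in node.keywords
        )
        if dotted == "os.system" or (dotted.startswith("subprocess.") and shell_true):
            findings.append((node.lineno, dotted))
    return sorted(findings)
```

Every hit still needs a manual look to decide whether a tool argument can reach the call, which is the part the taint pass automates.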

Related questions

What about non-security findings — license, performance, code style?

Out of scope for a security audit. Snyk and FOSSA cover license; the tooling landscape page names where each layer fits. Performance and style belong in your code review, not your security audit. Confusing the two is how audits stop being decisive — a security audit that returns a hundred style findings buries the SSRF.

Can the audit prove my skill is secure?

No. An audit can prove the absence of specific known-bad patterns and the presence of some known-good shapes. It can't prove the absence of an unknown attack class. The honest framing: an A grade means we couldn't find anything in the categories we test for; not "this is provably safe." Treat the grade as a strong filter, not a guarantee.

Are the eight classes the same as OWASP's API or LLM Top 10?

Overlap, not equivalence. We did the explicit cross-mapping on our OWASP page. Most OWASP API Top 10 items map onto MCP; most OWASP LLM-Apps Top 10 items also map; three threat shapes specific to MCP (credential-echo into tool response, scope-vs-handler drift, client-compatibility drift) aren't well captured by either OWASP list. Hence eight classes, not ten.

How do I get a private security audit?

Pro at $19/mo audits private repos. Connect your GitHub identity and audit private repos; results stay private to your account. Team at $99/mo adds SSO and policy export. The Free tier is public repos only; the public badge requires the audit to be public.

What if a finding is wrong (false positive)?

Email the audit URL to hello@skillaudit.dev. We re-run, often with a fresh LLM-probe pass, and either confirm or correct. Calibration deltas land in the public changelog: every engine revision moves some grades, and we publish the deltas so the calibration is auditable.

Is this enough for compliance?

For most internal-adoption decisions: yes. For SOC 2 / HIPAA / FedRAMP estates: it's the start, not the end. The named-firm signature on top is what regulators want; the SkillAudit report is what the firm uses as a starting point. The Team plan's policy export is shaped to feed control-objective evidence directly.
