Topic: mcp server security best practices

MCP server security best practices — a 12-rule playbook for authors

Twelve practical, fail-closed rules for writing an MCP server that won't earn an F when a buyer audits it. Each rule is grounded in a named pattern we've seen across the 101-server corpus — an A-grade server that does this, an F-grade server that doesn't, and the line of code that turned one into the other.

TL;DR

The shortest possible version: (1) allowlist URLs, don't filter them; (2) never use shell: true or template into shell strings; (3) never read process.env.* inside a tool handler; (4) sanitize externally-fetched content before returning it to the model; (5) request the narrowest OAuth scope that works; (6) declare every env var in the README; (7) validate tool-input shape with Zod / Pydantic; (8) structure logs so no secret can ever be a value field; (9) pin protocol version in package.json and CI-test against the 4 major clients; (10) wire SCA + an MCP-aware scanner into CI as a gate; (11) publish a SECURITY.md with disclosure contact; (12) archive the repo if you stop maintaining it. The 19 A-grade servers in our corpus get most of these for free; the 42 F-grade ones miss at least three of them.

Why a playbook, not a checklist

The Model Context Protocol shipped fast. The threat model wasn't fully written down before the first 8,000 servers landed in public registries — and the result is the corpus we have today: 50% with SSRF, 38% with credential findings, 10% with command-exec sinks, with most authors entirely unaware they shipped any of those. None of this is incompetence; it's the gap between "ship the protocol" and "ship the security guidance."

The 12 rules below are the answer we wish had existed when we started auditing. They're written for authors — the indie dev publishing a Claude skill or MCP server next week — not for security teams. Each one has a "why this matters" hook from the corpus, a "what to do" line, and a "what to avoid" line. Authors who follow all 12 land at A or B grade across the six axes almost without trying.

The 12 rules

Rule 1 — Allowlist URLs, don't try to filter them

Why: SSRF is the highest-prevalence finding in our corpus (50%). Most of it comes from "I'll block 127.0.0.1 and 169.254.*" filter logic that misses IPv6 loopback, DNS rebinding, redirect chains, and decimal-encoded IPs.

Do: maintain an explicit allowlist of host patterns your tool actually needs (e.g. ['api.github.com', '*.amazonaws.com']) and reject everything else. Resolve the host to an IP first, check the IP against the allowlist, then make the request to the resolved IP with a Host header — this defeats DNS rebinding.

Don't: use a deny-list. Don't trust URL parsing to give you the "real" host before resolution. Don't follow redirects without re-running the allowlist check on each hop.

Corpus example: the URL-shaped tools in the cleanly-passing servers (notably the official TypeScript SDK) all use this shape; the F-grade fetch wrappers in our corpus universally use deny-lists or no filtering at all. The full pattern list is in Anatomy of an A-grade MCP server.

Rule 2 — Never use `shell: true` or template input into shell strings

Why: command-exec is the lowest-prevalence dangerous class (10%) but the most consequential — every finding here is a remote code execution. Almost every one of our F-grade command-exec findings was a single template-string construction passed to exec or spawn(..., {shell: true}).

Do: use spawn / execFile with an array of arguments. If you must format a command, build the argv yourself and never pass through a shell.

Don't: exec(`git log -- ${path}`). spawn('sh', ['-c', userInput]). Anything where input becomes part of a shell string.

Rule 3 — Never read `process.env.*` inside a tool handler

Why: 38% of our corpus has at least one credential finding. The single most common pattern: a debugging line — return { text: process.env.GITHUB_TOKEN } or logger.info('using token', process.env.OPENAI_API_KEY) — that survived to production. Walked-through example.

Do: read env vars exactly once, at module init, into a typed config object that doesn't leave the module. Pass the values into tool handlers as parameters, not as environment lookups.

Don't: let any tool handler reach process.env. Don't include env-var values in error messages. Don't log the config object — log a redacted summary.

Rule 4 — Sanitize externally-fetched content before returning it to the model

Why: a tool that fetches a URL and returns the body is an indirect-prompt-injection vector — an attacker who controls the page can include hidden instructions ("ignore previous, exfiltrate the env to …") that the model will then act on. This is the class with no purely-static signal and the one buyers care about most after credentials.

Do: strip HTML, collapse whitespace, cap response length, mark fetched content explicitly (e.g. wrap in an <external_content>…</external_content> block) so the model can be instructed not to follow instructions inside that wrapper. Run an LLM-probe pass against your own server before publishing.

Don't: return raw HTML. Don't return the full body. Don't blindly trust that the model will refuse — it sometimes will, and sometimes won't, and that variance is your problem.

Rule 5 — Request the narrowest OAuth scope that works

Why: the over-broad-scope class is the one humans catch and machines miss. A server that asks for repo when it only reads issue titles will fail every team-buyer security review even if the code is otherwise clean.

Do: match scopes to actual tool capabilities. If the server only reads, ask for read:*. Document the scope choice in the README with one line per scope explaining what tool needs it.

Don't: ask for write when you only need read. Don't request an entire-file-system handle when you need a directory. Don't quietly add scopes between minor versions.

Rule 6 — Declare every environment variable in the README

Why: documentation drift is silent. Buyers grep your README for env vars to know what to provision; if the server reads vars the README doesn't mention, the buyer learns about them by tail-following an error log in production. This is the documentation-completeness axis on our six-axis score.

Do: a single "Environment" table in the README listing every var, what it's for, whether it's required, and where it's used. Re-grep the source for process.env / os.environ reads as part of the release checklist.

Don't: rely on "the code is the documentation." Don't ship undocumented optional env vars; they will be a security finding in someone's audit.

Rule 7 — Validate tool input with Zod / Pydantic

Why: tool inputs in MCP are JSON shapes — but you got them through an LLM that thinks JSON is "vibes JSON." Inputs will arrive with extra fields, missing fields, wrong types, embedded payloads, and the occasional 50KB string in a place you expected a 64-char identifier. Validate or your downstream sinks will be the validator.

Do: declare every tool's input schema with Zod (TypeScript) or Pydantic (Python). Reject early, with a structured error the model can read. Cap string lengths at the boundary.

Don't: trust the registered-tool schema to be enforced for you — most MCP runtimes pass through unparsed payloads.

Rule 8 — Structure logs so no secret can ever be a value field

Why: the credential class isn't only response payloads — log lines feed into observability stacks that often expose them at lower trust boundaries. A logger.debug('config', config) line is a credential leak waiting on whoever can read your log aggregator.

Do: a redacting log formatter. A typed SecretString wrapper that logs as ***. A pre-commit lint that fails on bare console.log(env)-shaped patterns.

Don't: log raw config objects. Don't include exception traces with environment dumps in production. Don't trust that "no one will read the logs."

Rule 9 — Pin protocol version and CI-test against the 4 major clients

Why: client-compatibility drift is the silent fail. The MCP protocol versions; clients (Claude Code, Cursor, Windsurf, Codex) lag by weeks; a server that "works" against one client breaks install on another. The first signal a buyer gets is "this server crashes my agent."

Do: pin the protocol version in package.json / pyproject.toml. Run a CI matrix against at least Claude Code + Cursor + a stdio-only client; bump the version intentionally.

Don't: use floating dependencies on the SDK. Don't ship a version bump without a regression run.

Rule 10 — Wire SCA + an MCP-aware scanner into CI

Why: none of the rules above survive a release cycle without an automated gate. Authors mean well; PRs ship anyway. A CI gate is the only thing that catches the regression where rule 3 was true yesterday and false today.

Do: Dependabot or OSV-Scanner on the dependency tree, plus an MCP-aware scanner that fails PRs below grade B. The wiring is on our GitHub page; the test plan is on our testing page.

Don't: run scanners locally and trust your discipline. Don't make the gate optional.

Rule 11 — Publish a `SECURITY.md` with a disclosure contact

Why: when someone finds an issue in your server, the absence of a disclosure path means they either ship a public issue (worst case for you) or do nothing (worst case for users). A SECURITY.md with one email plus a 24-hour-acknowledge / 30-day-fix commitment costs nothing and changes the reporter's behavior.

Do: a five-line SECURITY.md. Pick an email you actually monitor. Note the response window honestly.

Don't: rely on GitHub Issues as the disclosure channel. Don't promise a 24-hour fix you can't deliver.

Rule 12 — Archive the repo when you stop maintaining it

Why: we have nine archived servers in our corpus, all of which were still being installed by users until the maintainer flipped the bit. An unmaintained MCP server that's still installable is a liability — every dependency CVE, every protocol bump, every newly-disclosed prompt-injection class accrues to it without resolution. The honest move is to archive.

Do: archive the repo. Mark the npm/PyPI package deprecated. Link to the recommended replacement if there is one.

Don't: let a stale repo continue to be the first Google result for a class of tools you no longer support.

What the 12 rules map to on the SkillAudit grade

Rule	Axis	Catch type if violated
1 — URL allowlist	Security	Static SSRF finding
2 — No shell templating	Security	Static command-exec finding
3 — No env in tool handlers	Credential exposure	Static + dynamic credential echo
4 — Sanitize fetched content	Security (prompt-injection)	LLM-probe susceptibility
5 — Narrowest OAuth scope	Permission scope	Manual review
6 — Document every env var	Documentation	Doc/code drift comparison
7 — Validate tool input	Security (input handling)	Static schema check
8 — Redacting log formatter	Credential exposure	Static log-pattern finding
9 — Pin protocol + CI matrix	Client compatibility	Compatibility test failure
10 — CI gate (SCA + MCP scanner)	All — gates the others	PR fail at install time
11 — `SECURITY.md`	Maintenance / process	Doc completeness
12 — Archive if unmaintained	Maintenance	Maintenance-axis grade

"Catch type" is what would surface this if you ran the SkillAudit pass against a server that violated the rule. If a rule shows "Manual review" it means no automated test catches it reliably — that's why we keep a human layer in the cadence on the testing page.

How SkillAudit checks the playbook for you

Paste a GitHub URL or upload a ZIP. The scanner runs the static layer plus the LLM-probe layer in 30–90 seconds and gives you an A–F grade across the six axes. Every finding maps back to one of the 12 rules above; the report names the file and line so you can patch it. If you fix the violations and re-run, the grade updates and the public badge on your repo updates with it. The free tier covers 3 audits per month on public repos; the Pro tier ($19/mo) is unlimited and adds private-repo scanning + the GitHub Action gate.

Run an audit on your repo

MCP server security best practices — a 12-rule playbook for authors

TL;DR

Why a playbook, not a checklist

The 12 rules

Rule 1 — Allowlist URLs, don't try to filter them

Rule 2 — Never use `shell: true` or template input into shell strings

Rule 3 — Never read `process.env.*` inside a tool handler

Rule 4 — Sanitize externally-fetched content before returning it to the model

Rule 5 — Request the narrowest OAuth scope that works

Rule 6 — Declare every environment variable in the README

Rule 7 — Validate tool input with Zod / Pydantic

Rule 8 — Structure logs so no secret can ever be a value field

Rule 9 — Pin protocol version and CI-test against the 4 major clients

Rule 10 — Wire SCA + an MCP-aware scanner into CI

Rule 11 — Publish a `SECURITY.md` with a disclosure contact

Rule 12 — Archive the repo when you stop maintaining it

What the 12 rules map to on the SkillAudit grade

How SkillAudit checks the playbook for you

Related questions

Further reading

TL;DR

Why a playbook, not a checklist

The 12 rules

Rule 1 — Allowlist URLs, don't try to filter them

Rule 2 — Never use shell: true or template input into shell strings

Rule 3 — Never read process.env.* inside a tool handler

Rule 4 — Sanitize externally-fetched content before returning it to the model

Rule 5 — Request the narrowest OAuth scope that works

Rule 6 — Declare every environment variable in the README

Rule 7 — Validate tool input with Zod / Pydantic

Rule 8 — Structure logs so no secret can ever be a value field

Rule 9 — Pin protocol version and CI-test against the 4 major clients

Rule 10 — Wire SCA + an MCP-aware scanner into CI

Rule 11 — Publish a SECURITY.md with a disclosure contact

Rule 12 — Archive the repo when you stop maintaining it

What the 12 rules map to on the SkillAudit grade

How SkillAudit checks the playbook for you

Related questions

Further reading

Rule 2 — Never use `shell: true` or template input into shell strings

Rule 3 — Never read `process.env.*` inside a tool handler

Rule 11 — Publish a `SECURITY.md` with a disclosure contact