
MCP Server Security Audit Checklist: The 15-Point Pre-Deployment Review

Most MCP server security vulnerabilities are preventable with a consistent pre-deployment review. This 15-point checklist covers the controls that matter most — authentication, input validation, secrets handling, logging, and transport — with explicit pass/fail criteria for each. Run it before every production deployment.

How to use this checklist: Work through each control. Mark PASS (control is in place and verified) or FAIL (missing or broken). Any CRITICAL or HIGH failure blocks the deployment. MEDIUM failures should be logged as technical debt with a remediation deadline before the release proceeds. A SkillAudit scan automates checks 1–9 and 13–14 from source code; controls 10–12 and 15 depend on your deployment environment and need human verification or a deployment configuration scan.

Category 1: Authentication and Authorization

Authentication failures are among the most direct routes to complete MCP server compromise. These five controls cover identity verification at every layer.

API key / token validation on every tool call

CRITICAL

Every tool invocation must authenticate the caller. Authentication must happen before any tool logic executes — even for "lightweight" read operations. A common mistake is to authenticate at the transport level but skip re-validation in the tool handler itself, creating bypass opportunities when the transport layer is misconfigured or proxied.

PASS: Auth validation is in a middleware or base handler that runs before all tool logic. The validation path is unit-tested with an invalid token input and returns 401/403.

FAIL: Any tool handler accesses request context, user data, or external APIs before validating the caller's identity. Auth logic is copy-pasted per-tool rather than centralized.
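The centralized pattern described in the PASS criterion can be sketched as a decorator that authenticates before the handler body runs. This is a minimal illustration, not a prescribed implementation: `VALID_API_KEYS`, `AuthError`, `require_auth`, and the dict-shaped `request` are all hypothetical names chosen for the example.

```python
import functools
import hmac

# Hypothetical key store; a real server would load keys from a secrets manager.
VALID_API_KEYS = {"demo-key-123": "alice"}


class AuthError(Exception):
    """Caller could not be authenticated (would map to a 401 response)."""


def authenticate(token):
    """Return the caller identity for a valid token, or raise AuthError."""
    if isinstance(token, str):
        supplied = token.encode("utf-8")
        for known_key, identity in VALID_API_KEYS.items():
            # compare_digest avoids leaking key contents through timing.
            if hmac.compare_digest(supplied, known_key.encode("utf-8")):
                return identity
    raise AuthError("invalid or missing token")


def require_auth(handler):
    """Decorator: authenticate first, then hand the identity to the tool."""
    @functools.wraps(handler)
    def wrapper(request, *args, **kwargs):
        caller = authenticate(request.get("token"))  # runs before tool logic
        return handler(request, caller, *args, **kwargs)
    return wrapper


@require_auth
def read_file_tool(request, caller):
    # Tool logic only ever sees an already-authenticated caller.
    return f"{caller} read {request['path']}"
```

Because every tool is wrapped by the same decorator, a single unit test with an invalid token exercises the validation path for all handlers.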

Tool-level authorization (not just session-level)

HIGH

Session-level authentication answers "who is this caller?" — tool-level authorization answers "is this caller allowed to invoke this specific tool right now?" Many MCP servers conflate the two. A session token that authenticates the caller for read_file should not automatically authorize them for delete_file or execute_command. Write tools, admin tools, and external-API-calling tools each require explicit authorization checks separate from session authentication. This is the core of the ambient authority problem.

PASS: Each tool has an explicit authorization check (role, scope, or capability) that is enforced in code and tested with a fixture that has valid session auth but insufficient tool-level permissions.

FAIL: Authorization is a single session-level check with no per-tool differentiation. All authenticated callers can invoke all tools.
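One way to separate tool-level authorization from session authentication is a per-tool scope check. The sketch below assumes a session object that carries a list of granted scopes; the scope names, `TOOL_SCOPES`, and `Forbidden` are hypothetical.

```python
import functools

# Hypothetical mapping of tool name to the scope required to invoke it.
TOOL_SCOPES = {
    "read_file": "files:read",
    "delete_file": "files:delete",
}


class Forbidden(Exception):
    """Caller is authenticated but lacks the scope for this tool (403)."""


def require_scope(tool_name):
    required = TOOL_SCOPES[tool_name]

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(session, *args, **kwargs):
            # A valid session is not enough; the specific scope must be present.
            if required not in session.get("scopes", ()):
                raise Forbidden(f"{tool_name} requires scope {required}")
            return handler(session, *args, **kwargs)
        return wrapper
    return decorator


@require_scope("read_file")
def read_file(session, path):
    return f"contents of {path}"


@require_scope("delete_file")
def delete_file(session, path):
    return f"deleted {path}"
```

A fixture such as `{"user": "alice", "scopes": ["files:read"]}` matches the test described in the PASS criterion: it passes for `read_file` and must raise for `delete_file`.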

Token expiry and rotation enforced

HIGH

Long-lived tokens are a persistence mechanism for attackers. An OAuth token, API key, or session credential without expiry means that a stolen token provides indefinite access. MCP server tokens should have a defined maximum lifetime, and the server should reject expired tokens without a grace period. JWT-based tokens must validate the exp and iat claims and reject tokens whose alg header is none.

PASS: Tokens have a maximum lifetime (≤24h for API keys, ≤1h for session JWTs). Server rejects tokens beyond expiry with no configurable grace window. JWT validation rejects alg: none and validates exp.

FAIL: Tokens have no expiry, or expiry validation is optional / configurable off. JWT validation uses a library in "accept all algorithms" mode.

Sensitive tool confirmation gate

HIGH

Tools that write data, delete records, send emails, make payments, or execute shell commands should require explicit confirmation before execution. In an LLM-driven context, a prompt injection attack can cause the model to invoke destructive tools without user intent. A confirmation gate (human approval step, secondary auth factor, or cryptographic intent token) breaks the single-turn attack path.

PASS: All write, delete, and execute tools return a confirmation-required response on first invocation. Confirmation requires a separate, short-lived token that is checked on the second invocation.

FAIL: Write and delete tools execute on first invocation without any confirmation step. No distinction between read and write tools in the invocation flow.
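The two-step flow in the PASS criterion might look like the sketch below. The first call returns a short-lived, single-use token bound to the exact caller, tool, and arguments; only a second call presenting that token executes. The in-memory `_PENDING` store, the 60-second lifetime, and the response shape are assumptions for illustration.

```python
import hashlib
import hmac
import json
import secrets
import time

_PENDING = {}             # confirmation token -> (fingerprint, expiry)
CONFIRM_TTL_SECONDS = 60  # assumed lifetime for a confirmation token


def _fingerprint(caller, tool, args):
    blob = json.dumps([caller, tool, args], sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def invoke_destructive(caller, tool, args, confirm_token=None, now=None):
    now = time.time() if now is None else now
    fingerprint = _fingerprint(caller, tool, args)
    if confirm_token is None:
        # First invocation: do nothing destructive, just issue a token.
        token = secrets.token_urlsafe(16)
        _PENDING[token] = (fingerprint, now + CONFIRM_TTL_SECONDS)
        return {"status": "confirmation_required", "confirm_token": token}
    pending = _PENDING.pop(confirm_token, None)  # single use
    if pending is None:
        return {"status": "rejected", "reason": "unknown token"}
    expected, expires_at = pending
    if now > expires_at or not hmac.compare_digest(fingerprint, expected):
        return {"status": "rejected", "reason": "expired or mismatched"}
    return {"status": "executed", "tool": tool}  # real tool logic goes here
```

Binding the token to the arguments means a confirmation for deleting one record cannot be replayed against another. The gate only breaks the injection path if the confirmation is supplied by the human or a separate channel, not echoed back automatically by the model.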

Rate limiting on authentication endpoints

MEDIUM

Credential stuffing and brute-force attacks target MCP authentication endpoints. Without rate limiting, an attacker can test thousands of API keys or passwords per second. Rate limits must apply both to the authentication endpoint itself and to tool invocation, so that a caller holding valid (or stolen) credentials cannot flood cheap API calls.

PASS: Authentication attempts are rate-limited by IP (≤10/minute). Tool invocations are rate-limited by caller identity (≤100/minute or per-product SLA). Rate limits are enforced at the server level, not relying solely on upstream infrastructure.

FAIL: No rate limiting on authentication. Tool invocations are limited only by upstream infrastructure (CDN, API gateway) with no server-side enforcement.
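Server-side enforcement can be as small as an in-process sliding-window counter. The sketch below is one possible approach, keyed by IP for authentication attempts and by caller identity for tool calls; the class name and the in-memory storage are assumptions, and a multi-instance deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within the last window_seconds."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window = window_seconds
        self._events = defaultdict(deque)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


# Limits taken from the PASS criterion above.
auth_limiter = SlidingWindowLimiter(max_events=10, window_seconds=60)    # per IP
tool_limiter = SlidingWindowLimiter(max_events=100, window_seconds=60)   # per caller
```

A handler would call `auth_limiter.allow(client_ip)` before checking credentials and return a 429-style error when it yields `False`.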

Category 2: Input Validation and Injection Prevention

MCP tool arguments are attacker-controlled strings. These controls prevent injection attacks via malformed or malicious inputs.

Schema validation on all tool arguments

CRITICAL

Every tool must define an argument schema and validate all incoming arguments against it before any processing. Schema validation should enforce: required vs optional fields, data types (string, integer, boolean — not generic "any"), string length limits, numeric range constraints, and enum membership for fields with a fixed value set. Arguments that fail validation must be rejected with an error before any business logic runs.

PASS: Every tool has a schema (JSON Schema, Zod, Pydantic, etc.) that is enforced at the handler boundary. Invalid arguments return an error before any tool logic executes. Schema includes string length limits and type constraints — no open-ended any types on user-controlled fields.

FAIL: Tools accept arbitrary argument objects without schema validation. String fields have no length limits. Any argument type is accepted and passed into business logic directly.
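The constraints listed above (required fields, concrete types, length limits, ranges, enums) can be illustrated with a small hand-rolled validator. This is a stand-in for JSON Schema, Zod, or Pydantic, written with the standard library only so the example runs anywhere; `SEARCH_SCHEMA` and its fields describe a hypothetical search tool. Rejecting unknown fields is an extra choice made here, not something the checklist requires.

```python
class ValidationError(Exception):
    pass


# Hypothetical schema for a search tool.
SEARCH_SCHEMA = {
    "query": {"type": str, "required": True, "max_length": 200},
    "limit": {"type": int, "required": False, "min": 1, "max": 50},
    "sort": {"type": str, "required": False, "enum": {"asc", "desc"}},
}


def validate(args, schema):
    """Return args unchanged if they satisfy schema, else raise ValidationError."""
    if not isinstance(args, dict):
        raise ValidationError("arguments must be an object")
    unknown = set(args) - set(schema)
    if unknown:
        raise ValidationError(f"unknown fields: {sorted(unknown)}")
    for name, rule in schema.items():
        if name not in args:
            if rule.get("required"):
                raise ValidationError(f"{name} is required")
            continue
        value, expected = args[name], rule["type"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        wrong_type = not isinstance(value, expected) or (
            isinstance(value, bool) and expected is not bool
        )
        if wrong_type:
            raise ValidationError(f"{name} must be of type {expected.__name__}")
        if "max_length" in rule and len(value) > rule["max_length"]:
            raise ValidationError(f"{name} is too long")
        if "min" in rule and value < rule["min"]:
            raise ValidationError(f"{name} is below the minimum")
        if "max" in rule and value > rule["max"]:
            raise ValidationError(f"{name} is above the maximum")
        if "enum" in rule and value not in rule["enum"]:
            raise ValidationError(f"{name} is not an allowed value")
    return args
```

The handler boundary then becomes a single call, `validate(arguments, SEARCH_SCHEMA)`, placed before any business logic.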

Shell injection prevention

CRITICAL

If any tool constructs shell commands using user-supplied arguments, shell injection is a critical risk. Common patterns that introduce shell injection: template literals in exec() calls, string concatenation into subprocess.run(shell=True), and passing unvalidated arguments to child_process.exec. The fix is always to use parameterized subprocess calls with argument arrays, never string concatenation.

PASS: No tool constructs shell command strings from user arguments. All subprocess calls use argument arrays (execv, subprocess.run(['cmd', arg]), child_process.execFile). A grep for exec(, shell=True, and template literals in subprocess contexts returns zero results in tool handlers.

FAIL: Any tool constructs a shell command string from a user-supplied argument, even with escaping. Shell escaping is not a valid mitigation — only parameterized calls are.
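The difference between a command string and an argument array is easiest to see with input that contains shell metacharacters. The sketch below passes the user value as a single list element to `subprocess.run` without `shell=True`, so no shell ever parses it. The child process here is just the Python interpreter echoing its argument, chosen so the example runs on any platform; `echo_argument` is a hypothetical name.

```python
import subprocess
import sys


def echo_argument(user_value):
    """Run a child process with user_value as one literal argument."""
    result = subprocess.run(
        # Argument list, no shell: user_value is never interpreted as syntax.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.rstrip("\n")
```

Calling `echo_argument("hello; rm -rf /")` hands the whole string, semicolon included, to the child as data. Had the same value been concatenated into a `shell=True` command string, the shell would have treated the text after the semicolon as a second command.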

Path traversal prevention

CRITICAL

File-operating tools that accept a filename or path argument are vulnerable to path traversal (e.g., ../../etc/passwd). The validation approach: resolve the path to its real absolute form, then verify it is still within the allowed base directory. Both steps are required — resolving alone doesn't prevent traversal, and prefix-checking alone doesn't handle symlinks.

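PASS: All file path arguments are (1) resolved to a real absolute path, following symlinks, with fs.realpath() / os.path.realpath() (path.resolve() only normalizes and does not follow symlinks), and (2) checked to be inside the allowed base directory on a path-component boundary: the resolved path equals the base or starts with the base plus a path separator. A bare string prefix check is not enough, because /srv/data is also a prefix of /srv/data-evil. The check is tested with ../ and %2e%2e%2f inputs.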

FAIL: File paths are used as-is, or only one of the two steps is performed. URL-encoding traversal variants are not tested.
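The resolve-then-contain check can be sketched in a few lines. This version uses `os.path.realpath` to follow symlinks and `os.path.commonpath` rather than a raw string comparison, since a bare prefix test would also accept sibling directories such as `/srv/data-evil` when the base is `/srv/data`. `resolve_inside` and `PathNotAllowed` are hypothetical names.

```python
import os
from urllib.parse import unquote


class PathNotAllowed(Exception):
    pass


def resolve_inside(base_dir, user_path):
    """Return the real absolute path for user_path, or raise if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    # Step 1: resolve to a real absolute path (collapses ../ and follows symlinks).
    candidate = os.path.realpath(os.path.join(base, user_path))
    # Step 2: containment check on whole path components, not string prefixes.
    try:
        inside = os.path.commonpath([base, candidate]) == base
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    if not inside:
        raise PathNotAllowed(user_path)
    return candidate
```

If the transport has already URL-decoded the argument, `%2e%2e%2f` arrives as `../` and is caught by the check; a test can simulate this with `unquote("%2e%2e%2fsecret")`. If it arrives still encoded, it is treated as a literal file name inside the base directory, so decoding should happen exactly once and before this check.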

Prompt injection defense in tools that process external data

HIGH

Tools that fetch external content (web pages, database records, email, Slack messages) and return it to the LLM create a prompt injection attack surface. An attacker who can write to the external data source can embed instructions that redirect the model's next action. Mitigations: clearly delimit returned content with markers the model understands are data (not instructions), avoid returning raw HTML/Markdown that may contain formatting the model interprets as directives, and strip or escape control sequences before returning data to the context.

PASS: Tools that return external content wrap the data in a labeled delimiter and strip HTML/Markdown before returning. Documentation instructs integrating systems to use system prompt framing to tell the model that tool output is untrusted data.

FAIL: Tools return raw external content directly into the model context with no sanitization or delimiters. Web-fetching tools return full HTML.
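A labeled delimiter plus basic stripping could look like the sketch below. It is deliberately crude: regex-based tag removal is not a full HTML sanitizer, Markdown is not handled, and the delimiter strings are hypothetical markers that only help if the integrating system prompt tells the model to treat everything between them as data.

```python
import html
import re

_SCRIPT_RE = re.compile(r"(?is)<(script|style)\b.*?</\1\s*>")
_TAG_RE = re.compile(r"<[^>]+>")
_CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def wrap_untrusted(source, content):
    """Strip markup and control characters, then wrap content in data markers."""
    text = _SCRIPT_RE.sub("", content)   # drop script/style bodies entirely
    text = _TAG_RE.sub("", text)         # drop remaining tags, keep their text
    text = html.unescape(text)           # decode entities such as &amp;
    text = _CONTROL_RE.sub("", text)     # remove non-printing control characters
    # Break up anything that could imitate the delimiters themselves.
    text = text.replace("<<<", "< < <").replace(">>>", "> > >")
    return (
        f"<<<UNTRUSTED_DATA source={source!r}>>>\n"
        f"{text}\n"
        f"<<<END_UNTRUSTED_DATA>>>"
    )
```

The delimiter neutralization step matters because an attacker who controls the fetched page could otherwise embed a fake end marker and place instructions after it.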

Category 3: Secrets and Credential Handling

No secrets in source code or build artifacts

CRITICAL

API keys, database credentials, OAuth client secrets, and signing keys embedded in source code or Dockerfiles are exposed to everyone with repository access — and to the public if the repository is open-source. Run a secret scanning tool (git-secrets, trufflehog, gitleaks) against the entire git history, not just the current HEAD. Secrets committed and then deleted are still in history. The fix for a committed secret is always to rotate the credential first, then remove it from history.

PASS: Secret scanning tool passes with zero findings against full git history. All credentials are injected via environment variables or a secrets manager (Vault, AWS Secrets Manager, 1Password Secrets Automation) at runtime. Pre-commit hooks prevent future commits containing secrets.

FAIL: Any secret is found in source code, Dockerfile, configuration files, or git history. Secrets are not rotated before remediation.
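On the runtime side, injecting credentials through the environment can be paired with a fail-fast loader so a missing secret stops startup instead of silently falling back to a hard-coded default. A minimal sketch, where `load_secret`, `MissingSecret`, and the variable name in the usage note are hypothetical:

```python
import os


class MissingSecret(RuntimeError):
    pass


def load_secret(name, environ=None):
    """Read a required secret from the environment; never fall back to a literal."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    if not value:
        # The message names the variable but never echoes a value.
        raise MissingSecret(f"required secret {name} is not set")
    return value
```

At startup the server would call something like `load_secret("DOWNSTREAM_API_KEY")` once and pass the value to the client that needs it, rather than reading it ad hoc inside tool handlers.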

Secrets never logged

CRITICAL

Structured logging libraries, request logging middleware, and error trackers commonly capture the full request object — including headers and body. If an API key is passed in an Authorization header or a POST body, it will appear in your logs and potentially in third-party error tracking (Sentry, Datadog). Audit every log statement that references request objects, tool arguments, or response bodies. See our secrets management guide for redaction middleware patterns.

PASS: Logging middleware has an allowlist of fields that may be logged (never the full request/response). Authorization headers are scrubbed before logging. Token/key fields in tool argument schemas are tagged as sensitive and excluded from structured log output.

FAIL: Request logging captures full headers including Authorization. Tool argument logging includes fields that may contain secrets. Error tracing sends full exception context including credentials.
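An allowlist approach to request logging can be sketched as a function that builds the loggable record from named safe fields instead of trying to scrub the full request. The field and header names below are assumptions; the point is that anything not listed, including Authorization and the request body, never reaches the logger.

```python
# Hypothetical allowlists; extend deliberately, one reviewed field at a time.
SAFE_FIELDS = {"method", "path", "tool", "status", "duration_ms"}
SAFE_HEADERS = {"content-type", "user-agent"}


def loggable(request):
    """Return only allowlisted fields and headers from a request dict."""
    record = {k: v for k, v in request.items() if k in SAFE_FIELDS}
    headers = request.get("headers", {})
    record["headers"] = {
        k.lower(): v for k, v in headers.items() if k.lower() in SAFE_HEADERS
    }
    return record
```

An allowlist fails safe: a new header or body field carrying a credential is excluded by default, whereas a denylist would log it until someone remembers to add it.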

Least-privilege service account credentials

HIGH

The credentials your MCP server uses to call downstream APIs should have only the permissions required for the tools it exposes. A server that exposes a send_email tool should have send-only SMTP credentials, not full mailbox read/write/admin access. This limits blast radius: if the server is compromised, the attacker inherits only the permissions you granted, not all permissions the service supports.

PASS: Each downstream API credential is scoped to the minimum permission set required by the tools that use it. Permissions are documented and reviewed. Admin-level credentials are not used for runtime operations.

FAIL: Runtime credentials have admin or full-access permissions. The same credential is used for both read and write operations when read-only would suffice.

Category 4: Logging and Observability

Structured audit log for all tool invocations

HIGH

Every tool invocation should produce a structured audit log event with: timestamp, caller identity, tool name, argument summary (sanitized), and outcome (success/failure/error). Audit logs are your primary forensic resource after a security incident. Without them, you cannot reconstruct what an attacker did after they gained access. Logs must be written to an append-only store that the server process cannot delete — use a separate log aggregation service, not just local files.

PASS: Every tool invocation emits a structured log event with caller identity, tool name, timestamp, and sanitized arguments. Logs are written to a remote log aggregation service where the server process cannot modify or delete existing entries. Logs are retained for ≥90 days.

FAIL: Tool invocations are not logged, or are logged only on error. Logs are stored locally in files the server process can write and delete. Logs do not include caller identity.
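The audit event described above can be sketched as a function that emits one JSON line per invocation. The key names, the list of sensitive argument keys, and the 64-character truncation are assumptions for illustration; shipping the line to a remote append-only store is left to the log pipeline.

```python
import json
import time
from datetime import datetime, timezone

# Hypothetical list of argument names that must never appear in logs.
SENSITIVE_KEYS = {"token", "api_key", "password", "secret"}


def audit_event(caller, tool, args, outcome, now=None):
    """Build one structured audit record as a JSON string."""
    now = time.time() if now is None else now
    summary = {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else str(value)[:64]
        for key, value in args.items()
    }
    event = {
        "ts": datetime.fromtimestamp(now, tz=timezone.utc).isoformat(),
        "caller": caller,
        "tool": tool,
        "args": summary,
        "outcome": outcome,  # "success", "failure", or "error"
    }
    return json.dumps(event, sort_keys=True)
```

Emitting the event for successes as well as failures is what makes the log usable for reconstructing an attacker's actions after the fact.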

Error messages do not expose internal details

MEDIUM

Stack traces, database error messages, internal file paths, and dependency version strings returned to callers help attackers understand your server's internals. Error responses should return a generic error code and message to the caller while logging the full detail internally. Database errors especially should never surface to the API response — they often include table names, column names, and query structure that enable SQL injection fingerprinting.

PASS: Error responses contain a standardized error code (e.g., TOOL_EXECUTION_FAILED) and a generic human-readable message. Stack traces are logged internally only. Database errors are caught and mapped to generic error responses before leaving the tool handler.

FAIL: Error responses include stack traces, internal file paths, or database error messages. Dependency names and versions appear in error responses.
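Mapping internal failures to a generic response can be done in one wrapper around the tool handler. In this sketch the full exception is logged internally with a correlation id, and the caller receives only the standardized code, a generic message, and that id. The `error_id` field and the `safe_call` name are additions made for the example.

```python
import logging
import uuid

logger = logging.getLogger("mcp.tools")


def safe_call(handler, *args, **kwargs):
    """Run a tool handler; never let exception details reach the caller."""
    try:
        return {"ok": True, "result": handler(*args, **kwargs)}
    except Exception:
        error_id = uuid.uuid4().hex
        # Full stack trace stays in internal logs, keyed by error_id.
        logger.exception("tool failed error_id=%s", error_id)
        return {
            "ok": False,
            "error": {
                "code": "TOOL_EXECUTION_FAILED",
                "message": "The tool could not complete the request.",
                "error_id": error_id,
            },
        }
```

The correlation id lets support staff find the internal trace from a user report without the response itself revealing table names, file paths, or dependency versions.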

Category 5: Transport and Network Security

TLS enforced end-to-end; no plaintext fallback

CRITICAL

All MCP server communication must use TLS 1.2 or higher. This includes: the client-to-server connection, any server-to-downstream-API calls, and any internal service-to-service calls if the MCP server is decomposed into microservices. Plaintext fallback modes, development-only HTTP endpoints reachable in production, and NODE_TLS_REJECT_UNAUTHORIZED=0 environment variables in production are all deployment blockers.

PASS: Server serves MCP traffic over HTTPS only. Plain HTTP requests are redirected to HTTPS or refused, never served. All outbound API calls use HTTPS with certificate validation enabled. TLS version is ≥1.2. No NODE_TLS_REJECT_UNAUTHORIZED=0, verify=False, or equivalent disabled-validation patterns in production configuration.

FAIL: Server serves tool traffic over plain HTTP as well as HTTPS in production. Any outbound call uses verify=False or equivalent. TLS 1.0 or 1.1 is accepted. Internal service calls use plaintext.
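For a Python-based server, the TLS floor and certificate validation can be expressed with the standard `ssl` module. The sketch below shows one context for outbound calls and one for the listening socket; the function names and the certificate paths passed to `server_context` are placeholders.

```python
import ssl


def client_context():
    """Context for outbound calls: verification on, TLS 1.2 or newer."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def server_context(certfile, keyfile):
    """Context for the listening socket: TLS 1.2 or newer only."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

A deployment check can assert that `client_context().verify_mode` is `ssl.CERT_REQUIRED` and that `check_hostname` is `True`, which catches a disabled-validation pattern that slipped in from a development configuration.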

Interpreting your results

After completing the checklist, categorize your results:

0 CRITICAL / 0 HIGH failures: Deployment can proceed. Log any MEDIUM failures as tracked issues with remediation deadlines.
1+ CRITICAL failure: Deployment is blocked. Fix all CRITICAL items before releasing. CRITICAL findings represent vulnerabilities with high likelihood of exploitation and high impact.
1+ HIGH failure: Deployment is blocked pending team review. HIGH findings may be acceptable to ship with documented risk acceptance and a remediation commitment, but must be reviewed by a senior engineer or security lead before release.
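The decision rules above are simple enough to encode as a CI gate. A minimal sketch, assuming each result is a `(control_name, severity, passed)` tuple; the return labels are hypothetical:

```python
def deployment_decision(results):
    """Apply the checklist rules to a list of (name, severity, passed) tuples."""
    failed = {severity for _, severity, passed in results if not passed}
    if "CRITICAL" in failed:
        return "BLOCKED"                 # fix before releasing
    if "HIGH" in failed:
        return "BLOCKED_PENDING_REVIEW"  # needs senior or security sign-off
    return "PROCEED"                     # log any MEDIUM failures as tracked issues
```

For example, a run with one failed MEDIUM control and everything else passing yields `"PROCEED"`, while any failed CRITICAL control yields `"BLOCKED"` regardless of the other results.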

A checklist run takes approximately 45–60 minutes for a developer familiar with the codebase. Running it before every production deployment is the minimum; integrating automated checks (items 1–9, 13–14) into your CI/CD pipeline catches regressions before the human review is even needed.

Automating the checklist with SkillAudit

SkillAudit's static analyzer covers 11 of the 15 controls in this checklist automatically — every time you push to GitHub. Controls 1–9 (authentication, input validation, injection prevention) and 13–14 (audit logging, error message exposure) are analyzed from source code. Controls 10–12 (secrets management) and 15 (TLS configuration) require your deployment environment context and are supported via the SkillAudit deployment configuration scan.

When SkillAudit finds a failing control, it links directly to the offending code location with a specific remediation recommendation — rather than just flagging the control category as failed. This reduces the time from "checklist failure" to "deployed fix" from hours to minutes.

The full SkillAudit grade (A through F) reflects how many of these controls — and 30+ additional signals — are passing in your server. A grade of B or above means your server passes all CRITICAL and HIGH controls. An A grade means you're also clear on most MEDIUM and INFO controls.

For more detail on what each grade means and what controls it maps to, see our security review checklist and the sandboxing and isolation patterns post.
