MCP server zero-trust — applying zero-trust principles to MCP tool handlers

Zero-trust is often described as "never trust, always verify" — a network architecture principle that replaces perimeter-based trust with per-request verification. Applied to MCP servers, it means something concrete: the tool handler must never assume the argument it received is safe just because a language model sent it. Fifty percent of the 101 MCP servers in the SkillAudit corpus have SSRF findings that exist precisely because the handler trusted the model's argument. Zero-trust rewrites that assumption at the code level.

TL;DR

Zero-trust for MCP servers means four things: (1) validate every tool argument at the handler boundary before using it, (2) allowlist every outbound network target — never pass an argument-derived URL directly to fetch(), (3) treat the calling model as an untrusted caller whose arguments may have been shaped by attacker-controlled content, and (4) request only the permissions the handler actually needs and revoke them when not in use. Most F-grade findings in the SkillAudit corpus are the result of violating exactly one of these four principles.

Why "never trust the model" is the right framing for MCP

In a traditional API, you trust the caller based on their authentication token. A caller with a valid bearer token is assumed to have been validated by your auth system. Zero-trust updates that assumption to require per-request authorization — but still, the identity of the caller is bounded.

In MCP, the "caller" is a language model whose arguments are generated by inference over a context window. That context window may contain:

A document the user asked the agent to summarize — which may include injection payloads
A web page the agent fetched — which may have been crafted by an adversary who knew the agent would fetch it
A file from a repository the agent was asked to review — which may have been committed specifically to shape the agent's tool calls

The model is not making security decisions when it constructs tool arguments. It is predicting the most plausible next token given its context, and that context may have been adversarially shaped. Zero-trust in this environment means: every tool argument is untrusted until your handler validates it, regardless of which model sent it.

Principle 1: Validate every tool argument at the handler boundary

Argument validation is the first and most tractable zero-trust control. Every tool argument that enters your handler should be validated against an explicit schema before it's used in any operation.

Zod (TypeScript/JavaScript) and Pydantic (Python) are the standard libraries for this, and both integrate cleanly with the MCP SDK's tool registration APIs. The pattern is:

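// TypeScript: zero-trust validated handler
// Assumes `server` is an McpServer instance from the MCP TypeScript SDK.
import { z } from 'zod';

// Hosts this handler is designed to reach (illustrative values).
const ALLOWLISTED_HOSTS = ['api.github.com', 'registry.npmjs.org'];

const FetchPageArgs = z.object({
  url: z.string().url().refine(
    (u) => ALLOWLISTED_HOSTS.includes(new URL(u).hostname),
    { message: 'Host not in allowlist' }
  ),
  max_length: z.number().int().min(1).max(50_000).default(10_000),
});

// server.tool() expects the raw Zod shape, so pass .shape rather than the object schema.
server.tool('fetch_page', FetchPageArgs.shape, async (args) => {
  // Parse again inside the handler so the check does not depend on the SDK alone.
  const { url, max_length } = FetchPageArgs.parse(args);
  // url is now guaranteed to be from an allowed host
  const response = await fetch(url);
  const text = (await response.text()).slice(0, max_length);
  return { content: [{ type: 'text', text }] };
});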

The key property: the validation runs before any operation, and the handler doesn't proceed unless validation passes. An argument that fails validation returns an error to the model — which is the correct behavior under zero-trust. The model receiving an error rather than a result is not a problem; it's the system working as designed.
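The error path can be made explicit. The sketch below is a dependency-free version of the same boundary check. The names validateFetchArgs, toErrorResult, ALLOWED, and ToolResult are illustrative assumptions rather than SDK APIs, and the https-only rule is an extra assumption not stated above. It returns a result flagged with isError instead of proceeding, which is one way a tool result can signal failure to the model.

```typescript
// Sketch only: validateFetchArgs, toErrorResult, ALLOWED and ToolResult are illustrative names.
type ToolResult = { isError?: boolean; content: { type: 'text'; text: string }[] };
type Validated =
  | { ok: true; value: { url: string; max_length: number } }
  | { ok: false; error: string };

const ALLOWED = new Set(['api.github.com', 'registry.npmjs.org']);

function validateFetchArgs(raw: unknown): Validated {
  if (typeof raw !== 'object' || raw === null) return { ok: false, error: 'args must be an object' };
  const { url, max_length = 10_000 } = raw as { url?: unknown; max_length?: unknown };
  if (typeof url !== 'string') return { ok: false, error: 'url must be a string' };
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return { ok: false, error: 'url is not a valid URL' };
  }
  // Assumption: only https targets are in scope for this handler.
  if (parsed.protocol !== 'https:') return { ok: false, error: 'only https URLs are accepted' };
  if (!ALLOWED.has(parsed.hostname)) return { ok: false, error: 'host not in allowlist' };
  if (typeof max_length !== 'number' || !Number.isInteger(max_length) || max_length < 1 || max_length > 50_000) {
    return { ok: false, error: 'max_length must be an integer in 1..50000' };
  }
  return { ok: true, value: { url: parsed.href, max_length } };
}

// Convert a validation failure into an error result rather than proceeding.
function toErrorResult(message: string): ToolResult {
  return { isError: true, content: [{ type: 'text', text: `Invalid arguments: ${message}` }] };
}
```

A handler would call validateFetchArgs first and return toErrorResult(result.error) whenever ok is false. With Zod, safeParse gives the same non-throwing shape.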

Principle 2: Allowlist every outbound network target

The SSRF finding class — present in half of all servers in our corpus — exists because handlers pass an argument-derived URL directly to fetch() without checking whether the target is one the server was designed to reach. Under zero-trust, you define exactly which hosts the handler is allowed to contact and reject anything outside that list.

// BAD: implicit trust in the argument
async function fetchToolUnsafe({ url }: { url: string }) {
  return fetch(url); // any URL the model sends will be fetched
}

// GOOD: explicit allowlist
const ALLOWED_HOSTS = new Set(['api.github.com', 'registry.npmjs.org']);

async function fetchTool({ url }: { url: string }) {
  const parsed = new URL(url); // throws on a malformed URL, which also rejects the call
  if (!ALLOWED_HOSTS.has(parsed.hostname)) {
    throw new Error(`Host ${parsed.hostname} is not in the allowlist`);
  }
  return fetch(url);
}

Two subtleties matter here. First, the allowlist check must run on the hostname produced by a URL parser, not on the raw URL string: a URL like https://api.github.com@internal.corp/path contains api.github.com in the string (as userinfo) but routes to internal.corp. Using new URL(url).hostname handles this correctly. Second, if your server runs in an environment where an adversary can influence DNS (a shared hosting environment, for example), an allowlist on the hostname alone is not sufficient. You also need to resolve the hostname, verify that the resulting IP isn't in a private range, and connect to that verified address so that a second lookup can't return something different.
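Both subtleties can be checked in code. The following is a minimal sketch, assuming IPv4 only and a hypothetical injectable resolve function (in Node this could wrap dns.promises.resolve4). The names isPrivateIPv4, assertPublicHost, and Resolver are illustrative, not part of any library, and the range list covers only the common private and loopback blocks.

```typescript
// Sketch only: IPv4-only, illustrative names. Fails closed on malformed input.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return true;
  const octets = parts.map(Number);
  if (octets.some((n) => n > 255)) return true;
  const [a, b] = octets;
  return (
    a === 0 ||                            // "this network"
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // loopback
    (a === 169 && b === 254) ||           // link-local, including cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

// The resolver is injected so the check can be tested without network access.
type Resolver = (hostname: string) => Promise<string[]>;

async function assertPublicHost(hostname: string, resolve: Resolver): Promise<string[]> {
  const addresses = await resolve(hostname);
  if (addresses.length === 0 || addresses.some((ip) => isPrivateIPv4(ip))) {
    throw new Error(`Host ${hostname} resolves to a non-public address`);
  }
  return addresses; // connect to one of these rather than performing a fresh lookup
}

// The userinfo trick: the string contains api.github.com, but the host is internal.corp.
const tricky = new URL('https://api.github.com@internal.corp/path');
console.log(tricky.hostname); // "internal.corp"
```

Returning the verified addresses is a deliberate choice here: a check followed by an ordinary fetch(url) triggers a second DNS lookup, which leaves room for the answer to change between the check and the connection.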

Principle 3: Treat the calling model as an untrusted caller

The most practically demanding shift in the zero-trust MCP model is treating the model's intent as unreliable. In a human-operated API, you might assume that a caller sending delete_item(id=42) was instructed to delete item 42 by a legitimate user. In an MCP context, the model may have been instructed to call delete_item(id=42) by injected content in a document it was processing — content that the user never saw.

This doesn't mean making every tool call require explicit human confirmation (that defeats the purpose of agent automation). It means designing your tool surface so that individual tool calls cannot cause irreversible harm in isolation. Specifically:

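Destructive operations should require explicit confirmation arguments, such as a confirm: true parameter that is meant to be set only when the user's initial instruction asked for the action. Because the model still constructs this argument, treat it as a speed bump against accidental or weakly injected calls rather than as proof of user intent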
Scope-expanding operations should be narrowed — a search_files(pattern) tool that can match any path on the filesystem is a broader blast radius than one constrained to a declared working directory
High-value write operations should have audit logs that record what argument the model provided, so a human can review the call history without relying on the model's own reporting
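The confirmation argument and the audit log from the list above can be sketched together. This is a minimal illustration under stated assumptions: deleteItem, DeleteArgs, AuditEntry, and the in-memory auditLog array are hypothetical names, and a real server would write to append-only storage rather than an array.

```typescript
// Sketch only: all names here are hypothetical and not part of any SDK.
interface AuditEntry {
  tool: string;
  args: unknown;    // exactly what the model provided
  at: string;       // ISO timestamp
  allowed: boolean; // whether the call passed the confirmation gate
}

const auditLog: AuditEntry[] = []; // stand-in for append-only storage

interface DeleteArgs {
  id: number;
  confirm?: boolean;
}

function deleteItem(args: DeleteArgs, store: Map<number, string>): { isError: boolean; text: string } {
  const allowed = args.confirm === true && Number.isInteger(args.id);
  // Record the call before acting, so rejected attempts are visible too.
  auditLog.push({ tool: 'delete_item', args: { ...args }, at: new Date().toISOString(), allowed });
  if (!allowed) {
    return { isError: true, text: 'delete_item requires an integer id and confirm: true' };
  }
  const existed = store.delete(args.id);
  return { isError: !existed, text: existed ? `Deleted item ${args.id}` : `No item ${args.id}` };
}
```

Logging before the gate decision means a reviewer sees attempted destructive calls as well as successful ones, without relying on the model's own account of what it did.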

Principle 4: Minimum-privilege permission surface

Zero-trust's minimum-privilege principle applies directly to MCP servers. The server should request only the OAuth scopes, filesystem paths, and API credentials that the handlers actually need — not what might be useful in the future, and not a convenience broad grant to avoid adding scopes later.

In the SkillAudit corpus, scope-vs-handler drift is detectable in servers that request repo (full repository read/write) when their handlers only perform read operations, or that request admin:org when they only need read:org. The extra scope doesn't make the server more useful; it expands the blast radius of a compromised argument.

The practical discipline is: when you register a new tool handler, audit what permissions it actually exercises and compare against what your server's manifest declares. The security review checklist includes this as a named step — scope vs. handler drift is one of the five things a reviewer should check that static analysis alone may not surface.
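That comparison can be partly automated. The sketch below assumes each handler is annotated by hand with the scopes it exercises. The findScopeDrift function, the DriftReport shape, and the handler names are hypothetical, and scopes are treated as opaque strings, so provider-specific scope hierarchies are ignored.

```typescript
// Sketch only: the manifest shape and per-handler annotations are hypothetical.
interface DriftReport {
  unused: string[];     // declared by the server but exercised by no handler
  undeclared: string[]; // exercised by a handler but not declared
}

function findScopeDrift(declared: string[], exercisedByHandler: Record<string, string[]>): DriftReport {
  const exercised = new Set<string>();
  for (const name of Object.keys(exercisedByHandler)) {
    for (const scope of exercisedByHandler[name]) exercised.add(scope);
  }
  const declaredSet = new Set(declared);
  return {
    unused: declared.filter((s) => !exercised.has(s)),
    undeclared: Array.from(exercised).filter((s) => !declaredSet.has(s)),
  };
}

const report = findScopeDrift(
  ['repo', 'admin:org'],
  { list_issues: ['repo'], list_members: ['read:org'] },
);
console.log(report); // unused: admin:org, undeclared: read:org
```

In this example the report suggests that admin:org could be narrowed to read:org. A human still has to judge each entry, since a broader scope may legitimately cover a narrower one.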

What zero-trust does not solve

Zero-trust at the MCP layer is a set of implementation controls, not a complete security architecture. It doesn't solve:

Prompt-injection susceptibility in the model itself: handler validation catches bad arguments, but a model that injected content has already steered into calling a permitted tool with legitimate-looking arguments will pass validation. The structural protection is wrapping external content in markers that the model is trained to distrust, not validating the argument the injection produced.
Supply-chain risks in your dependencies — a compromised npm package in your server's dependency tree can bypass all your handler-level validation. This is the domain of dependency security rather than zero-trust.
Authentication and authorization between the user and the agent — zero-trust at the MCP server level assumes the agent orchestration layer has already authenticated the user. If the orchestration layer is compromised, no handler-level control helps.
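The content-wrapping idea from the first item in this list can be sketched as follows. The marker strings and the wrapUntrusted name are hypothetical. Which delimiters a given model actually treats as untrusted depends on the model and host, so this shows only the mechanics of wrapping and of preventing the content from closing the block early.

```typescript
// Sketch only: the marker format is hypothetical, not a standard.
const MARK_OPEN = '<<<UNTRUSTED_CONTENT';
const MARK_CLOSE = 'UNTRUSTED_CONTENT>>>';

function wrapUntrusted(content: string, source: string): string {
  // Neutralize any copy of the markers inside the content itself.
  const escaped = content.split(MARK_OPEN).join('<<<_').split(MARK_CLOSE).join('_>>>');
  // Keep the source label on one line so it cannot inject extra lines.
  const label = source.replace(/[\r\n]/g, ' ');
  return `${MARK_OPEN} source="${label}"\n${escaped}\n${MARK_CLOSE}`;
}
```

A handler that returns fetched pages or file contents would pass them through wrapUntrusted before placing them in the tool result.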

Zero-trust is the right frame for the tool handler layer — it's the layer MCP server authors control directly. But it should be composed with prompt-injection controls, dependency hygiene, and authentication discipline at the other layers rather than treated as a complete solution.

How SkillAudit scores zero-trust adherence

The SkillAudit security and permissions axes grade the zero-trust properties directly. The security axis catches violations of Principles 1, 2, and 3 — SSRF (no outbound allowlist), command injection (no argument validation before shell use), and prompt-injection surface (external content returned unsanitized). The permissions axis catches violations of Principle 4 — scope requests broader than what the handlers exercise.

A server that implements all four zero-trust principles will score well on both axes, and the combination is the primary driver of A grades in the corpus. Of the 19 servers that earned an A among the 101 audited, none had violations on the outbound-allowlist or argument-validation dimensions. The permissions axis was more varied: most A-grade servers had some degree of scope excess that didn't rise to a finding but left room for tightening.

Run a free audit at skillaudit.dev to see where your server sits against the zero-trust criteria, or read the methodology page for the complete grading rubric.