Topic: mcp server prompt injection

MCP Server Prompt Injection — Detection, Prevention, and Testing

Prompt injection in MCP servers is a distinct threat class from web-application prompt injection: the attack surface is tool call arguments and the content returned from tool executions, not a chat input field. This page explains the three patterns that appear most frequently in the 101-server public corpus, how automated detection differs from static analysis, and what prevention looks like at the code level.

What makes MCP prompt injection different

In a web application, prompt injection typically enters through a user-controlled text field that is embedded into an LLM system prompt. In an MCP server, the injection surface is different: the model itself generates the tool call arguments. This means the attack chain is: attacker-controlled content → model context → tool arguments → tool handler code → tool return value → model context again. The loop is closed inside the model's reasoning, not at a web form boundary.

Three concrete entry points from the corpus:

Why static analysis cannot fully catch prompt injection

SSRF and command-exec can be detected statically: the engine looks for untrusted input flowing to fetch() or exec(). Prompt injection cannot be detected the same way because the question is not "does this code do a dangerous thing?" but "would an LLM follow embedded instructions in this data?" — a semantic question, not a syntactic one.

The SkillAudit engine uses an LLM-assisted probe on handlers that return externally-sourced string content: it constructs a synthetic payload containing an embedded instruction, submits it through the handler, and checks whether the model's response incorporates the instruction. This approach catches the behavioral cases that a grep or taint-flow analysis misses. It is also the reason the Security axis contributes more to the overall grade than a simple line-count of dangerous patterns would suggest: one undetected prompt injection finding in a handler that processes attacker-reachable content is a critical finding regardless of how clean the rest of the codebase is.

Prevention at the code level

Three prevention patterns, in order of how often each prevents a real corpus finding:

  1. Parse, don't pass through. For any tool that calls an external API, define a typed return schema and populate only the fields in that schema from the API response. Never return res.text() or JSON.stringify(rawResponse) directly. If you only need the article title and publish date from a web page, parse those two fields and return a { title: string, publishedAt: string } object. The injected content in the article body never reaches the model.
  2. Mark untrusted content explicitly. For tools that genuinely need to return free-form text from external sources (search results, document content), prefix the return value with a static marker that the system prompt can use to set trust level: return "UNTRUSTED_EXTERNAL_CONTENT:\n" + sanitized_text. Document this convention in the tool description so the model knows to treat the content as data, not as instructions.
  3. Validate arg shapes aggressively. Constrain every tool argument to the narrowest type that satisfies the tool's requirements. A tool that looks up a user by ID should accept a UUID string, not an arbitrary string. A tool that filters records by date should accept an ISO-8601 date string, not a freeform query. The tighter the schema, the smaller the surface for prompt-level manipulation of tool behavior through crafted arguments.

Testing your server for prompt injection

Manual testing requires thinking like an adversarial content author. For each tool that retrieves external content:

Automated testing via the SkillAudit audit form runs this probe systematically across all handlers that return external content, with a library of payloads calibrated against the 101-server corpus's failure patterns. The public audit report shows per-handler findings with the specific payload and response that triggered the finding — giving you the exact code path to fix.

Test your MCP server for prompt injection in 60 seconds.

Run a free audit → Read the rubric →