MCP server ReDoS (regex denial-of-service) security — catastrophic backtracking, safe-regex, timeout defense

Regular expressions with ambiguous quantifiers can cause catastrophic backtracking when matched against specially crafted inputs. In Node.js, which runs a single-threaded event loop, a single ReDoS input can block the entire MCP server for seconds or minutes — making it unresponsive to all concurrent agent sessions. MCP servers are particularly exposed because they commonly apply validation regexes to user-supplied tool arguments (URLs, email addresses, paths, identifiers) without realizing the regex complexity.

1. How catastrophic backtracking works

Regular expression engines using backtracking (the NFA-based engine used by V8, which powers Node.js) attempt all possible ways to match a pattern when they encounter ambiguity. Patterns with nested or overlapping quantifiers — like (a+)+ or (\w+\s?)+ — can generate an exponential number of match attempts against adversarial inputs. The canonical example:

// The classic ReDoS pattern: nested quantifiers with overlapping character classes
const vulnerable = /^(a+)+$/;

// Benign input: matches immediately
console.log(vulnerable.test("aaaa"));       // true, near-instant

// Evil input: engineered to cause exponential backtracking
// Adding one more 'a' roughly doubles the time
console.log(vulnerable.test("aaaaaaaaaaaaaaaaX")); // false; 16 'a's still finish in milliseconds, around 30 take seconds or longer

// Why: before failing, the engine tries every way to split the run of 'a's into groups
// For 'aaaaaaaaX', the splits tried include (aaaaaaaa), (aaaaaaa)(a), (aaaaaa)(aa), (aaaaaa)(a)(a), ...
// Every split fails at 'X', so the work is O(2^n) in the number of 'a's before the mismatch

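// Another common ReDoS pattern: a nested quantifier behind a lazy wildcard prefix
const emailLike = /^[\s\S]*?(\w+)*@/;  // DANGEROUS on inputs without '@'

// Measure the backtracking cost (RegExp.prototype.test does not throw, so no try/catch is needed):
const start = Date.now();
const matched = vulnerable.test("a".repeat(30) + "X"); // false, after roughly 2^30 attempts
console.log(`Returned ${matched} in ${Date.now() - start}ms`); // Can take many seconds, depending on hardware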
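
The O(2^n) figure can be checked without hanging a process. The sketch below is illustrative only: `countGroupings` is a hypothetical helper, not part of any library. It counts the ways a run of n 'a' characters can be split into consecutive non-empty groups, which is the set of alternatives a backtracking engine may have to try for (a+)+ before it gives up.

```typescript
// Count the ways a run of n 'a' characters can be split into consecutive
// non-empty groups, i.e. the distinct ways (a+)+ can consume the run.
function countGroupings(n: number, memo: Map<number, number> = new Map()): number {
  if (n === 0) return 1; // nothing left to split: one complete grouping
  const cached = memo.get(n);
  if (cached !== undefined) return cached;
  let total = 0;
  // The first group takes k characters; the remainder is split recursively.
  for (let k = 1; k <= n; k++) {
    total += countGroupings(n - k, memo);
  }
  memo.set(n, total);
  return total;
}

for (const n of [4, 8, 16, 30]) {
  console.log(`${n} a's -> ${countGroupings(n)} groupings`);
}
```

The counts are 8, 128, 32768, and 536870912, that is 2^(n-1): each extra 'a' doubles the number of groupings, which matches the doubling in match time described above.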

2. Common vulnerable patterns in MCP server validation code

MCP servers apply regexes to tool arguments for validation. These patterns look reasonable but have catastrophic worst-case complexity on crafted inputs:

// VULNERABLE patterns commonly found in MCP server tool handlers

// Email validation — the classic ReDoS target
const VULNERABLE_EMAIL = /^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}[a-z0-9]+[.]{1}(([a-z]{2,3})|([a-z]{2,3}[.]{1}[a-z]{2,3}))$/;
// Evil input: "a".repeat(50) + "!" (no '@'): exponential backtracking in the repeated local-part group

// URL validation with nested groups
const VULNERABLE_URL = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
// Evil input: "https://example.com/" + "a".repeat(50) + "!": the host must match first, then the ([\/\w \.-]*)* group backtracks exponentially

// Identifier validation with multiple optional groups
const VULNERABLE_ID = /^([a-zA-Z_][a-zA-Z0-9_]*\.?)*[a-zA-Z_][a-zA-Z0-9_]*$/;
// Evil input: "a".repeat(50) + "!": with no dots in the input, the optional \.? lets the group split the run in exponentially many ways

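// Path validation with an optional separator inside a repeated group
// (with a mandatory "/" at the start of every segment the match is unambiguous and linear;
// making the slash optional is what introduces the exponential ambiguity)
const VULNERABLE_PATH = /^(\/?[a-zA-Z0-9\-_.~!*()]+)*\/?$/;
// Evil input: "/" + "a".repeat(50) + "#": "#" is outside the character class, so every split of the run is tried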

// Patterns like these end up in MCP server tool schemas
server.tool("send_email", {
  to: z.string().regex(VULNERABLE_EMAIL), // DANGEROUS
  subject: z.string(),
}, async ({ to, subject }) => { /* ... */ });
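
Whether a given input really triggers the blow-up is easy to get wrong, so it helps to measure it at small, safe sizes. The following is a minimal sketch: `probeGrowth` is a hypothetical helper, and the attack string (a run of identifier characters followed by one rejected character) is an assumption chosen for the identifier pattern above. Timings vary by machine, so only the trend matters.

```typescript
// Hypothetical helper: time a regex against attack strings of increasing size.
function probeGrowth(
  regex: RegExp,
  makeInput: (n: number) => string,
  sizes: number[],
): Array<{ n: number; ms: number }> {
  return sizes.map((n) => {
    const input = makeInput(n);
    const start = Date.now();
    regex.test(input); // result ignored: only the elapsed time matters here
    return { n, ms: Date.now() - start };
  });
}

const VULNERABLE_ID = /^([a-zA-Z_][a-zA-Z0-9_]*\.?)*[a-zA-Z_][a-zA-Z0-9_]*$/;

// Keep sizes small: for an exponential pattern each extra character
// roughly doubles the work, so large sizes would hang the process.
const timings = probeGrowth(VULNERABLE_ID, (n) => "a".repeat(n) + "!", [12, 14, 16, 18]);
console.log(timings);
```

If the elapsed time roughly quadruples each time the size grows by two, the pattern is exponential on that input and should be replaced or isolated before it reaches a tool handler.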

3. Static detection with safe-regex

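The safe-regex npm package statically inspects a regex and returns false when it looks risky. It applies a star-height heuristic that flags nested quantifiers, plus a cap on the total number of repetitions (25 by default). The heuristic has both false positives and false negatives, so treat it as a first-pass filter rather than a proof of safety, and confirm doubtful patterns with a dedicated analyzer such as recheck. Use it in tests, CI, or at server startup to catch vulnerable patterns before they reach production: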

import safeRegex from "safe-regex";

// Static check at module load time — crashes fast if a regex is dangerous
const regexesToValidate = {
  email: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
  url: /^https?:\/\/[^\s/$.?#].[^\s]*$/,
  identifier: /^[a-zA-Z_][a-zA-Z0-9_]{0,63}$/,
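  // Nested quantifier: unambiguous (every segment must start with "/"), but a star-height
  // heuristic may still reject it; if so, split on "/" and check each segment with a flat pattern
  path: /^(\/[a-zA-Z0-9._-]{1,255})+\/?$/,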
};

for (const [name, regex] of Object.entries(regexesToValidate)) {
  if (!safeRegex(regex)) {
    // Fail fast at startup — don't let a dangerous regex reach production
    throw new Error(
      `Potentially unsafe regex detected for '${name}': ${regex.source}\n` +
      `Replace with a RE2-compatible or non-backtracking alternative.`
    );
  }
}

// In CI: lint all regex literals in the codebase
// npm install -g recheck  (recheck CLI: https://github.com/nicowillis/recheck)
// recheck check '([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}[a-z0-9]+'

4. Timeout defense using a Worker thread

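For regexes you can't replace (legacy validation, third-party patterns), run the match in a Worker thread with a hard wall-clock timeout. A timer on the main thread cannot interrupt a synchronous match running on that same thread, so the match has to run somewhere that can be killed. If the Worker doesn't respond within the deadline, terminate it and fail safe: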

// regex-worker.js (run in Worker thread)
const { workerData, parentPort } = require("worker_threads");
const { pattern, flags, input } = workerData;
const regex = new RegExp(pattern, flags);
const result = regex.test(input);
parentPort.postMessage({ result });

// main.ts: run regex in Worker with timeout (the type annotations below require TypeScript)
import { Worker } from "worker_threads";
import path from "path";

function safeRegexTest(
  pattern: string,
  flags: string,
  input: string,
  timeoutMs = 50 // wall-clock limit; it also covers Worker startup, which can itself take tens of ms, so tune it or reuse a pooled Worker
): Promise<boolean> {
  return new Promise((resolve, reject) => {
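    // path.resolve() is relative to process.cwd(); adjust if the server is started from another directory
    const worker = new Worker(path.resolve("./regex-worker.js"), {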
      workerData: { pattern, flags, input },
    });

    const timer = setTimeout(() => {
      worker.terminate(); // Kill the Worker thread — stops the backtracking
      reject(new Error(`Regex timeout: pattern /${pattern}/${flags} on input of length ${input.length}`));
    }, timeoutMs);

    worker.on("message", (msg) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(msg.result);
    });

    worker.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}

// Usage in MCP tool handler
server.tool("validate_email", { email: z.string().max(254) }, async ({ email }) => {
  let valid: boolean;
  try {
    valid = await safeRegexTest(
      "^([a-zA-Z0-9])(([-.]|[_]+)?([a-zA-Z0-9]+))*@[a-z0-9]+[.][a-z]{2,6}$",
      "i",
      email,
      50 // 50ms hard limit
    );
  } catch (err) {
    // Timeout or error — treat as invalid input (fail safe)
    valid = false;
  }
  return { content: [{ type: "text", text: valid ? "valid" : "invalid" }] };
});

5. RE2 — guaranteed O(n) time matching

RE2 is a regex engine that guarantees matching time linear in the input length by design: it uses finite-automata matching rather than a backtracking search. The re2 package on npm (node-re2) wraps it for Node.js with an API that mirrors the built-in RegExp for most patterns. The limitation is that RE2 does not support backreferences or lookahead/lookbehind assertions, which most validation regexes don't need anyway:

import RE2 from "re2";

// RE2 drop-in replacement for RegExp — guaranteed O(n) time
// No catastrophic backtracking possible, regardless of input
const emailRegex = new RE2(/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/);
const urlRegex = new RE2(/^https?:\/\/[^\s/$.?#][^\s]*$/);
const identifierRegex = new RE2(/^[a-zA-Z_][a-zA-Z0-9_]{0,63}$/);

// RE2 syntax incompatibilities to watch for:
// - No backreferences: \1, \2, etc. → use a different approach
// - No lookahead: (?=...) → rewrite without lookahead
// - No lookbehind: (?<=...) → rewrite without lookbehind
// - Unicode: \p{...} is supported in node-re2 with flag 'u'

// Testing: with RE2, a long non-matching input is rejected almost instantly
const start = Date.now();
emailRegex.test("a".repeat(100) + "X");  // Returns false in O(n) time
console.log(`RE2 completed in ${Date.now() - start}ms`); // ~0ms

// Replace in MCP tool schema validation
function createSafeValidators() {
  return {
    isValidEmail: (s: string) => emailRegex.test(s),
    isValidUrl: (s: string) => urlRegex.test(s),
    isValidIdentifier: (s: string) => identifierRegex.test(s),
  };
}

const validators = createSafeValidators();

server.tool("process_webhook", {
  callbackUrl: z.string().max(2048).refine(validators.isValidUrl, "Invalid URL"),
  userId: z.string().max(64).refine(validators.isValidIdentifier, "Invalid user ID"),
}, async ({ callbackUrl, userId }) => {
  // Both validations complete in guaranteed O(n) time
  // No ReDoS risk regardless of what callbackUrl or userId contain
  return { content: [{ type: "text", text: "Webhook registered" }] };
});
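
Why automata-based matching cannot blow up can be seen in a toy simulation. The sketch below is an illustration of the general technique, not RE2's actual implementation: `Nfa` and `simulate` are hypothetical names, and the automaton for ^(a+)+$ is built by hand. Instead of exploring one path at a time and backtracking, it tracks the set of states reachable after each character, so equivalent paths collapse into one.

```typescript
interface Nfa {
  start: number;
  accepting: number[];
  // transitions[state][char] lists the states reachable from `state` on `char`
  transitions: Array<Record<string, number[]>>;
}

function simulate(nfa: Nfa, input: string): boolean {
  // active[s] is true when state s is reachable after the input read so far
  let active: boolean[] = nfa.transitions.map((_, s) => s === nfa.start);
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const next: boolean[] = nfa.transitions.map(() => false);
    let alive = false;
    for (let state = 0; state < active.length; state++) {
      if (!active[state]) continue;
      const targets = nfa.transitions[state]?.[ch] ?? [];
      for (const target of targets) {
        next[target] = true; // duplicate paths into the same state merge here
        alive = true;
      }
    }
    if (!alive) return false; // every path died: reject without revisiting input
    active = next;
  }
  return nfa.accepting.some((s) => active[s] === true);
}

// Hand-built automaton for ^(a+)+$. From state 1, an 'a' can either extend
// the current group or start a new one; both choices lead back to state 1.
const aPlusPlus: Nfa = {
  start: 0,
  accepting: [1],
  transitions: [{ a: [1] }, { a: [1, 1] }],
};

console.log(simulate(aPlusPlus, "aaaa"));                  // true
console.log(simulate(aPlusPlus, "a".repeat(100000) + "X")); // false
```

Each input character is read once and the work per character is bounded by the size of the automaton, so the total cost is proportional to input length times pattern size. The price is the feature set: backreferences and lookaround do not fit this model.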

6. Replacing vulnerable patterns with non-backtracking alternatives

Most validation regexes can be rewritten to eliminate ambiguous quantifiers while preserving, or closely approximating, the intended validation. The pattern is to replace nested quantifiers with flat, anchored patterns that match the exact character set at each position, and to check any rule the flat pattern cannot express in ordinary code afterwards:

// BEFORE (vulnerable) → AFTER (safe) substitutions

// Email: replace nested groups with flat character class matching
// BEFORE: /^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}[a-z0-9]+[.]{1}(([a-z]{2,3})|([a-z]{2,3}[.]{1}[a-z]{2,3}))$/
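// AFTER: bounded lengths (64-character local part, 63-character labels); the one repeated group is unambiguous because every iteration must start with a literal "."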
const SAFE_EMAIL = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]{1,64}@[a-zA-Z0-9-]{1,63}(\.[a-zA-Z0-9-]{1,63})*\.[a-zA-Z]{2,}$/;

// URL: avoid the ([\/\w \.-]*)* trap — use a flat path character class
// BEFORE: /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
// AFTER: flat, no nested quantifiers
const SAFE_URL = /^https?:\/\/[a-zA-Z0-9.-]{1,253}(\/[a-zA-Z0-9._~:/?#[\]@!$&'()*+,;=%-]{0,2048})?$/;

// Identifier: replace optional repeated group with simple character class
// BEFORE: /^([a-zA-Z_][a-zA-Z0-9_]*\.?)*[a-zA-Z_][a-zA-Z0-9_]*$/
// AFTER: flat, bounded length, no nested quantifiers (slightly more permissive than BEFORE: it accepts "a..b" and a trailing ".")
const SAFE_IDENTIFIER = /^[a-zA-Z_][a-zA-Z0-9_.]{0,127}$/;

// IP address: use explicit octet validation rather than nested numeric groups
// BEFORE: /^(\d{1,3}\.){3}\d{1,3}$/ (linear — actually safe, but shown for completeness)
// BETTER: validate semantically after simple structural match
function isValidIPv4(s: string): boolean {
  const SIMPLE_IP = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;
  if (!SIMPLE_IP.test(s)) return false;
  return s.split(".").every(part => {
    const n = parseInt(part, 10);
    return n >= 0 && n <= 255;
  });
}
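
The same split-then-check approach works for dotted identifiers when the exact segment rules matter. This is a minimal sketch under stated assumptions: `isValidDottedIdentifier` is a hypothetical helper, and the 128-character cap is an assumed limit chosen to match the bounded identifier pattern above. Each segment is checked with a flat pattern, so there is no repeated group to backtrack through.

```typescript
const SEGMENT = /^[a-zA-Z_][a-zA-Z0-9_]*$/; // flat: one quantifier, no groups
const MAX_LENGTH = 128; // assumed cap on the whole identifier

function isValidDottedIdentifier(s: string): boolean {
  if (s.length === 0 || s.length > MAX_LENGTH) return false;
  // Every dot-separated segment must be a plain identifier, which rejects
  // empty segments such as those in "a..b", ".a", and "a.".
  return s.split(".").every((segment) => SEGMENT.test(segment));
}

console.log(isValidDottedIdentifier("tools.send_email")); // true
console.log(isValidDottedIdentifier("a..b"));             // false
console.log(isValidDottedIdentifier("a".repeat(100) + "!")); // false
```

Splitting is linear in the input length and each segment test is linear in the segment length, so the worst case stays linear overall while the validation remains as strict as the nested pattern it replaces.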

SkillAudit findings and grade impacts

Severity: Finding → Grade impact
Critical: Catastrophic backtracking regex applied to user-supplied input without a timeout; a single crafted input blocks the event loop for minutes. → −25 points
High: ReDoS-vulnerable email or URL validation regex on the critical tool-call path; the most-trafficked path is also the highest-impact one for DoS. → −15 points
High: Regex on user input not checked with safe-regex or recheck; potential polynomial or exponential patterns are not detected before deployment. → −12 points
High: Event-loop-blocking regex in a stdio MCP server; it blocks all stdio reads and writes, including tool calls from the LLM client. → −10 points
Medium: No regex timeout or interruption mechanism; if a slow regex is executed, the only recovery is a process restart. → −6 points
Medium: Complex regexes without documented complexity bounds; no comment explaining worst-case behavior and no safe-regex CI check. → −4 points

Audit your MCP server for ReDoS-vulnerable regexes. SkillAudit's static analysis runs safe-regex against every regex literal in your codebase and flags patterns with polynomial or exponential worst-case complexity on tool-argument validation paths.