Topic: mcp server regex dos

MCP server ReDoS — Regular Expression Denial of Service in tool handlers

Regular Expression Denial of Service (ReDoS) occurs when a regex engine backtracks exponentially on a crafted input. In MCP servers, tool handlers frequently apply regular expressions to LLM-supplied strings to validate input, parse structured data, or extract information. An LLM that has been prompt-injected can supply an input specifically crafted to trigger catastrophic backtracking, consuming 100% of one CPU core for seconds to minutes per tool call. In a single-threaded Node.js process, this blocks the entire event loop. This page covers which regex patterns are vulnerable, how to detect them, and what to use instead.

Quick reference

Vulnerable patterns: Nested quantifiers like (a+)+, alternation with overlap like (a|aa)+, and chained optional groups like (a?){20} are the classic catastrophic backtracking patterns. Avoid them in regexes applied to untrusted input.
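Input length cap: Apply a .max(N) constraint in your Zod schema before the regex sees the input. A cap (for example, 500 characters) bounds polynomial blowups and limits how far any backtracking can grow. Patterns with exponential backtracking can already stall on inputs of around 30 characters, so treat the cap as a complement to fixing the pattern, not a replacement for it.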
Linear-time regex engine: For complex patterns applied to untrusted input, use the re2 npm package (Google's RE2 engine). RE2 guarantees matching time linear in the input length, with no backtracking, at the cost of not supporting backreferences or lookaround assertions (lookahead and lookbehind).
Timeout wrapper: For regexes you cannot replace with RE2, wrap the match call in a worker thread with a timeout so a slow match doesn't block the main event loop.
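
The length cap from the list above can be sketched without any dependencies. This is a minimal stand-in for a Zod .max(N) constraint, useful where a regex is applied outside schema validation; the helper name testWithCap and the constant MAX_INPUT_LENGTH are hypothetical, and 500 is simply the example figure used on this page.

```typescript
// Hypothetical helper: refuse over-long input before any regex runs.
const MAX_INPUT_LENGTH = 500 // example cap; tune per field

function testWithCap(re: RegExp, input: string, maxLength: number = MAX_INPUT_LENGTH): boolean {
  // Length check is O(1) and runs first, so the regex never sees over-long input
  if (input.length > maxLength) return false
  return re.test(input)
}

console.log(testWithCap(/^[a-z]+$/, 'abc'))
console.log(testWithCap(/^[a-z]+$/, 'a'.repeat(501)))
```

Rejecting over-long input returns false here for simplicity; a tool handler would more likely report a validation error to the caller.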

How catastrophic backtracking happens

JavaScript's regex engine uses backtracking NFA (Non-deterministic Finite Automaton) evaluation. For most regexes this is fast, but certain patterns cause exponential backtracking: the engine tries every possible combination of matches before concluding that no match exists. The vulnerable pattern family involves overlapping alternations or nested quantifiers where the engine cannot prune the search space.
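
To make the size of that search space concrete, here is a small sketch (the helper name countSplits is hypothetical, not part of any library). It counts how many ways a pattern such as (a+)+ can divide a run of n identical characters among iterations of the outer group. The count is 2^(n-1), which is why each extra character roughly doubles the work when the overall match fails.

```typescript
// Count the ways a run of n characters can be split into one or more
// non-empty consecutive groups. Each split is a distinct path that a
// backtracking engine may try for a pattern like (a+)+ before giving up.
function countSplits(n: number): number {
  if (n === 0) return 1 // nothing left to assign: one complete split
  let total = 0
  // The first group takes `first` characters; the rest is split recursively
  for (let first = 1; first <= n; first++) {
    total += countSplits(n - first)
  }
  return total
}

console.log(countSplits(10)) // 512
console.log(countSplits(20)) // 524288
```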

// Classic vulnerable patterns. Time each one on a crafted input of about 30 characters:

// 1. Nested quantifier: (a+)+ or (a*)*
const VULN_EMAIL = /^([a-zA-Z0-9._-]+)+@[a-zA-Z]{2,}$/
// Attack input: 'aaaaaaaaaaaaaaaaaaaaaaaaa@' (valid local part, missing domain)
// A run of n 'a's can be split across iterations of the outer group in 2^(n-1) ways,
// and the engine tries each split before concluding that no match exists

// 2. Alternation with overlapping matches: (a|aa)+
const VULN_DOMAIN = /^(\w|[a-z0-9-])+\.[a-z]{2,}$/
// Attack input: 'aaaaaaaaaaaaaaaaaaaaaaaaa' (no dot)
// Both alternatives match every 'a', so each character doubles the number of paths to try

// 3. Chained optional groups: (a?){30} followed by a{30}
const VULN_REPEAT = /^(a?){30}a{30}$/
// For an input of 30 'a's, the engine may try on the order of 2^30 combinations
// before it finds the one that matches (every a? matching nothing)

// Test a regex for ReDoS vulnerability:
// 1. Try a crafted input that is just inside the expected format
//    but fails at the last character (e.g., 30 'a's + '@')
// 2. Time it with console.time / console.timeEnd
// 3. If > 10ms for a 50-char input, it's likely vulnerable

console.time('regex-test')
VULN_EMAIL.test('aaaaaaaaaaaaaaaaaaaaaaaaa@')
console.timeEnd('regex-test')
// Vulnerable regex: runtime roughly doubles with every additional 'a',
// so slightly longer inputs take seconds to minutes
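
The manual timing steps above can be turned into a small growth table. The sketch below is an illustration only: timeRegexMs and growthTable are hypothetical helper names, and the input sizes are kept small so the measurement itself finishes quickly. If the time roughly doubles from one row to the next, the pattern is very likely exponential.

```typescript
// Hypothetical helper: time a single regex test in milliseconds.
function timeRegexMs(re: RegExp, input: string): number {
  const start = performance.now()
  re.test(input)
  return performance.now() - start
}

// Measure how match time grows as the crafted input gets longer.
function growthTable(re: RegExp, sizes: number[]): { n: number; ms: number }[] {
  return sizes.map((n) => ({
    n,
    // n matching characters followed by one character that forces a failure
    ms: timeRegexMs(re, 'a'.repeat(n) + '!'),
  }))
}

// Start with short inputs and grow slowly so the test cannot hang
const table = growthTable(/^(a+)+$/, [14, 16, 18, 20])
for (const row of table) {
  console.log(`n=${row.n}: ${row.ms.toFixed(2)}ms`)
}
```

Absolute numbers depend on the machine and the Node.js version, so compare the ratio between rows, not the raw timings.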

Safe alternatives in MCP tool handlers

// Option 1: Use RE2 (linear-time engine, no backtracking)
// npm install re2
import RE2 from 're2'

// RE2 drop-in for the safe validation patterns below
const emailRe2 = new RE2('^[a-zA-Z0-9._%+\\-]{1,64}@[a-zA-Z0-9.\\-]{1,253}\\.[a-zA-Z]{2,}$')

server.tool('validate_email', 'Validate an email address format', {
  email: z.string().max(320),  // RFC 5321 max email length
}, async ({ email }) => {
  const valid = emailRe2.test(email)
  return { content: [{ type: 'text', text: JSON.stringify({ valid, email }) }] }
})

// Option 2: Use safe, non-backtracking regex patterns
// These patterns avoid nested quantifiers and overlapping alternation

// SAFE email format check (not full RFC 5321 — safe for input validation)
const SAFE_EMAIL_RE = /^[a-zA-Z0-9._%+\-]{1,64}@[a-zA-Z0-9.\-]{1,253}\.[a-zA-Z]{2,}$/
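// Why safe: no nested quantifiers, no overlapping alternation, and every repetition is bounded
// Worst case: the domain class also matches '.', so the engine may backtrack to find the final dot,
// but that work is capped by the {1,253} limit (polynomial, never exponential)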

// SAFE URL format check
const SAFE_URL_RE = /^https?:\/\/[a-zA-Z0-9\-\.]{1,253}(:\d{1,5})?(\/[^\s]*)?$/
// Why safe: the character classes [a-zA-Z0-9\-\.] and [^\s] don't overlap
// with the adjacent literal characters — no catastrophic backtracking path

// SAFE slug/identifier check
const SAFE_SLUG_RE = /^[a-z0-9][a-z0-9\-]{0,253}[a-z0-9]$/

// Option 3: Validate structure first, then apply simple regex to each component
function safeParseEmail(email: string): { local: string; domain: string } | null {
  // Split at the last '@' first — no regex needed for structural check
  const atIndex = email.lastIndexOf('@')
  if (atIndex < 1 || atIndex > 64) return null
  const local = email.slice(0, atIndex)
  const domain = email.slice(atIndex + 1)
  if (!local || !domain) return null
  if (domain.length > 253) return null
  // Now apply simple, safe regexes to each component separately
  if (!/^[a-zA-Z0-9._%+\-]+$/.test(local)) return null
  if (!/^[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/.test(domain)) return null
  return { local, domain }
}
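
The same "structure first" idea applies to URLs. A minimal sketch, assuming the handler only needs http(s) URLs: it uses the built-in WHATWG URL parser available in Node.js instead of a regex. The function name safeParseHttpUrl and the 2048-character cap are illustrative assumptions, not values from this page.

```typescript
// Hypothetical helper: validate a URL with the built-in parser, no regex involved.
function safeParseHttpUrl(input: string, maxLength: number = 2048): URL | null {
  // Cap the length first (2048 is an assumed limit; pick one that fits your tool)
  if (input.length > maxLength) return null
  let url: URL
  try {
    url = new URL(input) // throws a TypeError on input that is not a valid absolute URL
  } catch {
    return null
  }
  // Only accept the schemes the tool actually needs
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return null
  return url
}

console.log(safeParseHttpUrl('https://example.com/docs?page=2')?.hostname)
console.log(safeParseHttpUrl('ftp://example.com/file'))
```

After parsing, individual components such as url.hostname can still be checked with simple bounded patterns like the slug regex above.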

Worker thread timeout wrapper for unavoidable complex regexes

When a library you depend on uses complex regexes internally (e.g., a markdown parser, URL normalizer, or schema validator), you can't always replace the regex with RE2. Wrap the call in a Worker thread with a timeout to prevent a slow match from blocking the event loop.

import { Worker } from 'worker_threads'

// Run a potentially slow regex match in a worker thread with a timeout.
// Note: the timeout also covers worker startup, so very small values can fire on healthy input.
function regexMatchWithTimeout(pattern: RegExp, input: string, timeoutMs: number): Promise<RegExpExecArray | null> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { parentPort, workerData } = require('worker_threads')
      const re = new RegExp(workerData.source, workerData.flags)
      const result = re.exec(workerData.input)
      parentPort.postMessage(result)
    `, {
      eval: true,
      // Pass the pattern as plain strings so the worker does not depend on how a RegExp is cloned
      workerData: { source: pattern.source, flags: pattern.flags, input },
    })

    // Kill the worker if the match has not finished in time
    const timer = setTimeout(() => {
      worker.terminate()
      reject(new Error(`Regex match timed out after ${timeoutMs}ms (possible ReDoS)`))
    }, timeoutMs)

    worker.on('message', (result) => {
      clearTimeout(timer)
      resolve(result)
    })

    worker.on('error', (err) => {
      clearTimeout(timer)
      reject(err)
    })
  })
}

// Usage:
server.tool('parse_log_line', 'Extract fields from a log line', {
  logLine: z.string().max(10_000),
}, async ({ logLine }) => {
  const COMPLEX_LOG_RE = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)\s+\[([^\]]+)\]\s+(.+)$/
  try {
    const match = await regexMatchWithTimeout(COMPLEX_LOG_RE, logLine, 100)  // 100ms timeout
    if (!match) return { content: [{ type: 'text', text: 'No match' }] }
    return { content: [{ type: 'text', text: JSON.stringify({ time: match[1], level: match[2], msg: match[3] }) }] }
  } catch (err) {
    throw new Error('Log line parsing failed: input may have triggered ReDoS protection')
  }
})

What SkillAudit checks

Nested quantifiers in regexes applied to LLM-supplied input — HIGH; catastrophic backtracking can block the Node.js event loop for seconds per call
Overlapping alternation in regexes applied to LLM-supplied input — HIGH; similar exponential complexity on crafted inputs
No input length cap in Zod schema before regex application — WARN; longer inputs amplify backtracking time exponentially
Complex third-party parser (markdown, URL) applied to unsanitized LLM input without timeout — WARN; library may internally use vulnerable regex patterns

Detecting vulnerable regexes in your codebase

Three ways to find ReDoS-vulnerable regex patterns in your MCP server's source:

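safe-regex npm package: call the function it exports on each pattern (for example from a lint script); it returns false when the pattern's star height exceeds 1, that is, when a quantified group contains another quantifier. This is a heuristic with both false positives and false negatives: a true result does not prove linear-time matching, and overlapping alternation can slip through.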
vuln-regex-detector — more comprehensive analysis; can be run in CI as a lint step
Manual crafting — for any regex matching repeated groups, test with a string of 30+ characters that nearly matches but fails at the last character. If the test takes >10ms, it's likely vulnerable.
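
As an illustration of what a star-height style check looks for, here is a rough sketch that flags a quantified group containing another unbounded quantifier. It is a simplified heuristic written for this page, not the implementation of safe-regex or vuln-regex-detector, and the function names are hypothetical. It over-reports (it flags any nested quantifier, ambiguous or not) and under-reports (it ignores overlapping alternation such as (a|aa)+).

```typescript
// True when an unbounded quantifier (+, *, or {n,}) starts at index i.
function unboundedQuantifierAt(source: string, i: number): boolean {
  const ch = source[i]
  if (ch === '+' || ch === '*') return true
  return /^\{\d+,\}/.test(source.slice(i)) // {n,} has no upper bound
}

// Heuristic: does any quantified group contain another unbounded quantifier?
function hasNestedQuantifier(source: string): boolean {
  // One entry per open group: has an unbounded quantifier been seen inside it?
  const stack: boolean[] = []
  let inClass = false
  for (let i = 0; i < source.length; i++) {
    const ch = source[i]
    if (ch === '\\') { i++; continue } // skip the escaped character
    if (inClass) { if (ch === ']') inClass = false; continue }
    if (ch === '[') { inClass = true; continue }
    if (ch === '(') { stack.push(false); continue }
    if (ch === ')') {
      const inner = stack.pop() ?? false
      const outer = unboundedQuantifierAt(source, i + 1)
      if (inner && outer) return true
      // A quantifier inside or on this group also counts for the enclosing group
      if ((inner || outer) && stack.length > 0) stack[stack.length - 1] = true
      continue
    }
    if (unboundedQuantifierAt(source, i) && stack.length > 0) {
      stack[stack.length - 1] = true
    }
  }
  return false
}

console.log(hasNestedQuantifier('^(a+)+$'))
console.log(hasNestedQuantifier('^[a-z0-9][a-z0-9\\-]{0,253}[a-z0-9]$'))
```

A check like this is only a first filter for CI; anything it flags still deserves the manual timing test described above.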
