MCP server input fuzzing security — black-box fuzzing tool handlers, mutation testing, and crash triage for MCP APIs

MCP tool handlers differ from conventional REST APIs in one critical way: their inputs are generated by an LLM that has been processing arbitrary user text, including content from external documents, web pages, and tool outputs that may contain prompt injection payloads. The LLM's argument generation is non-deterministic: it can produce structurally valid JSON that is semantically absurd, and a prompt injection payload can steer that generation toward edge cases that expose path traversals, type confusion bugs, and crashes. Black-box fuzzing finds these before attackers do.

Why MCP tool inputs are a fuzzing target

Conventional API input validation assumes a human or a well-typed SDK is generating the request. MCP tool arguments come from an LLM, which means:

Type coercion assumptions break. The schema says count is an integer, but the LLM may provide "42", 42.7, -1, or 9999999999 (outside the 32-bit integer range). A handler that concatenates args.count into a query, for example "LIMIT " + args.count, without first checking that the value is an integer is open to SQL injection whenever a string arrives in its place.
Deeply nested structures appear. Object and array arguments can be nested to arbitrary depth if the LLM is constructing them from parsed text. Recursive schema validation and deep-merge or deep-clone helpers that assume shallow inputs can overflow the stack or consume excessive CPU.
Required fields may be null. The LLM may pass null for a required field if the prompt text suggested the field is optional. Handlers that assume required fields are non-null will throw at the first property access.
Unicode edge cases arrive. User text that flows into tool arguments may contain right-to-left override characters, null bytes, homoglyph substitutions (Cyrillic а for Latin a in a domain name), or overlong UTF-8 sequences that bypass string-level checks.
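
The first two failure modes above can be closed off with a small amount of strict checking at the top of a handler. The following is a minimal sketch, not a definitive implementation: requireInt and exceedsDepth are hypothetical helper names, and the bounds passed to them are assumptions you would choose per tool.

```typescript
// Hypothetical helpers: names and limits are illustrative, not part of any MCP SDK.

/** Accept only a real integer within [min, max]; no string or float coercion. */
function requireInt(value: unknown, min: number, max: number): number {
  if (typeof value !== 'number' || !Number.isInteger(value)) {
    throw new Error('Invalid argument: expected an integer');
  }
  if (value < min || value > max) {
    throw new Error(`Invalid argument: expected a value between ${min} and ${max}`);
  }
  return value;
}

/** Walk the value iteratively so a hostile input cannot overflow the call stack. */
function exceedsDepth(value: unknown, maxDepth: number): boolean {
  const stack: Array<{ node: unknown; depth: number }> = [{ node: value, depth: 1 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    if (node === null || typeof node !== 'object') continue; // primitives add no depth
    if (depth > maxDepth) return true;
    for (const child of Object.values(node)) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return false;
}
```

Because requireInt rejects anything whose typeof is not 'number', the string "42" fails instead of being silently coerced, and null fails the same check. The depth walk uses an explicit stack rather than recursion so the check itself cannot be the thing that crashes.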

A minimal fuzzing harness for MCP tool handlers

You don't need AFL++ or a full fuzzing infrastructure to usefully fuzz an MCP server. A type-aware mutation fuzzer targeting your JSON Schema is enough to surface many handler crashes:

// fuzz-runner.ts — minimal schema-aware MCP tool fuzzer
import { myServer } from './server';

// Mutation strategies — each transforms a seed value
const mutators = {
  string: (v: string) => [
    '', // empty string
    v + '\x00', // null byte
    v.repeat(10_000), // oversized
    '../../../etc/passwd', // path traversal
    '\u202E' + v, // right-to-left override (U+202E), written as an escape so it stays visible
    "'; DROP TABLE users; --", // SQL injection attempt
    '{{' + v + '}}', // template injection
    null, // null where string expected
    42, // wrong type
  ],
  number: (v: number) => [
    -1, 0, -2147483648, 2147483647, 9999999999,
    0.1, NaN, Infinity, -Infinity,
    '42', // string coercion
    null, undefined,
  ],
  array: (v: unknown[]) => [
    [], // empty
    new Array(10_000).fill(v[0]), // oversized
    [...v, null], // null element
    null, // null where array expected
  ],
};

async function fuzzTool(toolName: string, seedArgs: Record<string, unknown>) {
  const results: Array<{ args: unknown; error: string }> = [];

  for (const [key, seedValue] of Object.entries(seedArgs)) {
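    // Pick a mutator by runtime type; booleans and objects use the generic fallback
    // (routing them to the string mutator would throw inside v.repeat)
    const type = typeof seedValue === 'string' ? 'string'
      : typeof seedValue === 'number' ? 'number'
      : Array.isArray(seedValue) ? 'array'
      : undefined;

    const mutations: unknown[] = type
      ? mutators[type](seedValue as never)
      : [null, '', 0];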

    for (const mutant of mutations) {
      const args = { ...seedArgs, [key]: mutant };
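      try {
        // Note: a mutant that is accepted without error is not recorded here,
        // so a traversal that succeeds silently needs a separate check on the result.
        await myServer.callTool(toolName, args);
      } catch (err) {
        const msg = err instanceof Error ? err.message : String(err);
        // Expected validation errors are OK; unexpected crashes are findings.
        // Match case-insensitively so "invalid argument type" counts as expected.
        if (!/invalid|required|must be/i.test(msg)) {
          results.push({ args, error: msg });
        }
      }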
    }
  }

  return results;
}

// Example: fuzz read_file tool
const crashes = await fuzzTool('read_file', { path: 'readme.md' });
console.log('Unexpected crashes:', crashes.length);
crashes.forEach(c => console.log(JSON.stringify(c)));
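
The mutators above cover strings, numbers, and arrays, but not the deeply nested objects described earlier. As a sketch of how that gap might be filled, the hypothetical helpers below generate object mutants, including one nested several thousand levels deep. The names makeNested and objectMutants and the depth of 5,000 are assumptions; wiring them into the harness would also need an 'object' branch in the type detection.

```typescript
// Hypothetical helper: builds { a: { a: { ... } } } nested `depth` levels deep.
// Built with a loop so the generator itself cannot overflow the stack.
function makeNested(depth: number): Record<string, unknown> {
  const root: Record<string, unknown> = {};
  let cursor = root;
  for (let level = 1; level < depth; level++) {
    const next: Record<string, unknown> = {};
    cursor.a = next;
    cursor = next;
  }
  return root;
}

// Candidate mutants for an object-typed argument.
function objectMutants(seed: Record<string, unknown>): unknown[] {
  return [
    {},                               // empty object
    null,                             // null where object expected
    [],                               // array where object expected
    { ...seed, unexpected: 'extra' }, // property the schema does not declare
    makeNested(5_000),                // deep nesting to probe recursive validators
  ];
}
```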

What to look for in fuzzing output

Not all thrown errors are findings. Classify crash outputs into three categories:

Expected validation errors (not findings): "path is required", "invalid argument type", "count must be a positive integer." These mean your validation is working correctly.
Unexpected internal errors (MEDIUM findings): "Cannot read properties of null (reading 'join')" (older Node.js versions print "Cannot read property 'join' of null"), "ENOENT: no such file or directory, open '/app/../../../etc/passwd'", "RangeError: Maximum call stack size exceeded." These indicate a missing null check, a traversal path that reached the filesystem call without a containment check, or unbounded recursion. Escalate a traversal to HIGH once you confirm it can read a file outside the allowed root.
Information-disclosing errors (HIGH findings): stack traces that reveal internal file paths, database connection strings in error messages, SQL query templates with embedded unsanitized arguments. These combine a crash with a reconnaissance payload.
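
The three categories can be turned into a first-pass triage function so that a large fuzzing run is sorted before anyone reads it. This is a sketch under stated assumptions: classifyError is a hypothetical name, and the regular expressions are illustrative guesses at what validation messages and leaks look like, to be tuned against your own server's output.

```typescript
type Severity = 'expected' | 'medium' | 'high';

// Illustrative patterns only; adjust to the messages your server actually emits.
const HIGH_PATTERNS: RegExp[] = [
  /\n\s+at .+:\d+:\d+/,                                     // stack frame with file:line:column
  /\b\w+:\/\/[^\s:@/]+:[^\s@/]+@/,                          // connection string with credentials
  /\b(SELECT|INSERT|UPDATE|DELETE)\b.+\b(FROM|INTO|SET)\b/, // SQL text in the message
];

const EXPECTED_PATTERNS: RegExp[] = [/\binvalid\b/i, /\brequired\b/i, /\bmust be\b/i];

function classifyError(message: string): Severity {
  // Check for leaks first: a validation-looking message that embeds a
  // stack trace or query text is still an information disclosure.
  if (HIGH_PATTERNS.some((pattern) => pattern.test(message))) return 'high';
  if (EXPECTED_PATTERNS.some((pattern) => pattern.test(message))) return 'expected';
  return 'medium';
}
```

Anything that is neither a recognized validation message nor a recognized leak defaults to 'medium', so an unfamiliar error is reviewed rather than silently dropped.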

Automated fuzzing in CI with fast-check

Property-based testing with the fast-check library integrates fuzzing into your normal test suite without separate fuzzing infrastructure:

// Assumes a Jest- or Vitest-style runner that provides describe/it/expect
import fc from 'fast-check';
import { callTool } from './server';

describe('read_file fuzzing', () => {
  it('never crashes on arbitrary path inputs', async () => {
    await fc.assert(
      fc.asyncProperty(fc.string(), async (path) => {
        try {
          await callTool('read_file', { path });
        } catch (err) {
          const msg = String(err);
          // Assertion: errors must be safe validation messages
          expect(msg).not.toMatch(/ENOENT.*\.\.\//); // no path traversal in error
          expect(msg).not.toMatch(/Cannot read prop/); // no null reference crash
          expect(msg).not.toMatch(/at Object\./); // no stack trace leak
        }
      }),
      { numRuns: 1_000 }
    );
  });

  it('integer fields reject non-integers without crashing', async () => {
    // Assumes count is a required integer field, so every value below must be rejected
    const arbitraryNonInteger = fc.oneof(
      fc.string(),
      fc.double().filter((n) => !Number.isInteger(n)), // drop whole-number doubles such as 0 or 3
      fc.constant(null),
      fc.constant(undefined), fc.constant(Infinity)
    );
    await fc.assert(
      fc.asyncProperty(arbitraryNonInteger, async (count) => {
        // Must reject with a clean validation error; a call that resolves
        // means the bad value was accepted
        await expect(callTool('list_items', { count })).rejects.toBeInstanceOf(Error);
      }),
      { numRuns: 500 }
    );
  });
});
});

SkillAudit detection

HIGH: Path traversal found via fuzzing — ../ or absolute paths escape allowed root without a containment check.
HIGH: Null dereference that leaks internal path or stack trace in error message — information disclosure combined with crash.
MEDIUM: Integer fields accept strings without coercion check — downstream SQL or system call constructed from unvalidated type.
MEDIUM: Recursive schema validation without depth limit — deeply nested input causes stack overflow or CPU exhaustion.
LOW: No property-based or mutation fuzz tests in test suite — static analysis alone is insufficient for LLM-generated inputs.
INFO: No maxLength constraint on string fields — oversized inputs may cause memory pressure without crashing.
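
For the first HIGH finding, the missing containment check is usually only a few lines. The sketch below shows one common approach using Node's path module. resolveInsideRoot is a hypothetical helper name, and the check is lexical only: it does not follow symlinks, so a server that allows symlinks inside the root would also need to compare real paths.

```typescript
import * as path from 'node:path';

// Hypothetical helper: resolves a tool-supplied path and rejects anything outside `root`.
function resolveInsideRoot(root: string, userPath: unknown): string {
  if (typeof userPath !== 'string' || userPath.length === 0 || userPath.includes('\0')) {
    throw new Error('Invalid path');
  }
  const absoluteRoot = path.resolve(root);
  // resolve() collapses "../" segments and lets an absolute userPath override the root,
  // so the containment check below has to run on the resolved result.
  const candidate = path.resolve(absoluteRoot, userPath);
  const relative = path.relative(absoluteRoot, candidate);
  const escapes =
    relative === '..' || relative.startsWith('..' + path.sep) || path.isAbsolute(relative);
  if (escapes) {
    // Deliberately does not echo the path back, to avoid leaking it in the error.
    throw new Error('Invalid path: outside allowed root');
  }
  return candidate;
}
```

Both error messages contain the word "Invalid", so a fuzzing run that treats validation errors as expected will stop reporting the traversal mutants once this check is in place.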

Static analysis catches known-bad patterns in source code; fuzzing catches unknown-bad behavior at runtime. See the MCP scanner vs. SAST comparison for why both are needed, or run a SkillAudit scan to get static findings on your server's input validation coverage.