Published 3 June 2026 · Blog

Property-based fuzzing for MCP server security testing

Unit tests verify the cases you already thought of. Property-based fuzzing automatically generates thousands of adversarial inputs — null bytes, path traversal strings, JSON with unexpected types, integers at the edges of range — and breaks your tool handlers before an attacker does. Here is how to wire it up for MCP servers.

A typical MCP server tool handler has five or six happy-path unit tests. A developer writes one for an empty input, one for a valid file path, maybe one for a missing resource. The test suite passes. The handler ships.

Three months later, a security researcher submits a path like ../../etc/passwd and the realpath check (present in the code, tested in isolation) fails to fire because the full path is built by string concatenation before the check, and the test never exercised that ordering. Or a numeric parameter arrives as a non-numeric string, parseInt returns NaN, and a later coercion silently turns it into 0, so the handler reads the first record in the database instead of returning an error. Or a Unicode zero-width joiner bypasses the blocklist regex that was tested only against ASCII inputs.

These are not exotic bugs. They are the exact kind of input-space edge cases that property-based fuzzing finds automatically, without anyone having to think to write the test case first.

What property-based testing actually is

Property-based testing inverts the unit test model. Instead of specifying "given input X, expect output Y," you specify a property that must hold for any valid input the framework generates: "for any file path my tool receives, the resolved absolute path must always start with the allowed base directory." The framework generates hundreds or thousands of random inputs, checks whether the property holds, and if it finds a counterexample it minimizes (shrinks) that counterexample to the smallest failing case before reporting it.

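For MCP server security testing, properties map naturally to security invariants. The four tested in this post are path containment (a file tool never reads outside its workspace root), no exception leakage (a handler always returns a structured result instead of throwing), numeric bounds (a limit parameter never yields more than the maximum page size), and blocklist robustness (a blocked term is rejected however it is encoded).

These properties cannot be exhaustively verified by hand-written tests. Property-based testing samples the input space far more broadly, using generators that are biased toward edge cases such as empty strings and boundary integers.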
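
To make the generate-and-check loop concrete, here is a dependency-free toy runner. It is a sketch of the idea only, not how fast-check is implemented; the PRNG, the generator alphabet, and the deliberately buggy naiveAccepts check are all assumptions made for illustration.

```typescript
// Toy property runner: NOT fast-check, just the core generate-and-check loop.

// Deterministic PRNG (mulberry32) so a failing run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generator: random strings over a small, path-flavoured alphabet.
function randomPath(rand: () => number, maxLen: number): string {
  const alphabet = [".", ".", "/", "a"];
  const len = Math.floor(rand() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet[Math.floor(rand() * alphabet.length)];
  }
  return out;
}

// Hypothetical handler check with a deliberate bug: it only rejects a leading "../".
function naiveAccepts(p: string): boolean {
  return !p.startsWith("../");
}

// Property: every accepted path is free of "../" sequences.
function acceptedPathsHaveNoTraversal(p: string): boolean {
  return !naiveAccepts(p) || !p.includes("../");
}

// Run the property against `runs` generated inputs and return the first counterexample, if any.
function findCounterexample(
  property: (input: string) => boolean,
  runs: number,
  seed: number,
): string | undefined {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomPath(rand, 16);
    if (!property(input)) return input;
  }
  return undefined;
}

const counterexample = findCounterexample(acceptedPathsHaveNoTraversal, 2000, 42);
console.log("counterexample:", JSON.stringify(counterexample));
```

The property is written as "rejected, or else safe", which is the same shape the MCP tests in this post use: an error response always satisfies the property.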

Setup: fast-check for Node.js MCP servers

fast-check is the standard property-based testing library for TypeScript and JavaScript. It integrates with any test runner — Jest, Vitest, Node's built-in test runner — and includes built-in arbitrary generators for strings, integers, arrays, objects, and Unicode text.

npm install --save-dev fast-check vitest

For a Python MCP server, the equivalent is Hypothesis with pytest:

pip install hypothesis pytest

The examples below use fast-check + TypeScript, but the Hypothesis equivalent is structurally identical.

Test 1: Path traversal property

The most important property for any file-reading tool is path containment. Here is how to express and test it:

import { describe, it, expect } from 'vitest';
import fc from 'fast-check';
import path from 'path';
import { readDocumentTool } from '../src/tools/read-document.js';

const WORKSPACE_ROOT = '/app/workspace';

describe('read_document — path containment property', () => {
  it('always resolves within workspace root for any string input', async () => {
    await fc.assert(
      fc.asyncProperty(
        fc.string({ unit: 'grapheme', minLength: 1, maxLength: 512 }),
        async (inputPath) => {
          const result = await readDocumentTool({ path: inputPath });

          if (result.isError) {
            // A rejection response is always valid — the tool rejected the input
            return true;
          }

          // If the tool succeeded, the requested path must resolve inside the root.
          // path.resolve is purely lexical, so this does not detect symlink escapes.
          const resolvedPath = path.resolve(WORKSPACE_ROOT, inputPath);
          expect(resolvedPath.startsWith(WORKSPACE_ROOT + path.sep)).toBe(true);
        }
      ),
      { numRuns: 1000, verbose: true }
    );
  });
});

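Run against a tool that builds its path with path.join(root, userPath) and applies the containment check to the wrong value (for example, to the raw input rather than the resolved path), this property fails as soon as a generated input resolves outside the root and the tool still returns success. Purely random strings rarely contain a sequence like ../../../etc/passwd, so in practice it helps to mix traversal fragments into the generator rather than rely on fc.string alone. Once a counterexample is found, the framework shrinks it toward a minimal traversal string. Because the test resolves paths lexically with path.resolve, it does not exercise symlink escapes; those need a fixture directory that contains a symlink.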

The key design decision is the "rejection is always valid" branch. Properties for MCP tools should not require a successful response — only that if a response is successful, the invariants hold. This prevents false positives from valid error handling.
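
On the handler side, the containment check that this property exercises could look roughly like the sketch below. The helper name resolveInsideRoot is hypothetical, and the sketch is lexical only: it assumes the root is not the filesystem root and it does not follow symlinks, so a real handler would also compare the result of fs.realpath against the root.

```typescript
import * as path from "node:path";

// Hypothetical helper: returns the resolved path if it stays inside `root`, else null.
// Uses path.posix so this sketch behaves the same on every platform.
function resolveInsideRoot(root: string, userPath: string): string | null {
  // Null bytes are never legitimate in a file path.
  if (userPath.includes("\0")) return null;

  const base = path.posix.resolve(root);
  // resolve() collapses "." and ".." segments; an absolute userPath replaces the base.
  const resolved = path.posix.resolve(base, userPath);

  // Compare against root + separator so "/app/workspace-evil" is not treated as inside.
  if (resolved !== base && !resolved.startsWith(base + path.posix.sep)) {
    return null;
  }
  return resolved;
}

console.log(resolveInsideRoot("/app/workspace", "docs/readme.md"));
console.log(resolveInsideRoot("/app/workspace", "../../etc/passwd"));
```

The important ordering is resolve first, compare second. Checking the raw input for ".." before joining is the pattern the property test is designed to break.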

Test 2: No exception leakage

MCP servers that propagate uncaught exceptions to the LLM leak stack traces containing internal paths, dependency versions, and sometimes credentials. This property verifies that any input produces a structured response, never an exception:

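import { describe, it, expect } from 'vitest';
import fc from 'fast-check';
// Assumed module path, following the layout used in Test 1
import { fetchUrlTool } from '../src/tools/fetch-url.js';

describe('fetch_url — no exception leakage property', () => {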
  it('never throws for any URL-shaped string input', async () => {
    await fc.assert(
      fc.asyncProperty(
        // Generate adversarial URL-shaped strings: valid URLs, invalid URLs,
        // javascript: scheme, file:// scheme, data: URLs, excessively long URLs
        fc.oneof(
          fc.webUrl(),
          fc.string({ unit: 'binary', minLength: 0, maxLength: 2048 }),
          fc.constant('javascript:alert(1)'),
          fc.constant('file:///etc/passwd'),
          fc.constant('data:text/html,'),
          fc.constant(''),
          fc.constant('a'.repeat(8192)),
        ),
        async (url) => {
          let result: unknown;
          let threw = false;

          try {
            result = await fetchUrlTool({ url });
          } catch (e) {
            threw = true;
          }

          // The handler must never throw — it must catch and return MCP error
          expect(threw).toBe(false);

          // If it returned, result must have MCP error or content shape
          expect(result).toBeDefined();
        }
      ),
      { numRuns: 500 }
    );
  });
});

This test reliably catches handlers that forget to wrap the URL fetch in a try-catch, or that validate the scheme but crash on the URL parse for malformed inputs like http:// with no host.
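
One way a handler can satisfy this property is to wrap its body in a single outermost error boundary. The sketch below is an assumption about how that might be structured: ToolResult is modelled on the isError and content fields used by the tests in this post, and withErrorBoundary and rawFetchUrl are hypothetical names.

```typescript
// Assumed result shape, modelled on the isError / content[0].text fields used in this post.
type ToolResult = {
  isError?: boolean;
  content: { type: "text"; text: string }[];
};

// Wrap a handler so any thrown error becomes a structured, non-leaking error result.
function withErrorBoundary<A>(
  handler: (args: A) => Promise<ToolResult>,
): (args: A) => Promise<ToolResult> {
  return async (args: A) => {
    try {
      return await handler(args);
    } catch {
      // Deliberately generic: no message, stack, or path from the original error.
      return {
        isError: true,
        content: [{ type: "text", text: "Request could not be processed." }],
      };
    }
  };
}

// Hypothetical handler that throws on malformed URLs (new URL() throws a TypeError)
// and on any scheme other than https.
async function rawFetchUrl(args: { url: string }): Promise<ToolResult> {
  const parsed = new URL(args.url);
  if (parsed.protocol !== "https:") {
    throw new Error(`blocked scheme ${parsed.protocol}`);
  }
  return { content: [{ type: "text", text: `would fetch ${parsed.hostname}` }] };
}

const fetchUrlSafe = withErrorBoundary(rawFetchUrl);
```

Logging the original error server-side inside the catch block keeps it available for debugging without sending it to the model.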

Test 3: Numeric argument bounds

Integer parameters in MCP tools are frequently used to control pagination, depth limits, and timeouts. Improper handling of boundary values (negative numbers, zero, MAX_SAFE_INTEGER, floating-point values passed as integers) causes amplification attacks and unexpected behavior:

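import { describe, it, expect } from 'vitest';
import fc from 'fast-check';
// Assumed module path, following the layout used in Test 1
import { listItemsTool } from '../src/tools/list-items.js';

describe('list_items — numeric bounds property', () => {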
  it('rejects or safely handles any integer for the limit parameter', async () => {
    await fc.assert(
      fc.asyncProperty(
        fc.oneof(
          fc.integer({ min: -1_000_000, max: 1_000_000 }),
          fc.constant(0),
          fc.constant(-1),
          fc.constant(Number.MAX_SAFE_INTEGER),
          fc.constant(Number.MIN_SAFE_INTEGER),
          fc.constant(Infinity),
          fc.constant(NaN),
        ),
        async (limit) => {
          const result = await listItemsTool({ limit });

          if (result.isError) {
            // Rejection is fine — must include a non-leaking error message
            expect(result.content[0].text).not.toMatch(/TypeError|RangeError|at Object\./);
            return;
          }

          // If successful, the number of items returned must be within [0, MAX_PAGE_SIZE]
          const MAX_PAGE_SIZE = 100;
          const items = JSON.parse(result.content[0].text).items;
          expect(items.length).toBeGreaterThanOrEqual(0);
          expect(items.length).toBeLessThanOrEqual(MAX_PAGE_SIZE);
        }
      ),
      { numRuns: 200 }
    );
  });
});

A common bug this catches: handlers that pass the value to LIMIT ? in a SQL query without clamping it first. In SQLite a negative LIMIT such as -1 means no upper bound, so every row is returned; Number.MAX_SAFE_INTEGER has the same effect on a large table and can exhaust memory.
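
A clamping step that would satisfy this property might look like the following sketch. The helper name clampLimit and the choice to reject (rather than coerce) bad values are assumptions; MAX_PAGE_SIZE mirrors the cap used in the test.

```typescript
const MAX_PAGE_SIZE = 100; // same cap the property test assumes

// Hypothetical helper: returns a safe page size, or null when the input should be rejected.
function clampLimit(raw: unknown, max: number = MAX_PAGE_SIZE): number | null {
  // Reject non-numbers, NaN, Infinity, and non-integer values such as 1.5.
  if (typeof raw !== "number" || !Number.isInteger(raw)) return null;
  // Reject negatives: in SQLite a negative LIMIT means "no limit".
  if (raw < 0) return null;
  // Cap large values instead of passing them through to the query.
  return Math.min(raw, max);
}

console.log(clampLimit(10), clampLimit(-1), clampLimit(Number.MAX_SAFE_INTEGER));
```

Returning null lets the handler produce a structured error for bad input instead of silently substituting a default, which keeps the "rejection is always valid" branch of the property meaningful.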

Test 4: Unicode and binary input handling

MCP tool arguments are JSON strings. JSON strings can contain any Unicode codepoint, including control characters, right-to-left override characters, zero-width joiners, and null bytes embedded via the \u0000 escape. Security checks that use regex or simple string comparison often fail silently on these inputs:

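import { describe, it, expect } from 'vitest';
import fc from 'fast-check';
// Assumed module path, following the layout used in Test 1
import { executeQueryTool } from '../src/tools/execute-query.js';

describe('execute_query — blocklist bypass property', () => {
  // Generates arbitrary strings plus blocked keywords obfuscated with Unicode tricks
  const adversarialString = fc.oneof(
    fc.string({ unit: 'grapheme' }),
    // Full code point range, including null bytes
    fc.string({ unit: 'binary' }),
    // Blocked keywords disguised with invisible characters or case changes
    fc.constant('SEL\u200BECT * FROM users'),     // zero-width space
    fc.constant('SEL\u0000ECT'),                  // null byte
    fc.constant('DropTable'),                     // case variation
    fc.constant('dr\u006Fp table'),               // lowercase o via codepoint escape
    fc.constant('SELECT\r\nFROM'),                // CRLF
    fc.constant('\uFEFFSELECT'),                  // BOM prefix
  );

  it('blocklist is not bypassed by any Unicode variant of a blocked term', async () => {
    await fc.assert(
      fc.asyncProperty(
        adversarialString,
        async (query) => {
          // Oracle: fold compatibility forms and strip invisible characters.
          // NFC alone leaves zero-width characters and the BOM in place, so the
          // obfuscated constants above would never count as blocked.
          const normalizedQuery = query
            .normalize('NFKC')
            .replace(/[\u0000\u200B-\u200D\u2060\uFEFF]/g, '');
          const result = await executeQueryTool({ query });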

          // If the normalized query contains a blocked term, the tool must reject
          const BLOCKED_TERMS = ['select', 'drop', 'delete', 'insert', 'update'];
          const hasBlocked = BLOCKED_TERMS.some(t =>
            normalizedQuery.toLowerCase().includes(t)
          );

          if (hasBlocked) {
            expect(result.isError).toBe(true);
          }
        }
      ),
      { numRuns: 300 }
    );
  });
});
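
The property in Test 4 only holds if the handler canonicalizes its input before matching against the blocklist. A minimal sketch of that step follows; the function names and the exact set of stripped characters are assumptions, and a production blocklist would likely need a broader set of invisible characters.

```typescript
// Characters that render as nothing but can split a keyword: NUL, zero-width
// space / non-joiner / joiner, word joiner, and the BOM (zero-width no-break space).
const INVISIBLE = /[\u0000\u200B-\u200D\u2060\uFEFF]/g;

// Hypothetical canonicalization step applied before any blocklist comparison.
function canonicalize(input: string): string {
  return input
    .normalize("NFKC") // fold compatibility forms, e.g. fullwidth letters, to plain ones
    .replace(INVISIBLE, "")
    .toLowerCase();
}

const BLOCKED_TERMS = ["select", "drop", "delete", "insert", "update"];

// True when the canonical form of the input contains any blocked term.
function containsBlockedTerm(input: string): boolean {
  const canonical = canonicalize(input);
  return BLOCKED_TERMS.some((term) => canonical.includes(term));
}

console.log(containsBlockedTerm("SEL\u200BECT * FROM users"));
console.log(containsBlockedTerm("hello world"));
```

Blocklists remain a weak control for query tools even with canonicalization; the sketch only shows why a regex tested against ASCII input is not enough.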

The shrinking advantage

When fast-check finds a failing input, it does not just report it — it shrinks it to the minimal failing case. This is the feature that makes property-based testing practical for security work:

Initial failing input

../../../../../../../../../../../../../../../../../../../../../../../etc/passwd?q=x&r=y#anchor

Shrunk minimal case

../etc/passwd

The shrunk case is the one that appears in the test failure output. It is the smallest input that still triggers the property violation — easier to understand, easier to turn into a regression test, easier to use as the basis for a CVE proof of concept.

fast-check's shrinking is automatic for all built-in arbitraries. Custom arbitraries can define their own shrinker. For MCP security tests, the built-in string, integer, and URL arbitraries shrink well out of the box.
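
To show what shrinking does in principle, here is a greedy delete-one-character shrinker. It is a simplification for illustration and not fast-check's actual algorithm; the buggy rejectsLeadingDotDotOnly check and the starting input are hypothetical.

```typescript
// Greedy shrinker: repeatedly delete one character as long as the property still fails.
function shrinkString(failing: string, property: (s: string) => boolean): string {
  let current = failing;
  let improved = true;
  while (improved) {
    improved = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.slice(0, i) + current.slice(i + 1);
      if (!property(candidate)) {
        current = candidate; // still fails, so keep the smaller input
        improved = true;
        break;
      }
    }
  }
  return current;
}

// Hypothetical buggy check: accepts anything that does not start with "../".
const rejectsLeadingDotDotOnly = (p: string): boolean => !p.startsWith("../");

// Property: accepted paths never contain "../".
const noTraversalWhenAccepted = (p: string): boolean =>
  !rejectsLeadingDotDotOnly(p) || !p.includes("../");

const shrunk = shrinkString("docs/../../etc/passwd?q=x#anchor", noTraversalWhenAccepted);
console.log(JSON.stringify(shrunk));
```

For this particular bug every minimal counterexample is one character followed by "../", which makes the flaw (only the start of the string is checked) obvious at a glance.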

Hypothesis for Python MCP servers

The Python equivalent uses Hypothesis's @given decorator and st strategy namespace:

import os

import pytest
from hypothesis import given, settings, HealthCheck
from hypothesis import strategies as st
from mcp_server.tools.read_file import read_file_tool

WORKSPACE_ROOT = "/app/workspace"

@given(st.text(min_size=1, max_size=512))
@settings(max_examples=1000, suppress_health_check=[HealthCheck.too_slow])
def test_read_file_path_containment(path_input):
    result = read_file_tool(path=path_input)

    if result.get("isError"):
        return  # Rejection is always valid

    resolved = os.path.realpath(os.path.join(WORKSPACE_ROOT, path_input))
    assert resolved.startswith(WORKSPACE_ROOT + os.sep), (
        f"Path escaped workspace root: {resolved}"
    )


@given(st.text() | st.binary().map(lambda b: b.decode('latin-1')))
@settings(max_examples=500)
def test_no_exception_leakage(query_input):
    try:
        result = read_file_tool(path=query_input)
    except Exception as e:
        pytest.fail(f"Tool raised an exception instead of returning MCP error: {e}")

    assert result is not None

Hypothesis maintains a database of failing examples across runs. If a specific input causes a failure, it is stored locally and replayed on every subsequent run — a persistent regression test created automatically from fuzzing.

Wiring into CI

Property-based tests belong in CI, but with appropriate configuration to avoid flakiness from randomness:

// vitest.config.ts: project-level fast-check settings
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    // Property tests can be slow, so give them a generous timeout
    testTimeout: 60_000,
    // Use a fixed seed in CI for reproducibility.
    // Locally, an explicit FAST_CHECK_SEED is respected; otherwise the clock seeds the run.
    env: {
      FAST_CHECK_SEED: process.env.CI
        ? '12345'
        : process.env.FAST_CHECK_SEED ?? String(Date.now()),
    },
  },
});

// In your property tests, read the seed:
const seed = Number.parseInt(process.env.FAST_CHECK_SEED ?? String(Date.now()), 10);
await fc.assert(myProperty, { seed, numRuns: 1000 });

Using a fixed seed in CI means the same runs execute on every build, making failures deterministic and attributable to code changes rather than random input generation. Use a random seed locally to explore new input space each run.
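
Parsing the seed deserves a little care, because a non-numeric value would otherwise become NaN without any warning. The helper below is a hypothetical sketch: resolveSeed is not part of fast-check, and treating the literal value "random" as a request for a clock-derived seed is an assumption about how you might want the variable to behave.

```typescript
// Hypothetical helper: turn the FAST_CHECK_SEED value into a numeric seed.
// `now` is injectable so the function stays deterministic under test.
function resolveSeed(raw: string | undefined, now: () => number = Date.now): number {
  if (raw === undefined || raw === "" || raw === "random") {
    // No fixed seed requested: derive one from the clock, kept within 31 bits.
    return now() % 0x7fffffff;
  }
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) {
    throw new Error(`FAST_CHECK_SEED must be an integer or "random", got "${raw}"`);
  }
  return parsed;
}

console.log(resolveSeed("12345"));
```

fast-check prints the seed in its failure report, so a failure found during a random local run can be replayed by exporting that value as FAST_CHECK_SEED.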

GitHub Actions snippet

Add a dedicated property-test job that runs after your unit test job:

jobs:
  property-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    env:
      FAST_CHECK_SEED: "12345"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - run: npx vitest run --reporter=verbose tests/property/

What property-based fuzzing does not catch

Property-based fuzzing is a tool, not a complete security solution. It does not catch issues in the semantic and configuration layers, such as prompt injection.

This is why prompt injection and broader security checklists remain necessary alongside fuzzing. Property-based testing covers the input validation and error-handling layer; other approaches cover the semantic and configuration layers.

The SkillAudit connection

When SkillAudit runs a security audit on an MCP server, the static analysis component checks for several patterns that property-based testing would catch dynamically:

| SkillAudit check | What it detects statically | What the property test validates dynamically |
| --- | --- | --- |
| path-traversal-no-realpath | path.join(root, input) without subsequent realpath() + prefix check | Path containment property over 1000 generated paths |
| unhandled-exception-surface | Tool handlers without try-catch at the outermost scope | No-exception property over 500 generated inputs |
| numeric-no-range-validation | Integer params used in SQL LIMIT or array slice without clamping | Numeric bounds property over boundary values |
| regex-blocklist-unicode-bypass | Blocklist implemented as regex without Unicode normalization | Adversarial Unicode string property |

The static check tells you the pattern exists; the property test confirms whether the actual runtime behavior is safe. Both are necessary because static analysis can have false positives (the pattern is present but the context is safe) and property tests can have false negatives (the input space explored did not include the specific edge case that breaks the invariant).

Starting point: a fuzzing template

Here is a minimal template to add to any MCP server project. Drop it in tests/property/tool-security.test.ts:

import fc from 'fast-check';
import path from 'path';
import { describe, it, expect } from 'vitest';
import { yourTool } from '../../src/tools/your-tool.js';

const WORKSPACE_ROOT = '/your/root';

describe('security properties — your-tool', () => {
  it('path inputs never escape workspace root', async () => {
    await fc.assert(
      fc.asyncProperty(fc.string(), async (input) => {
        const result = await yourTool({ path: input });
        if (result.isError) return true;
        const resolved = path.resolve(WORKSPACE_ROOT, input);
        // Compare against root + separator so a sibling such as /your/root-evil does not pass
        expect(resolved.startsWith(WORKSPACE_ROOT + path.sep)).toBe(true);
      }),
      { numRuns: 500 }
    );
  });

  it('never throws for any string input', async () => {
    await fc.assert(
      fc.asyncProperty(
        fc.oneof(fc.string(), fc.string({ unit: 'binary' })),
        async (input) => {
          let threw = false;
          try { await yourTool({ path: input }); } catch { threw = true; }
          expect(threw).toBe(false);
        }
      ),
      { numRuns: 500 }
    );
  });
});

Start with two properties. Path containment and no-exception leakage cover the two highest-severity classes of MCP tool handler bugs. Everything else — numeric bounds, Unicode bypass, size amplification — is added incrementally. A 30-line property test file that runs in CI is worth more than a comprehensive fuzzing plan that never ships.

If you want SkillAudit to check whether your server's input validation holds up — including static analysis for the patterns that property tests catch dynamically — you can run a free audit. The audit report includes the specific handlers where we detected missing validation, which is the right starting point for writing the first property tests.
