MCP server security testing with Vitest: unit tests for auth, input validation, and error handling

Static analysis and dynamic scanning catch many MCP server security issues — but they can't verify the runtime behavior your specific handler logic produces. A unit test suite targeting auth bypass, validation edge cases, and error response shape gives you a repeatable contract: "this server does not return stack traces; this server enforces authorization on every path; this server rejects null bytes in filenames." This guide shows you how to write that suite with Vitest, and maps each test category to the SkillAudit findings it covers.

Why unit tests for security?

Most MCP server developers write tests for functional correctness — does get_file return the right content? Security unit tests ask a different question: does the handler produce the wrong output for adversarial input? These are two separate test dimensions that rarely overlap.

The value of security unit tests over scanner-only coverage:

They run in CI on every PR — a regression in auth logic is caught before merge, not in a weekly SkillAudit re-scan.
They test your specific logic — a scanner knows that you're using Express and checks for known Express CVEs. A unit test knows that your get_org_report tool is supposed to enforce org membership and can verify it actually does.
They document your security invariants — the test suite is machine-readable spec for what properties you're committing to. New engineers reading it know exactly what the security contract is.
SkillAudit rewards them directly — the presence of a test file matching *.security.test.* or *.auth.test.* is a positive signal in the Security axis. Finding one during static analysis shifts the initial grade estimate upward before we even run the dynamic checks.

SkillAudit static analysis looks for vitest, jest, or mocha test files that import your tool handler functions and assert on error response shapes. Presence of security-targeted tests is a LOW-positive finding that contributes to a higher Security axis score, independent of whether the tests pass (we can't run them — but their existence indicates a security-conscious development process).

Project setup

Assume a standard MCP server structure:

src/
  tools/
    get-file.ts        # tool handler
    run-query.ts
    create-report.ts
  auth.ts             # auth helper
  errors.ts           # ToolError, AuthError, ValidationError
  safe-tool.ts        # top-level wrapper
vitest.config.ts
package.json

Install Vitest if you haven't already:

npm install --save-dev vitest @vitest/coverage-v8

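Add to package.json (Vitest's positional filter is a substring match on file paths, not a glob, so .security.test. selects every *.security.test.* file):

"scripts": {
  "test": "vitest run",
  "test:security": "vitest run --reporter=verbose .security.test.",
  "test:coverage": "vitest run --coverage"
}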

And a minimal vitest.config.ts:

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    globals: true,
    environment: "node",
    coverage: {
      provider: "v8",
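      // "json-summary" writes coverage/coverage-summary.json, which the CI threshold check reads
      reporter: ["text", "json", "json-summary", "html"],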
      include: ["src/**/*.ts"],
      exclude: ["src/**/*.test.ts"],
    },
  },
});

Authorization bypass tests

Authorization bugs are the most common HIGH finding in SkillAudit reports. The typical pattern: auth is implemented correctly for the happy path (authenticated user, correct org), but a missing check on an edge case — null token, wrong token format, IDOR via manipulated ID — lets an attacker through. Unit tests are uniquely well-suited to exhaustively cover these cases because you control all the inputs.

Risky pattern

Testing only the happy path

If your auth tests only verify that a valid token returns the right data, you haven't tested your auth at all — you've tested your data-fetching logic.

Common auth edge cases that break in production but are never tested:

Missing Authorization header entirely
Malformed token (no Bearer prefix, wrong number of JWT segments)
Expired token (exp claim in the past)
Token signed with an unexpected algorithm (alg: none, or HS256/RS256 algorithm confusion)
IDOR: resource ID that belongs to a different org than the caller's token claims
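
Most of these cases need a token fixture. As a minimal sketch, test tokens can be assembled from a header and a payload with a placeholder signature. The helper name makeTestToken, the fixtures object, and the claim names sub, org, and exp are illustrative assumptions that mirror the fixtures used in this guide. Because the signature is a placeholder, this only works where signature verification is stubbed in the unit test; otherwise you would sign the tokens with a test secret.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical claim shape, mirroring the fixtures in this guide.
type TestClaims = { sub: string; org: string; exp: number };

// Base64url-encode a JSON value the way JWT segments are encoded.
function encodeSegment(value: unknown): string {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

// Build a three-segment token with a placeholder signature.
// NOT a validly signed JWT: only useful where verification is stubbed.
export function makeTestToken(claims: TestClaims, alg: string = "HS256"): string {
  return `${encodeSegment({ alg })}.${encodeSegment(claims)}.sig`;
}

const farFuture = 9_999_999_999; // exp far in the future (seconds since epoch)
const baseClaims: TestClaims = { sub: "user-1", org: "org-abc", exp: farFuture };

export const fixtures = {
  valid: makeTestToken(baseClaims),
  expired: makeTestToken({ ...baseClaims, exp: 1 }),
  otherOrg: makeTestToken({ sub: "user-2", org: "org-xyz", exp: farFuture }),
  // "alg": "none" with an empty signature segment: a verifier must reject this outright
  algNone: `${encodeSegment({ alg: "none" })}.${encodeSegment(baseClaims)}.`,
  // Only two segments: structurally malformed
  twoSegments: "abc.def",
};
```

Each fixture then maps to one rejection test, for example passing `Bearer ${fixtures.algNone}` and asserting the same "Unauthorized" response as the other failure cases.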

Here's a complete auth bypass test suite for a hypothetical get_report tool that fetches a report by ID, enforcing that the caller's org matches the report's org:

// src/tools/get-report.security.test.ts
import { describe, it, expect, vi, beforeEach } from "vitest";
import { getReportHandler } from "./get-report.js";
import { db } from "../db.js";

vi.mock("../db.js");

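// Fixture tokens: real header and payload segments with a placeholder signature.
// Assumes signature verification is stubbed in unit tests; otherwise sign these with your test secret.
const validToken = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyLTEiLCJvcmciOiJvcmctYWJjIiwiZXhwIjo5OTk5OTk5OTk5fQ.sig";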
const expiredToken = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyLTEiLCJvcmciOiJvcmctYWJjIiwiZXhwIjoxfQ.sig";
const otherOrgToken = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyLTIiLCJvcmciOiJvcmcteHl6IiwiZXhwIjo5OTk5OTk5OTk5fQ.sig";

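beforeEach(() => {
  // Vitest keeps mock call history across tests by default; clear it so the
  // "does NOT call db" assertion only sees calls made by its own test.
  vi.clearAllMocks();
  vi.mocked(db.reports.findById).mockResolvedValue({
    id: "report-123",
    orgId: "org-abc",
    data: { findings: [] },
  });
});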

describe("get_report — authorization", () => {
  it("returns report for authenticated caller with matching org", async () => {
    const result = await getReportHandler(
      { reportId: "report-123" },
      { authorization: `Bearer ${validToken}` }
    );
    expect(result.isError).toBeFalsy();
    expect(result.content[0].text).toContain("findings");
  });

  it("rejects missing Authorization header", async () => {
    const result = await getReportHandler(
      { reportId: "report-123" },
      {} // no auth header
    );
    expect(result.isError).toBe(true);
    expect(result.content[0].text).toBe("Unauthorized");
    // IMPORTANT: same message as "not found" to prevent enumeration
  });

  it("rejects malformed token (no Bearer prefix)", async () => {
    const result = await getReportHandler(
      { reportId: "report-123" },
      { authorization: validToken } // missing "Bearer " prefix
    );
    expect(result.isError).toBe(true);
    expect(result.content[0].text).toBe("Unauthorized");
  });

  it("rejects expired token", async () => {
    const result = await getReportHandler(
      { reportId: "report-123" },
      { authorization: `Bearer ${expiredToken}` }
    );
    expect(result.isError).toBe(true);
    expect(result.content[0].text).toBe("Unauthorized");
  });

  it("rejects IDOR: report belongs to different org than caller's token", async () => {
    const result = await getReportHandler(
      { reportId: "report-123" }, // org-abc report
      { authorization: `Bearer ${otherOrgToken}` } // org-xyz token
    );
    expect(result.isError).toBe(true);
    // Must return same message as "not found" — not "forbidden" (which confirms the resource exists)
    expect(result.content[0].text).toBe("Unauthorized");
    expect(result.content[0].text).not.toContain("org");
    expect(result.content[0].text).not.toContain("permission");
    expect(result.content[0].text).not.toContain("report");
  });

  it("does NOT call db when token is invalid (short-circuits correctly)", async () => {
    await getReportHandler(
      { reportId: "report-123" },
      { authorization: "Bearer invalid-token" }
    );
    expect(db.reports.findById).not.toHaveBeenCalled();
  });
});

The IDOR test deserves attention: it verifies not just that access is denied, but that the denial is the same generic "Unauthorized" returned for every other failure, with no mention of orgs, permissions, or reports. If "this report doesn't exist" and "this report exists but belongs to another org" produced different messages, an attacker could enumerate which report IDs exist across orgs. The last test verifies short-circuit behavior: if the token is invalid, your handler should never reach the database.
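
For reference, a handler shape that would satisfy a suite like the one above might look like the following. This is a minimal sketch under stated assumptions, not the actual get-report.ts: makeGetReportHandler, the injected verifyToken and findById dependencies, and the ToolResult type are all introduced here for illustration.

```typescript
type ToolResult = { isError?: boolean; content: { type: "text"; text: string }[] };
type Claims = { sub: string; org: string; exp: number }; // exp in seconds since epoch
type Report = { id: string; orgId: string; data: unknown };

type Deps = {
  // Hypothetical: returns verified claims, or null if the signature or format is invalid
  verifyToken: (token: string) => Claims | null;
  findById: (id: string) => Promise<Report | null>;
};

// One shared denial for every failure mode, so responses cannot be told apart.
function unauthorized(): ToolResult {
  return { isError: true, content: [{ type: "text", text: "Unauthorized" }] };
}

export function makeGetReportHandler(deps: Deps) {
  return async function getReportHandler(
    args: { reportId: string },
    headers: { authorization?: string },
  ): Promise<ToolResult> {
    // 1. Authenticate first: a missing header, a missing Bearer prefix,
    //    an unverifiable token, or an expired token all stop here.
    const token = /^Bearer (\S+)$/.exec(headers.authorization ?? "")?.[1];
    const claims = token ? deps.verifyToken(token) : null;
    if (!claims || claims.exp * 1000 <= Date.now()) return unauthorized();

    // 2. Only now touch the data layer.
    const report = await deps.findById(args.reportId);

    // 3. A missing report and a wrong-org report produce the same response.
    if (!report || report.orgId !== claims.org) return unauthorized();

    return { content: [{ type: "text", text: JSON.stringify(report.data) }] };
  };
}
```

Injecting the two dependencies is one design choice among several; with vi.mock, as in the suite above, the same ordering (authenticate, then fetch, then compare orgs) is what the tests pin down.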

Input validation edge cases

Validation tests cover a different attack surface than auth tests: what happens when an attacker sends unexpected input shapes? Null bytes in strings, oversized arrays, negative integers where only positive are expected, Unicode lookalikes in email addresses, path traversal sequences in filenames. Most of these are easy to generate as test cases but hard to reason about from code review alone.

Good pattern

Property-based thinking for adversarial inputs

Group edge cases by attack category, not by field. Every string field can receive null bytes, path traversal, and oversized input. Test them systematically.

A practical structure: for each string field in each tool's input schema, run the same adversarial input battery.

// src/tools/write-file.security.test.ts
import { describe, it, expect } from "vitest";
import { writeFileHandler } from "./write-file.js";

// Adversarial string inputs — run against every string field
const ADVERSARIAL_PATHS = [
  "../../../etc/passwd",      // path traversal
  "../../.env",               // env file traversal
  "/etc/shadow",              // absolute path override
  "foo\x00bar",              // null byte injection
  "a".repeat(10_001),         // oversized input (> 10k)
  "",                         // empty string
  "   ",                      // whitespace only
  ".",                        // current directory
  "..",                       // parent directory
  "con",                      // Windows reserved name
  "foo/bar",                  // embedded slash
];

const ADVERSARIAL_CONTENT = [
  "a".repeat(10_000_001),     // oversized content (> 10MB)
  "\x00".repeat(100),         // null bytes in content
];

describe("write_file — input validation", () => {
  for (const path of ADVERSARIAL_PATHS) {
    // Truncate the title so the 10k-character case stays readable in test reports
    it(`rejects adversarial path: ${JSON.stringify(path).slice(0, 60)}`, async () => {
      const result = await writeFileHandler({
        path,
        content: "hello",
      });
      expect(result.isError).toBe(true);
      // Must not expose the real FS path in the error message
      expect(result.content[0].text).not.toContain("/etc");
      expect(result.content[0].text).not.toContain("ENOENT");
      expect(result.content[0].text).not.toContain("__dirname");
    });
  }

  for (const content of ADVERSARIAL_CONTENT) {
    it(`rejects oversized/adversarial content (length ${content.length})`, async () => {
      const result = await writeFileHandler({
        path: "output.txt",
        content,
      });
      expect(result.isError).toBe(true);
    });
  }

  it("accepts a valid relative filename", async () => {
    const result = await writeFileHandler({
      path: "output.txt",
      content: "hello world",
    });
    expect(result.isError).toBeFalsy();
  });

  it("rejects integer coerced as string path (type safety)", async () => {
    const result = await writeFileHandler({
      path: 42 as unknown as string,
      content: "hello",
    });
    expect(result.isError).toBe(true);
  });
});

The parameterized loop over ADVERSARIAL_PATHS generates 11 separate test cases from one block. If you move the array into a shared module and import it from each tool's test file, every new adversarial input you add is applied to all string-field tools automatically. The not.toContain assertions inside the loop are as important as the result.isError check: they verify that the rejection doesn't leak internal filesystem details in the error message.
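
On the handler side, a validator that rejects this whole battery can be quite small. The sketch below assumes write_file accepts only a bare filename (the suite above treats "foo/bar" as adversarial); checkFilename, PathCheck, and the 255-character limit are illustrative assumptions, not part of any library.

```typescript
type PathCheck = { ok: true; name: string } | { ok: false; message: string };

const MAX_NAME_LENGTH = 255; // assumed limit; pick one that fits your storage
// Windows reserved device names, with or without an extension
const WINDOWS_RESERVED = /^(con|prn|aux|nul|com[1-9]|lpt[1-9])(\..*)?$/i;

// Generic on purpose: the message never echoes the input or a resolved path.
const REJECTED: PathCheck = { ok: false, message: "Invalid path" };

export function checkFilename(input: unknown): PathCheck {
  if (typeof input !== "string") return REJECTED; // e.g. a number cast to string in TS
  if (input.trim().length === 0) return REJECTED; // empty or whitespace only
  if (input.length > MAX_NAME_LENGTH) return REJECTED; // oversized
  if (input.includes("\0")) return REJECTED; // null byte injection
  // Any separator covers traversal, absolute paths, and nested paths
  if (input.includes("/") || input.includes("\\")) return REJECTED;
  if (input === "." || input === "..") return REJECTED; // directory references
  if (WINDOWS_RESERVED.test(input)) return REJECTED; // con, nul, com1, ...
  return { ok: true, name: input };
}
```

Rejecting every separator is the strictest option. A tool that must accept subdirectories would instead resolve the path against a base directory and verify the result stays inside it, which this sketch does not attempt.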

URL field validation

If your tool accepts a URL that it fetches (SSRF surface), the adversarial input set is different:

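// src/tools/fetch-url.security.test.ts (hypothetical file name)
import { describe, it, expect } from "vitest";
import { fetchUrlHandler } from "./fetch-url.js"; // assumed handler module

// Adversarial URLs for any tool that fetches user-supplied URLs
const SSRF_PAYLOADS = [
  "http://169.254.169.254/latest/meta-data/",          // AWS metadata
  "http://metadata.google.internal/computeMetadata/",  // GCP metadata
  "http://localhost:5432/",                             // local postgres
  "http://127.0.0.1:6379/",                            // local redis
  "http://0.0.0.0/",                                   // unspecified address, often reaches localhost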
  "http://[::1]/",                                     // IPv6 loopback
  "file:///etc/passwd",                                // file:// scheme
  "ftp://internal-server/",                            // non-http scheme
  "http://192.168.1.1/admin",                          // RFC1918
  "http://10.0.0.1/",                                  // RFC1918
  "http://evil.com@169.254.169.254/",                  // userinfo bypass
  "http://169.254.169.254.evil.com/",                  // lookalike domain
];

describe("fetch_url — SSRF validation", () => {
  for (const url of SSRF_PAYLOADS) {
    it(`blocks SSRF payload: ${url}`, async () => {
      const result = await fetchUrlHandler({ url });
      expect(result.isError).toBe(true);
      // Must not return the response body of the blocked request
      expect(result.content[0].text).not.toContain("ami-id");
      expect(result.content[0].text).not.toContain("computeMetadata");
    });
  }
});

SSRF URL tests are among the first things SkillAudit's dynamic scanner tries. If you have a fetch_url or similar tool, these tests in your repo tell us you're aware of the attack surface — even if we still run the live payloads to confirm the block works at runtime.
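
One way to make the whole payload list fail closed is an allowlist check before any request is made. The following is a minimal sketch, assuming the tool only ever needs a known set of hosts; isAllowedUrl and the example.com hostnames are hypothetical. If your tool must fetch arbitrary hosts, you would need to resolve the hostname and check the resulting IP ranges instead, which this sketch does not do.

```typescript
import { URL } from "node:url";

// Hypothetical allowlist: the only hosts this tool is ever meant to fetch from.
const ALLOWED_HOSTS: ReadonlySet<string> = new Set(["api.example.com", "docs.example.com"]);

export function isAllowedUrl(
  raw: string,
  allowedHosts: ReadonlySet<string> = ALLOWED_HOSTS,
): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not parseable as an absolute URL
  }
  // Rejects file:, ftp:, and every other non-web scheme
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;
  // Rejects user@host tricks such as http://evil.com@169.254.169.254/
  if (url.username !== "" || url.password !== "") return false;
  // Exact hostname match only: no suffix or substring matching,
  // so lookalike domains and raw IP literals never pass
  return allowedHosts.has(url.hostname);
}
```

An allowlist sidesteps DNS-based lookalikes because nothing outside the list is ever resolved. Redirects still need separate handling: a fetch that follows redirects automatically should re-check each hop.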

Error information leakage tests

These tests verify that your error responses don't accidentally expose internal state. The shape of a correct rejection is narrow: a human-readable message that describes what went wrong at a high level, with no file paths, stack traces, library names, environment variable names, database error strings, or internal identifiers.

// src/tools/run-query.security.test.ts
import { describe, it, expect, vi } from "vitest";
import { runQueryHandler } from "./run-query.js";
import { db } from "../db.js";

vi.mock("../db.js");

// Patterns that should NEVER appear in any error response
const FORBIDDEN_LEAK_PATTERNS = [
  /at\s+\w+\s+\(.*:\d+:\d+\)/,      // stack frame
  /Error: ENOENT/,                    // Node FS error
  /SyntaxError:/,                     // raw JS error
  /TypeError:/,                       // raw JS error
  /\/app\//,                          // internal file path
  /node_modules/,                     // dependency path
  /process\.env\./,                   // env var reference
  /password/i,                        // credential mention
  /secret/i,                          // credential mention
  /DATABASE_URL/,                     // connection string var
  /sqlite3\.DatabaseError/,           // DB driver error
  /SQLITE_ERROR/,                     // SQLite error code
  /column .* does not exist/,         // DB schema leak
  /relation .* does not exist/,       // Postgres schema leak
];

describe("run_query — error information leakage", () => {
  it("does not leak db connection error details when db is unavailable", async () => {
    vi.mocked(db.query).mockRejectedValue(
      new Error("SQLITE_CANTOPEN: unable to open database file: /app/data/prod.db")
    );

    const result = await runQueryHandler({ query: "SELECT 1" });

    expect(result.isError).toBe(true);
    for (const pattern of FORBIDDEN_LEAK_PATTERNS) {
      expect(result.content[0].text).not.toMatch(pattern);
    }
  });

  it("does not leak stack trace when handler throws unexpectedly", async () => {
    vi.mocked(db.query).mockImplementation(() => {
      throw new TypeError("Cannot read properties of undefined (reading 'rows')");
    });

    const result = await runQueryHandler({ query: "SELECT 1" });

    expect(result.isError).toBe(true);
    for (const pattern of FORBIDDEN_LEAK_PATTERNS) {
      expect(result.content[0].text).not.toMatch(pattern);
    }
    // Must return a generic message
    expect(result.content[0].text.length).toBeLessThan(200);
  });

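  it("does not differentiate between auth failure and not-found (enumeration prevention)", async () => {
    // Assumption: the handler applies the org check to the rows it gets back.
    // The two calls get different results so the comparison below is not vacuous.
    vi.mocked(db.query)
      .mockResolvedValueOnce({ rows: [] }) // report does not exist
      .mockResolvedValueOnce({ rows: [{ id: "belongs-to-other-org", orgId: "org-xyz" }] }); // exists, other org

    const missingResult = await runQueryHandler({
      query: "SELECT * FROM reports WHERE id = 'nonexistent'",
      context: { orgId: "org-abc" }
    });

    const forbiddenResult = await runQueryHandler({
      query: "SELECT * FROM reports WHERE id = 'belongs-to-other-org'",
      context: { orgId: "org-abc" }
    });

    // Both must return identical responses
    expect(missingResult.isError).toBe(forbiddenResult.isError);
    expect(missingResult.content[0].text).toBe(forbiddenResult.content[0].text);
  });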
});

The FORBIDDEN_LEAK_PATTERNS array is the most reusable part of this test file. Import it in every tool's security test to apply the same invariant everywhere. If any of these patterns matches an error response during a test run, you have an information-leakage bug.
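
The handler-side counterpart to these tests is a single place where thrown values are turned into responses. A minimal sketch, assuming a hypothetical PublicToolError class for messages that are safe to show (the project's own ToolError, AuthError, and ValidationError may be shaped differently) and a hypothetical toSafeError helper:

```typescript
type ToolErrorResult = { isError: true; content: { type: "text"; text: string }[] };

// Hypothetical error class whose message is written for callers and safe to return.
export class PublicToolError extends Error {}

const GENERIC_MESSAGE = "Internal error";

// Map any thrown value to a response that carries no internal detail.
export function toSafeError(
  err: unknown,
  log: (detail: unknown) => void = console.error,
): ToolErrorResult {
  // Full detail (stack, paths, driver messages) goes to server-side logs only.
  log(err);
  // Only explicitly public errors keep their message; everything else is replaced.
  const text = err instanceof PublicToolError ? err.message : GENERIC_MESSAGE;
  return { isError: true, content: [{ type: "text", text }] };
}
```

A handler would wrap its body in try/catch and return toSafeError(err) from the catch block, so an unexpected TypeError or database failure reaches the caller only as the generic message.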

Rate limiting and resource exhaustion tests

If your MCP server has rate limiting — and it should — you need tests that verify the limiter actually fires. These tests are harder to write because they require sending multiple requests in sequence, but Vitest handles async iteration cleanly:

// src/tools/run-query.rate-limit.test.ts
import { describe, it, expect, beforeEach } from "vitest";
import { runQueryHandler } from "./run-query.js";
import { resetRateLimiter } from "../rate-limiter.js";

beforeEach(() => {
  resetRateLimiter(); // clear counters between tests
});

describe("run_query — rate limiting", () => {
  it("allows requests up to the limit", async () => {
    const limit = 10; // your configured per-minute limit
    const results = await Promise.all(
      Array.from({ length: limit }, () =>
        runQueryHandler({ query: "SELECT 1" }, { callerId: "user-1" })
      )
    );
    expect(results.every(r => !r.isError)).toBe(true);
  });

  it("blocks requests beyond the limit", async () => {
    const limit = 10;
    // Exhaust the limit
    await Promise.all(
      Array.from({ length: limit }, () =>
        runQueryHandler({ query: "SELECT 1" }, { callerId: "user-1" })
      )
    );

    // The (limit+1)th request must be blocked
    const overLimitResult = await runQueryHandler(
      { query: "SELECT 1" },
      { callerId: "user-1" }
    );
    expect(overLimitResult.isError).toBe(true);
    expect(overLimitResult.content[0].text).toContain("rate limit");
  });

  it("applies limits per caller ID, not globally", async () => {
    const limit = 10;
    // Exhaust limit for user-1
    await Promise.all(
      Array.from({ length: limit + 1 }, () =>
        runQueryHandler({ query: "SELECT 1" }, { callerId: "user-1" })
      )
    );

    // user-2 must still be allowed
    const user2Result = await runQueryHandler(
      { query: "SELECT 1" },
      { callerId: "user-2" }
    );
    expect(user2Result.isError).toBeFalsy();
  });
});
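
The tests above assume a limiter that counts per caller and can be reset between tests. A minimal sketch of such a limiter follows, using a fixed-window counter; createRateLimiter, tryAcquire, and the injectable clock are assumptions for illustration, not the actual rate-limiter.js.

```typescript
type Limiter = {
  tryAcquire(callerId: string): boolean;
  reset(): void;
};

// Fixed-window counter keyed by caller ID.
// `now` is injectable so tests can control time instead of sleeping.
export function createRateLimiter(
  limit: number,
  windowMs: number,
  now: () => number = Date.now,
): Limiter {
  const windows = new Map<string, { start: number; count: number }>();
  return {
    tryAcquire(callerId: string): boolean {
      const t = now();
      const w = windows.get(callerId);
      // First request from this caller, or the previous window has elapsed
      if (!w || t - w.start >= windowMs) {
        windows.set(callerId, { start: t, count: 1 });
        return true;
      }
      if (w.count >= limit) return false; // over the limit for this window
      w.count += 1;
      return true;
    },
    // Clears all counters; intended for beforeEach in tests
    reset(): void {
      windows.clear();
    },
  };
}
```

Keying the map by caller ID is what the per-caller isolation test exercises: exhausting the window for user-1 leaves user-2's counter untouched. A fixed window allows short bursts at window boundaries; sliding-window or token-bucket limiters trade a little complexity for smoother behavior.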

Running security tests in CI

Add a dedicated security test job to your GitHub Actions workflow that runs separately from the functional test suite. This makes security test failures visible at a glance and prevents them from being buried in a long test output:

# .github/workflows/security-tests.yml
name: Security tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
          cache: "npm"
      - run: npm ci
      - name: Run security unit tests
        run: npm run test:security
      - name: Run full test suite with coverage
        run: npm run test:coverage
      - name: Enforce coverage threshold
        run: |
          # Require 80% line coverage on security-relevant paths.
          # Needs the json-summary coverage reporter, which writes coverage-summary.json.
          node -e "
            const cov = require('./coverage/coverage-summary.json');
            for (const file of ['src/auth.ts', 'src/errors.ts']) {
              // Summary keys are usually absolute paths, so match on the suffix
              const key = Object.keys(cov).find((k) => k.endsWith(file));
              if (key === undefined) {
                console.error(\`\${file}: no coverage data found\`);
                process.exit(1);
              }
              const pct = cov[key].lines.pct;
              if (pct < 80) {
                console.error(\`\${file}: line coverage \${pct}% < 80% threshold\`);
                process.exit(1);
              }
            }
          "

The coverage threshold step ensures that the auth and error-handling code, the most security-sensitive files, keeps at least 80% line coverage. This prevents the common failure mode where engineers add new auth paths without adding corresponding tests.

SkillAudit grade impact table

What you tested	SkillAudit finding it covers	Severity	Grade if missing	Grade if fixed
IDOR test: same error message for missing vs. forbidden	Resource enumeration via differential error response	MEDIUM	C	A
Auth short-circuit: DB not called when token invalid	Auth bypass: handler reaches data layer without valid token	HIGH	D	A
FORBIDDEN_LEAK_PATTERNS on all error paths	Stack trace exposure, internal path disclosure	HIGH	D	B
SSRF payload battery on URL-accepting tools	SSRF to internal services / metadata service	HIGH	F	A
Path traversal battery on filename fields	Path traversal / arbitrary file read	HIGH	F	A
Rate limiter per-caller isolation test	Missing rate limiting / global rate limiter bypass	MEDIUM	C	B
Expired token rejected	JWT validation: expired tokens accepted	HIGH	D	A

The shared FORBIDDEN_LEAK_PATTERNS helper

Extract the patterns array into a shared test utility so every tool's security tests import the same set:

// src/test-utils/security-patterns.ts
export const FORBIDDEN_LEAK_PATTERNS = [
  /at\s+\w+\s+\(.*:\d+:\d+\)/,
  /Error: ENOENT/,
  /SyntaxError:/,
  /TypeError:/,
  /\/app\//,
  /node_modules/,
  /process\.env\./,
  /password/i,
  /secret/i,
  /DATABASE_URL/,
  /sqlite3\./i,
  /SQLITE_/,
  /column .* does not exist/,
  /relation .* does not exist/,
  /ORA-\d{5}/,   // Oracle error codes
  /\b(?:42P01|42703)\b/,   // Postgres SQLSTATE codes: undefined_table, undefined_column
  /\[object Object\]/,   // accidentally stringified object
] as const;

export function assertNoLeaks(responseText: string): void {
  for (const pattern of FORBIDDEN_LEAK_PATTERNS) {
    if (pattern.test(responseText)) {
      throw new Error(
        `Information leak detected — response matches forbidden pattern ${pattern}: "${responseText.slice(0, 200)}"`
      );
    }
  }
}

// Usage:
// import { assertNoLeaks } from "../test-utils/security-patterns.js";
// assertNoLeaks(result.content[0].text);

With assertNoLeaks as a one-liner, adding the information-leakage check to any error path test takes one import and one function call.

What these tests don't replace

Unit tests verify that your handler logic produces the right output for a given input in isolation. They can't catch:

Vulnerable dependencies: a direct or transitive dependency with a known vulnerability. Static analysis (SkillAudit's dependency scanning axis) and npm audit cover this.
Runtime configuration errors: a correct handler deployed with missing environment variables, or behind a misconfigured reverse proxy that strips auth headers. Integration tests or staging environment checks cover this.
Novel prompt injection patterns: adversarial LLM-generated inputs that exploit your tool's semantic behavior in ways you didn't anticipate. SkillAudit's dynamic LLM red-team covers this.
Supply chain attacks: a compromised or malicious package in your node_modules that subverts your handler at runtime. SBOM verification and lockfile pinning cover this.

The right model: unit security tests + SkillAudit static analysis + SkillAudit LLM red-team = layered coverage. Unit tests catch regressions in code you control; SkillAudit catches what unit tests can't see. See the related guide on running a full MCP server security audit and what static analysis can and can't find for the other layers.

Ready to see how your server scores? Run a free SkillAudit — paste your GitHub URL and get a graded report in 60 seconds. The Security axis score reflects static findings from your code; your new test suite is a positive signal that appears in the analysis notes.