MCP server error handling security — stack trace leakage, internal path exposure, and safe error normalization

In a conventional web application an error message reaches a browser. In an MCP server that same message reaches an AI model, one that may relay it verbatim to a user, embed it in reasoning context, or (during a prompt injection attack) treat it as reconnaissance about server internals. A Node.js stack trace exposes absolute file paths and function names. A PostgreSQL error exposes table and column names. A Zod validation error can list every failing field name and its expected type. Even the latency difference between a "not found" and a "wrong password" error can be enough to enumerate valid accounts at scale.

Quick reference

Never interpolate err.stack or err.message from internal errors into tool response content — log them internally with a correlation ID and return only the ID.
Map database driver errors (pg, mysql2, prisma) by SQLSTATE code to generic business messages; never forward the raw driver error string.
Map Zod and Joi validation failures to a single generic "invalid input" response — never include field paths, types, or constraint descriptions in the error returned to the caller.
Normalize error response timing with a dummy operation of equal cost or a fixed minimum response floor (for example, a setTimeout-based timer) so that "not found" and "wrong password" paths are indistinguishable by latency.
Use isError: true in the MCP response alongside a machine-readable code field so the LLM can route errors without parsing message text.

Pattern 1: Raw Error objects returned to the caller

The fastest path to information disclosure is a catch block that returns err.stack or err.message to the tool caller. A Node.js stack trace contains the absolute filesystem path of every frame in the call stack — /app/src/handlers/user.ts:47:12 — plus function names, class names, and column offsets. A prompt injection payload that deliberately triggers an exception can enumerate your server's directory layout, identify which ORM or database driver you use, and discover internal module names across a handful of tool calls. Even without an active attack, raw error strings in MCP responses end up in conversation logs, model context windows, and any downstream logging infrastructure that stores tool outputs.

WRONG — catch block forwards err.stack directly to the LLM:

// The error returned to the LLM contains something like:
// "TypeError: Cannot read properties of undefined (reading 'id')
//     at /app/src/handlers/search.ts:23:14
//     at processTicksAndRejections (node:internal/process/task_queues:95:5)"
// This reveals: file path, line and column numbers, and Node.js runtime
// internals (which in turn hint at the Node.js version).

server.tool('search', searchSchema, async ({ query }) => {
  try {
    const results = await searchIndex.query(query);
    return { content: [{ type: 'text', text: JSON.stringify(results) }] };
  } catch (err) {
    // WRONG: err.stack and err.message expose server internals
    return {
      content: [{ type: 'text', text: `Search failed: ${err.stack}` }]
    };
  }
});

// Also wrong (inside the same catch block): err.message alone still leaks internal detail,
// e.g. "connect ECONNREFUSED 10.0.1.45:5432" reveals an internal IP address and port.
//   return { content: [{ type: 'text', text: `Error: ${err.message}` }] };

RIGHT — correlation ID crosses the boundary; full error stays in internal structured log:

import crypto from 'node:crypto';

// All internal errors are logged with full detail internally;
// only a UUID correlation ID is returned to the caller.
function logAndGetCorrelationId(toolName: string, err: unknown): string {
  const id = crypto.randomUUID();
  // Structured log — goes to your SIEM / log aggregator, never to the LLM:
  console.error(JSON.stringify({
    event:        'tool_error',
    tool:         toolName,
    correlationId: id,
    errorName:    err instanceof Error ? err.name    : 'Unknown',
    errorMessage: err instanceof Error ? err.message : String(err),
    stack:        err instanceof Error ? err.stack   : undefined,
    ts:           new Date().toISOString(),
  }));
  return id;
}

server.tool('search', searchSchema, async ({ query }) => {
  try {
    const results = await searchIndex.query(query);
    return { content: [{ type: 'text', text: JSON.stringify(results) }] };
  } catch (err) {
    const id = logAndGetCorrelationId('search', err);
    // The LLM sees only: "Search temporarily unavailable (ref: a3f7c1d2-...)"
    // Support can look up the full stack trace using the correlation ID.
    return {
      isError: true,
      content: [{ type: 'text', text: `Search temporarily unavailable (ref: ${id})` }]
    };
  }
});

The correlation ID pattern also improves incident response: when a user reports "I got an error," the ID maps directly to the full stack trace in your log aggregator. No internal information left the trust boundary, and debugging is not impaired. For errors that represent expected business conditions (not-found, permission-denied, rate-limited), return a structured { error: { code, message } } object rather than a correlation ID — save IDs for genuinely unexpected internal failures.
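
The split between expected business errors and unexpected failures can be centralized in one wrapper, so individual handlers never build error responses by hand. The following is a minimal sketch under stated assumptions, not part of any SDK: BusinessError, ToolResult, and withSafeErrors are hypothetical names introduced here, and the console.error call stands in for whatever structured logger you use.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical result shape mirroring the tool responses used in this article.
type ToolResult = {
  isError?: boolean;
  content: { type: 'text'; text: string }[];
};

// Hypothetical error class for expected conditions whose code and message
// were written for callers (not-found, permission-denied, rate-limited).
class BusinessError extends Error {
  constructor(public readonly code: string, message: string) {
    super(message);
    this.name = 'BusinessError';
  }
}

// Wraps a handler so that every thrown value is normalized at the boundary.
function withSafeErrors<A>(
  toolName: string,
  handler: (args: A) => Promise<ToolResult>
): (args: A) => Promise<ToolResult> {
  return async (args: A): Promise<ToolResult> => {
    try {
      return await handler(args);
    } catch (err) {
      if (err instanceof BusinessError) {
        // Expected condition: return the code and message as written.
        const body = { error: { code: err.code, message: err.message } };
        return { isError: true, content: [{ type: 'text', text: JSON.stringify(body) }] };
      }
      // Unexpected failure: full detail goes to the internal log only.
      const correlationId = randomUUID();
      console.error(JSON.stringify({
        event: 'tool_error',
        tool: toolName,
        correlationId,
        errorName: err instanceof Error ? err.name : 'Unknown',
        errorMessage: err instanceof Error ? err.message : String(err),
        stack: err instanceof Error ? err.stack : undefined,
        ts: new Date().toISOString(),
      }));
      const body = {
        error: { code: 'INTERNAL_ERROR', message: `Request failed (ref: ${correlationId})` },
      };
      return { isError: true, content: [{ type: 'text', text: JSON.stringify(body) }] };
    }
  };
}
```

Assuming the same server.tool signature used in the examples above, a handler would be registered as server.tool('search', searchSchema, withSafeErrors('search', async ({ query }) => { ... })). The handler throws a BusinessError for conditions the caller is allowed to know about and lets everything else propagate to the wrapper.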

Pattern 2: Database error messages forwarded to the caller

PostgreSQL driver errors are exceptionally information-rich. A unique constraint violation on users.email produces: duplicate key value violates unique constraint "users_email_key". That single string reveals the table name (users), the column name (email), and the constraint naming convention of your database schema. A foreign key violation reveals two table names and the relationship between them. A check constraint error reveals a column name and the constraint expression. Returning any of these to the LLM is equivalent to delivering a partial schema dump on every failed write operation. The same applies to Prisma and Sequelize — their error messages include model names, field names, and query fragments.

WRONG — raw PostgreSQL driver error string forwarded to the tool response:

import { Pool } from 'pg';
import bcrypt from 'bcrypt';
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

server.tool('registerUser', registerSchema, async ({ email, password }) => {
  try {
    const hash = await bcrypt.hash(password, 12);
    await pool.query(
      'INSERT INTO users (email, password_hash, created_at) VALUES ($1, $2, NOW())',
      [email, hash]
    );
    return { content: [{ type: 'text', text: 'Registered.' }] };
  } catch (err: any) {
    // WRONG: err.message contains:
    // "duplicate key value violates unique constraint \"users_email_key\""
    // Leaks: table "users", column "email", constraint naming convention
    return { content: [{ type: 'text', text: `Registration failed: ${err.message}` }] };
  }
});

RIGHT — map SQLSTATE codes to safe business messages; unknown errors get correlation IDs:

// PostgreSQL SQLSTATE codes — stable across pg driver versions
// Full list: https://www.postgresql.org/docs/current/errcodes-appendix.html
const PG = {
  UNIQUE_VIOLATION:      '23505',   // duplicate key
  FOREIGN_KEY_VIOLATION: '23503',   // referenced row missing
  NOT_NULL_VIOLATION:    '23502',   // required field null
  CHECK_VIOLATION:       '23514',   // check constraint failed
  EXCLUSION_VIOLATION:   '23P01',   // exclusion constraint
  SERIALIZATION_FAILURE: '40001',   // retry-able concurrent update conflict
  DEADLOCK_DETECTED:     '40P01',   // retry-able deadlock
};

// Map SQLSTATE to a safe user-facing message.
// Deliberately omit: table names, column names, constraint names.
function pgCodeToSafeMessage(err: any): string | null {
  switch (err?.code) {
    case PG.UNIQUE_VIOLATION:
      return 'An account with this email address already exists.';
    case PG.FOREIGN_KEY_VIOLATION:
      return 'A referenced record could not be found.';
    case PG.NOT_NULL_VIOLATION:
      return 'A required field was missing.';
    case PG.CHECK_VIOLATION:
    case PG.EXCLUSION_VIOLATION:
      return 'A field value did not meet requirements.';
    case PG.SERIALIZATION_FAILURE:
    case PG.DEADLOCK_DETECTED:
      return 'A concurrent update conflict occurred — please retry.';
    default:
      return null;   // Unknown SQLSTATE: treat as internal error
  }
}

server.tool('registerUser', registerSchema, async ({ email, password }) => {
  try {
    const hash = await bcrypt.hash(password, 12);
    await pool.query(
      'INSERT INTO users (email, password_hash, created_at) VALUES ($1, $2, NOW())',
      [email, hash]
    );
    return { content: [{ type: 'text', text: 'Registered.' }] };
  } catch (err: any) {
    const safeMsg = pgCodeToSafeMessage(err);
    if (safeMsg) {
      // Known business error — safe to return without a correlation ID
      return { isError: true, content: [{ type: 'text', text: safeMsg }] };
    }
    // Unknown error — log full detail internally, return only correlation ID
    const id = logAndGetCorrelationId('registerUser', err);
    return {
      isError: true,
      content: [{ type: 'text', text: `Registration failed (ref: ${id})` }]
    };
  }
});

The key discipline is to branch on err.code (the SQLSTATE string), not on err.message. SQLSTATE codes are part of the PostgreSQL protocol specification and are stable across versions. Message text changes between PostgreSQL versions and is not part of the protocol contract. Branching on message text is fragile, and any code that reads err.message must be carefully audited to ensure it never forwards it to callers.
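
Branching on the code rather than the message can be made explicit with a small classifier that never reads err.message at all. This is a sketch under stated assumptions: getSqlState, classifySqlState, and the ErrorKind labels are hypothetical names, and the only facts relied on are that SQLSTATE values are five-character strings, that class 23 covers integrity constraint violations, and that 40001 and 40P01 are the serialization-failure and deadlock codes listed earlier.

```typescript
// Extract a SQLSTATE-shaped code from an unknown caught value, or null.
// The shape check is deliberately loose: anything that is not a
// five-character string of digits and upper-case letters is rejected.
function getSqlState(err: unknown): string | null {
  if (typeof err !== 'object' || err === null) return null;
  const code = (err as { code?: unknown }).code;
  return typeof code === 'string' && /^[0-9A-Z]{5}$/.test(code) ? code : null;
}

type ErrorKind = 'constraint' | 'retryable' | 'internal';

// Classify by SQLSTATE only. Anything unrecognized is treated as an
// internal error, which should be logged with a correlation ID.
function classifySqlState(err: unknown): ErrorKind {
  const state = getSqlState(err);
  if (state === null) return 'internal';
  if (state === '40001' || state === '40P01') return 'retryable';
  if (state.slice(0, 2) === '23') return 'constraint'; // class 23: integrity constraint violation
  return 'internal';
}
```

Because unknown codes fall through to 'internal', a driver upgrade that introduces a new error can only make responses less specific, never more revealing.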

Pattern 3: Verbose validation errors revealing schema structure

Zod, Joi, and similar validation libraries produce detailed error objects that describe exactly where input failed and why: field path, received type, expected type, minimum/maximum values, regex patterns, and enum members. This information is essential during development but constitutes a schema blueprint when returned to untrusted callers. An attacker who can make tool calls can probe your validation logic by submitting malformed inputs and reading the error output to reconstruct the exact shape of your internal data model — field names, types, allowed values, and constraints — without ever having read your source code.

WRONG — Zod error forwarded verbatim to the tool response:

import { z } from 'zod';

const CreateOrderSchema = z.object({
  customerId:  z.string().uuid(),
  productId:   z.string().uuid(),
  quantity:    z.number().int().min(1).max(10000),
  couponCode:  z.string().regex(/^[A-Z]{4}-[0-9]{4}$/).optional(),
  internalTag: z.enum(['PRIORITY', 'BULK', 'TRIAL']),   // internal field!
});

server.tool('createOrder', { input: z.unknown() }, async ({ input }) => {
  const result = CreateOrderSchema.safeParse(input);
  if (!result.success) {
    // WRONG: Zod's formatted errors reveal:
    // - Every field name (customerId, productId, internalTag, ...)
    // - Expected types ("Expected uuid", "Expected number")
    // - Constraints (min: 1, max: 10000)
    // - Enum values (['PRIORITY', 'BULK', 'TRIAL']) — including internal ones
    // - Regex pattern (^[A-Z]{4}-[0-9]{4}$)
    return {
      isError: true,
      content: [{ type: 'text', text: JSON.stringify(result.error.format()) }]
    };
  }
  // ...
});

RIGHT — generic "invalid input" response; field-level detail stays in internal logs:

import { z, ZodError } from 'zod';

// Safe validation error handler: log full Zod detail internally,
// return only a generic message (with an issue count) to the caller.
function safeValidationError(toolName: string, err: ZodError): object {
  // Log the full Zod issue list internally for debugging:
  console.warn(JSON.stringify({
    event:   'validation_error',
    tool:    toolName,
    issues:  err.issues,   // full field paths, types, constraints — stays internal
    ts:      new Date().toISOString(),
  }));

  // Return only a count of issues — no field names, no types, no constraints:
  return {
    isError: true,
    content: [{
      type: 'text',
      text: JSON.stringify({
        error: {
          code:    'VALIDATION_ERROR',
          message: `Input validation failed (${err.issues.length} issue${err.issues.length !== 1 ? 's' : ''}).`,
          // Optionally include top-level field names if they are not sensitive,
          // but NEVER include type info, constraints, or enum values:
          // fields: [...new Set(err.issues.map(i => i.path[0]).filter(Boolean))]
        }
      })
    }]
  };
}

server.tool('createOrder', { input: z.unknown() }, async ({ input }) => {
  const result = CreateOrderSchema.safeParse(input);
  if (!result.success) {
    return safeValidationError('createOrder', result.error);
    // Caller sees: { "error": { "code": "VALIDATION_ERROR",
    //                           "message": "Input validation failed (2 issues)." } }
    // No field names, no types, no enum values, no regex patterns disclosed.
  }
  const order = result.data;
  // proceed with validated data...
});

The same principle applies to Joi (error.details contains field paths and types), class-validator (errors[].constraints), and any other validation library. Write a single normalization function per library and route all validation failures through it. If the LLM genuinely needs field-level feedback to help the user correct their input — for example, in a form-filling flow — limit the disclosure to field names only, never types, constraints, or enum lists, and only for fields that are part of the public-facing API contract.
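
Where field-level feedback is genuinely needed, the disclosure can be restricted with an explicit allowlist rather than by trusting whatever the validator reports. The sketch below assumes only that each issue carries a path array, which holds for both Zod issues and Joi error.details entries; IssueLike, PUBLIC_FIELDS, publicFieldNames, and validationErrorBody are hypothetical names, and the allowlist contents are an example based on the CreateOrderSchema shown earlier.

```typescript
// Structural subset of a validation issue. Only `path` is read, so the
// same function accepts Zod issues and Joi `details` entries.
interface IssueLike {
  path: ReadonlyArray<PropertyKey>;
}

// Hypothetical allowlist: fields that are part of the public API contract.
// `internalTag` is intentionally absent.
const PUBLIC_FIELDS: ReadonlySet<string> = new Set([
  'customerId',
  'productId',
  'quantity',
  'couponCode',
]);

// Return the distinct top-level field names that failed validation,
// dropping anything not on the allowlist. Types, constraints, and
// nested path segments are never included.
function publicFieldNames(
  issues: ReadonlyArray<IssueLike>,
  allowed: ReadonlySet<string> = PUBLIC_FIELDS
): string[] {
  const names = new Set<string>();
  for (const issue of issues) {
    const top = issue.path[0];
    if (typeof top === 'string' && allowed.has(top)) names.add(top);
  }
  return Array.from(names).sort();
}

// Build the caller-facing body: a fixed code, a fixed message, field names only.
function validationErrorBody(issues: ReadonlyArray<IssueLike>) {
  return {
    error: {
      code: 'VALIDATION_ERROR',
      message: 'Input validation failed.',
      fields: publicFieldNames(issues),
    },
  };
}
```

An allowlist fails closed: a field added to the schema later stays hidden from error output until someone deliberately marks it public.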

Pattern 4: Error response timing differences enabling oracle attacks

Even when two error paths return identical message text, they can differ in latency, and that difference is measurable. A "user not found" path returns immediately after a fast index miss; a "wrong password" path runs bcrypt.compare, which takes ~100ms. A "resource not found" path returns in ~2ms; a "resource exists but you lack permission" path queries the database and then checks ownership in ~12ms. An attacker who can make repeated tool calls and measure response times can distinguish these cases even though every response carries the same error text, enabling enumeration of valid users, valid resource IDs, or valid email addresses.

WRONG — error paths have measurably different timing; resource existence leaks:

// Both branches return the text "Not found." — but timing betrays the truth.

// Path A: user does not exist — returns in ~1ms (fast index miss)
// Path B: user exists, wrong password — returns in ~100ms (bcrypt.compare runs)
// Path C: user exists, correct password — returns success
// Attacker measures latency: ~1ms → email unregistered, ~100ms → email registered.

server.tool('login', loginSchema, async ({ email, password }) => {
  const user = await db.query('SELECT * FROM users WHERE email = $1', [email]);
  if (!user.rows[0]) {
    return { isError: true, content: [{ type: 'text', text: 'Not found.' }] };
    // Returns here in ~1ms — timing leaks "email is not registered"
  }
  const valid = await bcrypt.compare(password, user.rows[0].password_hash);
  if (!valid) {
    return { isError: true, content: [{ type: 'text', text: 'Not found.' }] };
    // Returns here in ~100ms — timing leaks "email IS registered"
  }
  return { content: [{ type: 'text', text: 'OK' }] };
});

// Same problem with resource existence:
server.tool('getDocument', docSchema, async ({ docId, callerId }) => {
  const doc = await db.findDocument(docId);  // fast miss ~2ms, hit ~12ms
  if (!doc || doc.ownerId !== callerId) {
    return { isError: true, content: [{ type: 'text', text: 'Not found.' }] };
    // ~2ms → document doesn't exist; ~12ms → document exists but isn't yours
  }
  return { content: [{ type: 'text', text: doc.content }] };
});

RIGHT — constant-time error paths via dummy operations and minimum response floor:

import bcrypt from 'bcrypt';

// Pre-computed bcrypt hash used as a dummy comparison target when no user row exists.
// bcrypt.compare against this dummy takes ~100ms — same as comparing against a real hash.
// Generate once at server startup (NOT inside the request handler):
const DUMMY_HASH = await bcrypt.hash('__sentinel_never_matches__', 12);

server.tool('login', loginSchema, async ({ email, password }) => {
  try {
    const result = await db.query(
      'SELECT id, password_hash, is_active FROM users WHERE email = $1',
      [email]
    );
    const user = result.rows[0];

    // Always run bcrypt.compare regardless of whether the user exists.
    // If no user row, compare against DUMMY_HASH — takes the same ~100ms.
    // This equalizes timing for "email not found" and "wrong password" paths.
    const hashToCompare = user?.password_hash ?? DUMMY_HASH;
    const valid = await bcrypt.compare(password, hashToCompare);

    // Single unified rejection for all failure conditions — no timing oracle:
    if (!user || !valid || !user.is_active) {
      return {
        isError: true,
        content: [{ type: 'text', text: JSON.stringify({
          error: { code: 'AUTH_FAILED', message: 'Invalid credentials.' }
        })}]
      };
    }
    const session = await createSession(user.id);
    return { content: [{ type: 'text', text: JSON.stringify({ sessionToken: session }) }] };

  } catch (err) {
    const id = logAndGetCorrelationId('login', err);
    return { isError: true, content: [{ type: 'text', text: `Login failed (ref: ${id})` }] };
  }
});

// For resource-lookup timing oracles, impose a minimum response floor
// so that fast misses and slower hits have overlapping latency distributions.
// Promise.all resolves only after both the operation and the floor timer finish:
async function withMinimumLatency<T>(
  minMs: number,
  fn: () => Promise<T>
): Promise<T> {
  const [result] = await Promise.all([
    fn(),
    new Promise<void>(resolve => setTimeout(resolve, minMs))
  ]);
  return result;
}

server.tool('getDocument', docSchema, async ({ docId, callerId }) => {
  return withMinimumLatency(20, async () => {
    const doc = await db.findDocument(docId);
    // Both "not found" (~2ms real) and "permission denied" (~12ms real)
    // now return after at least 20ms — distributions overlap, oracle collapses.
    if (!doc || doc.ownerId !== callerId) {
      return { isError: true, content: [{ type: 'text', text: JSON.stringify({
        error: { code: 'NOT_FOUND', message: 'Document not found.' }
      })}] };
    }
    return { content: [{ type: 'text', text: doc.content }] };
  });
});

The withMinimumLatency pattern runs the real operation and a timer concurrently with Promise.all, and the response is not sent until both have completed. Paths that finish faster than the floor (cache hits, immediate misses) are padded up to it; paths that take longer than the floor (slow database queries, bcrypt) are not delayed at all. The floor therefore has to sit above the latency of the slowest path you are trying to mask, not just the fastest: in the getDocument example the 20ms floor covers both the ~2ms miss and the ~12ms hit. Pick a value slightly above that path's p99 so that legitimate responses are not excessively delayed. Because Promise.all rejects as soon as the operation rejects, a thrown error skips the floor, so catch errors inside the wrapped function if exception paths must be padded as well. For authentication handlers, the dummy bcrypt approach is preferable because it preserves the full natural timing distribution rather than introducing artificial padding.
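
A floor built on Promise.all does not cover the case where the operation throws, since the rejection propagates before the timer has elapsed. One way to pad exception paths too is to await the timer in a finally block. The following is a sketch of that variant, with withLatencyFloor as a hypothetical name rather than a replacement mandated by anything above.

```typescript
// Runs `fn` and guarantees that at least `minMs` elapses before the
// returned promise settles, whether `fn` resolves or rejects.
async function withLatencyFloor<T>(
  minMs: number,
  fn: () => Promise<T>
): Promise<T> {
  // Start the timer before the operation so the two run concurrently.
  const floor = new Promise<void>(resolve => setTimeout(resolve, minMs));
  try {
    return await fn();
  } finally {
    // Executes on both the success and the failure path; the original
    // result or error is delivered once the timer has finished.
    await floor;
  }
}
```

The trade-off is that genuine outages now also take at least minMs to surface, which is usually acceptable for floors in the tens of milliseconds.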

How SkillAudit detects error handling vulnerabilities

SkillAudit's static analysis performs AST-level taint tracking from catch block error bindings to tool response content strings. It flags any code path where err.stack, err.message, or any property of the error object from a database driver (pg, mysql2, prisma, sequelize, knex) flows into the return value of a tool handler. Zod and Joi error.format(), error.errors, and error.details references in tool response context are flagged as schema leakage. Timing oracle patterns — if (!user) return before a bcrypt or PBKDF2 comparison, or findById results used in access control checks without a latency floor — are flagged as medium-severity timing oracle risks. Each finding includes a grade deduction and a link to the relevant pattern in this documentation. Run a free audit at skillaudit.dev to scan your MCP server's error handling paths alongside authentication, injection, and supply-chain findings.
