
MCP server error handling: what to return when something goes wrong

Error handling is where most MCP servers leak their internals. A stack trace in a tool response tells an attacker exactly which library you're using, which version, and which file path threw. A bare process.exit(1) in a tool handler takes down the entire server for all callers. A swallowed error that returns success creates a silent failure that no one will debug for weeks. This post walks through the four error categories every MCP server must handle, what to return in each case, how to log errors without leaking sensitive state, and how these decisions map to SkillAudit findings.

The core tension: caller needs vs. attacker surface

Good error messages help legitimate callers debug their integration. The same information helps attackers map your internals. The resolution is not to make errors maximally vague — that just makes your tool unusable — but to separate what you log from what you return.

The rule is simple: log the real error internally; return a structured, opaque error to the caller. The caller gets enough to fix their integration (what failed, in which tool, why at a high level), but no stack traces, no file paths, no library names, no database schema details, no environment variable names.

SkillAudit classifies stack trace exposure as a HIGH finding under the Credential Exposure axis, because internal paths and library versions accelerate targeted exploit development. Returning the raw error.message from a database library is also rated HIGH: the message often contains table names or query fragments.

The four error categories

MCP tool handlers encounter four distinct error categories, each requiring a different response strategy:

  1. Input validation errors — the caller sent bad input (wrong type, disallowed value, missing required field)
  2. Authorization errors — the caller is not permitted to invoke this tool or access this resource
  3. Downstream service errors — your tool called a database, API, or filesystem and that call failed
  4. Unexpected/panic errors — something truly unexpected happened (null dereference, assertion failure, unhandled case)

Each category has a different caller-facing shape and a different logging requirement. Let's walk through each.

Pattern 1: input validation errors — tell the caller exactly what's wrong

Safe

Specific validation errors are fine to return

The caller sent the input; telling them why it was rejected helps them fix their code without revealing anything about your internals.

When a caller sends { "repoUrl": "not-a-url" }, returning "repoUrl must be a valid GitHub URL starting with https://github.com/" is safe and helpful. The field name and constraint are part of your public API contract. There are no internals here to leak.

import { z } from "zod";

const GetIssueSchema = z.object({
  issueId: z.string()
    .regex(/^[A-Z][A-Z0-9]+-\d+$/, "issueId must match pattern PROJECT-123"),
  includeComments: z.boolean().optional().default(false),
});

// server.tool expects a raw Zod shape, so pass the object schema's .shape
server.tool("get_issue", GetIssueSchema.shape, async (args) => {
  // Zod has already validated — args.issueId is clean
  const result = await fetchIssue(args.issueId, args.includeComments);
  return { content: [{ type: "text", text: JSON.stringify(result) }] };
});

If you use Zod at the entry point, validation errors surface automatically with field-level messages. If you validate manually, return { isError: true, content: [{ type: "text", text: "issueId: must match pattern PROJECT-123" }] } rather than throwing. An error that escapes the handler falls through to the framework's default error handling, which may forward the raw error message, or even a stack trace, to the caller.
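For the manual route, a small validator can hand back a ready-to-send error result instead of throwing. The sketch below is dependency-free and hypothetical: validateGetIssueArgs, GetIssueArgs, and ToolErrorResult are illustrative names, and the pattern simply repeats the constraint from the Zod schema above.

```typescript
// Shapes mirror the MCP tool result used throughout this post.
type TextContent = { type: "text"; text: string };
type ToolErrorResult = { isError: true; content: TextContent[] };
type GetIssueArgs = { issueId: string; includeComments: boolean };

// Same constraint as the Zod schema above (PROJECT-123 style keys)
const ISSUE_ID_PATTERN = /^[A-Z][A-Z0-9]+-\d+$/;

// Field name + constraint are part of the public contract, so they are safe to return
function validationError(field: string, constraint: string): ToolErrorResult {
  return { isError: true, content: [{ type: "text", text: `${field}: ${constraint}` }] };
}

// Hypothetical manual validator: returns either clean args or an error result
function validateGetIssueArgs(
  raw: unknown,
): { ok: true; args: GetIssueArgs } | { ok: false; error: ToolErrorResult } {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: validationError("arguments", "must be a JSON object") };
  }
  const { issueId, includeComments } = raw as Record<string, unknown>;
  if (typeof issueId !== "string" || !ISSUE_ID_PATTERN.test(issueId)) {
    return { ok: false, error: validationError("issueId", "must match pattern PROJECT-123") };
  }
  let comments = false; // same default as the Zod schema
  if (includeComments !== undefined) {
    if (typeof includeComments !== "boolean") {
      return { ok: false, error: validationError("includeComments", "must be a boolean") };
    }
    comments = includeComments;
  }
  return { ok: true, args: { issueId, includeComments: comments } };
}
```

A handler would call validateGetIssueArgs first and return result.error unchanged when ok is false. Every message names a field and a constraint, and nothing else.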

Pattern 2: authorization errors — be consistent and opaque

Careful

Don't distinguish "no permission" from "doesn't exist"

Returning different errors for "you don't have access" vs "this resource doesn't exist" lets attackers enumerate what exists by probing for differential responses.

A common mistake is returning a 403-equivalent message for permission denial and a 404-equivalent message for missing resources. An attacker can then probe with IDs they're not authorized to see: if they get "no permission", the resource exists; if they get "not found", it doesn't. This is an enumeration vulnerability.

The safe pattern is to always return "not found or not accessible" for any resource the caller can't see, regardless of whether it exists or they lack permission. Your internal logs record the real reason.

async function getDocument(callerId: string, docId: string) {
  const doc = await db.documents.findById(docId);

  // Don't leak whether the doc exists:
  // BAD: if (!doc) throw new Error("Document not found");
  // BAD: if (!hasAccess(callerId, doc)) throw new Error("Permission denied");

  // GOOD: same message for both cases; the real reason goes only to the log
  if (!doc || !hasAccess(callerId, doc)) {
    log.warn({ callerId, docId, reason: doc ? "unauthorized" : "not_found" }, "document access denied");
    // toolError(toolName, userMessage) builds a ToolError; throw it for the top-level handler
    throw toolError("get_document", "Document not found or not accessible");
  }

  return doc;
}

For authentication errors (missing or invalid API key), use the same message regardless of whether the key was missing, malformed, or invalid. Distinct messages such as "malformed key" vs. "invalid key" tell an attacker when a probe has the right key format, which narrows guessing and brute-force attempts.
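A minimal sketch of that rule, assuming API keys are stored as SHA-256 digests and follow a hypothetical "sk_ plus 32 hex characters" format. authenticateKey, KeyRecord, and the key format are illustrative, not part of any SDK. Every failure path returns the same caller-facing message; the reason field exists only for internal logs.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical key store: SHA-256 digests of issued keys, never the raw keys
type KeyRecord = { keyHashHex: string; orgId: string };

type AuthResult =
  | { ok: true; orgId: string }
  | { ok: false; message: string; reason: "missing" | "malformed" | "unknown_key" };

// The only text a caller ever sees on failure
const AUTH_FAILURE = "Unauthorized";

// Assumed key format for illustration: "sk_" followed by 32 lowercase hex characters
const KEY_FORMAT = /^sk_[0-9a-f]{32}$/;

function authenticateKey(presented: string | undefined, store: readonly KeyRecord[]): AuthResult {
  // `reason` is for internal logs only; every failure returns the same `message`
  if (presented === undefined || presented === "") {
    return { ok: false, message: AUTH_FAILURE, reason: "missing" };
  }
  if (!KEY_FORMAT.test(presented)) {
    return { ok: false, message: AUTH_FAILURE, reason: "malformed" };
  }
  // Hash first so every comparison is between fixed-length 32-byte digests
  const digest = createHash("sha256").update(presented).digest();
  for (const record of store) {
    const stored = Buffer.from(record.keyHashHex, "hex");
    // timingSafeEqual throws on a length mismatch, so skip corrupt entries first
    if (stored.length === digest.length && timingSafeEqual(digest, stored)) {
      return { ok: true, orgId: record.orgId };
    }
  }
  return { ok: false, message: AUTH_FAILURE, reason: "unknown_key" };
}
```

Comparing digests with timingSafeEqual also avoids leaking, through response timing, how much of a guessed key matched: the timing-channel counterpart of the differential-message problem.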

Pattern 3: downstream service errors — abstract, don't propagate

High Risk

Never return raw library errors to the caller

Database error messages contain table names, query fragments, and schema details. HTTP client errors contain internal service URLs. Filesystem errors contain absolute paths.

This is the most common source of HIGH findings in SkillAudit audits. A handler that catches a database error and returns error.message might expose something like:

// What the caller sees with raw error propagation:
{
  "error": "SQLITE_CONSTRAINT: UNIQUE constraint failed: users.email",
  "stack": "Error: SQLITE_CONSTRAINT...\n  at /app/node_modules/better-sqlite3/lib/...\n  at /app/src/handlers/create-user.ts:47:22"
}

This tells an attacker: SQLite is the database, the table name is users, the unique field is email, and the server is running Node.js with better-sqlite3 installed at that exact path. The stack trace is effectively a map of your server's internals.

The fix is a thin error abstraction layer. Write it once; use it everywhere:

// src/lib/tool-error.ts
export class ToolError extends Error {
  constructor(
    public readonly toolName: string,
    public readonly userMessage: string,
    public readonly cause?: unknown,
  ) {
    super(userMessage);
    this.name = "ToolError";
  }
}

export function toolError(toolName: string, userMessage: string, cause?: unknown) {
  return new ToolError(toolName, userMessage, cause);
}

// src/lib/handle-error.ts
import { log } from "./logger.js";
import { ToolError } from "./tool-error.js";

// MCP tool result that reports an error to the caller
export type ToolErrorResult = {
  isError: true;
  content: { type: "text"; text: string }[];
};

export function handleToolError(toolName: string, err: unknown): ToolErrorResult {
  if (err instanceof ToolError) {
    // Already abstracted: log the cause if present, return the user message
    if (err.cause) {
      log.error({ toolName, err: err.cause }, "tool error with cause");
    }
    return {
      isError: true,
      content: [{ type: "text", text: err.userMessage }],
    };
  }

  // Unexpected error: log everything, return nothing useful
  log.error({ toolName, err }, "unexpected tool error");
  return {
    isError: true,
    content: [{ type: "text", text: `${toolName}: an unexpected error occurred` }],
  };
}

// Using the abstraction in a handler. DatabaseError stands in for your driver's
// error class. Thrown ToolErrors and re-thrown errors must be caught by a top-level
// wrapper that turns them into results (handleToolError, or the safeTool wrapper in Pattern 4).
server.tool("search_documents", SearchSchema.shape, async (args) => {
  try {
    const results = await db.search(args.query, args.limit);
    return { content: [{ type: "text", text: JSON.stringify(results) }] };
  } catch (err) {
    if (err instanceof DatabaseError && err.code === "QUERY_TIMEOUT") {
      throw toolError("search_documents", "Search timed out, try a more specific query", err);
    }
    if (err instanceof DatabaseError) {
      throw toolError("search_documents", "Search temporarily unavailable", err);
    }
    throw err; // Re-throw unexpected errors for the top-level handler
  }
});

Notice the tiered approach: known database errors get an appropriate user-facing message; unknown errors are re-thrown for the top-level handler to catch and log. The real error (err) travels as cause in your structured logs — never in the response body.
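As the list of known driver failures grows, the instanceof checks in the catch block can become a lookup table. A minimal sketch, assuming your driver attaches a string code to its errors; the SQLite result-code names below are illustrative, so check your driver's documentation for the values it actually emits. safeDbMessage is a hypothetical helper. Unknown codes return undefined so the handler keeps re-throwing them, preserving the tiered approach.

```typescript
// Assumed error shape: many Node database drivers attach a string `code`.
type DriverError = Error & { code?: unknown };

// Illustrative SQLite result-code names mapped to caller-safe messages.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const SAFE_DB_MESSAGES = new Map<string, string>([
  ["SQLITE_BUSY", "Storage is busy, please retry shortly"],
  ["SQLITE_CONSTRAINT_UNIQUE", "Request conflicts with existing data"],
]);

// Returns a safe message for known driver failures, or undefined so the
// caller can re-throw unknown errors to the top-level handler.
function safeDbMessage(err: unknown): string | undefined {
  if (!(err instanceof Error)) return undefined;
  const code = (err as DriverError).code;
  return typeof code === "string" ? SAFE_DB_MESSAGES.get(code) : undefined;
}
```

Inside the catch block this becomes: const message = safeDbMessage(err); if (message) throw toolError("search_documents", message, err); throw err; The raw driver message still travels only as cause.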

Pattern 4: unexpected errors — catch everything at the top level

High Risk

An unhandled error must not take down the server process

One caller sending a pathological input that triggers an unhandled exception should not deny service to all other callers.

MCP servers are long-running processes, and a tool invocation that throws will crash the server only if nothing catches the error. In the @modelcontextprotocol/sdk, the server.tool callback is already wrapped by the SDK: depending on the SDK version, an unhandled throw becomes either a JSON-RPC error response or a tool result with isError: true, not a process crash. But that default behavior still forwards the raw error.message to the caller, and for database, HTTP, or filesystem errors that message carries exactly the internal detail Pattern 3 is trying to hide.

The defensive pattern is a top-level wrapper around every handler that catches anything the inner code didn't handle:

// src/lib/safe-tool.ts
import { log } from "./logger.js";
import { ToolError } from "./tool-error.js";

export function safeTool<TInput, TOutput>(
  name: string,
  handler: (args: TInput) => Promise<TOutput>,
): (args: TInput) => Promise<TOutput> {
  return async (args: TInput) => {
    try {
      return await handler(args);
    } catch (err) {
      if (err instanceof ToolError) {
        log.warn({ tool: name, msg: err.userMessage, cause: err.cause });
        return {
          isError: true,
          content: [{ type: "text", text: err.userMessage }],
        } as unknown as TOutput;
      }
      // Truly unexpected — log with full context, return opaque message
      log.error({ tool: name, err }, "unhandled tool error");
      return {
        isError: true,
        content: [{ type: "text", text: `${name}: internal error` }],
      } as unknown as TOutput;
    }
  };
}

// Usage:
server.tool(
  "get_issue",
  GetIssueSchema.shape,
  safeTool("get_issue", async (args) => {
    // handler code here
  }),
);

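With this wrapper, every tool has a guaranteed last line of defense for errors raised while its handler runs. Unexpected failures (a null dereference, a failed assertion, a rejected promise the handler awaits) all resolve to "tool_name: internal error" in the caller's response, while the full error with stack trace lands in your structured logs for debugging. A promise the handler starts without awaiting is outside the try/catch, so its rejection surfaces at the process level instead.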
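A per-handler try/catch cannot see a promise the handler starts without awaiting, or an error thrown from a timer or event callback. Since Node.js 15, an unhandled rejection terminates the process by default, so one stray rejection can still take down the server for every caller. A minimal sketch of process-level guards, assuming a logger with the same error(obj, msg) shape used throughout this post; installProcessGuards is a hypothetical name.

```typescript
// Logger shape assumed for this sketch (matches the log.error(obj, msg) calls above)
type Logger = { error: (obj: object, msg: string) => void };

// Hypothetical helper: call once at startup, before connecting the transport.
export function installProcessGuards(log: Logger): void {
  // Fires for promises that reject with no handler attached, e.g. `void sendWebhook()`
  // started inside a tool handler. Registering a listener replaces Node's default crash.
  process.on("unhandledRejection", (reason) => {
    log.error({ err: reason }, "unhandled promise rejection");
  });

  // Fires for synchronous throws that escape to the event loop (timers, emitters).
  // Node's docs advise against resuming here, so log and exit for a supervisor to restart.
  process.on("uncaughtException", (err) => {
    log.error({ err }, "uncaught exception, exiting");
    process.exit(1);
  });
}
```

The two handlers differ on purpose: Node's documentation warns that resuming after uncaughtException is unsafe because process state may be corrupt, while a rejection nobody observed can be logged without abandoning every in-flight call.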

Logging errors without leaking sensitive state

Good error handling requires good error logging — but logging is another surface for accidental credential exposure. The risk is logging structured objects that contain secrets:

// DANGEROUS: logs the entire args object, which may contain API keys
log.error({ err, args }, "tool invocation failed");

// DANGEROUS: logs the raw error which may contain query parameters with tokens
log.error({ err: error.toString() }, "downstream API call failed");

The safe approach is to log a hash of sensitive arguments (not the value), use structured field allowlists, and scrub known sensitive fields before logging:

import { createHash } from "crypto";

function argsHash(args: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(args))
    .digest("hex")
    .slice(0, 12);
}

// Safe error log: correlatable (argsHash), no sensitive values
log.error({
  tool: name,
  argsHash: argsHash(args),
  errType: err instanceof Error ? err.constructor.name : typeof err,
  errMsg: err instanceof ToolError ? err.userMessage : "unexpected error",
}, "tool invocation failed");

This gives you enough to correlate errors across calls (via argsHash) and diagnose patterns (via errType) without logging credential values. If you need the actual error for debugging, log it separately at a lower level and gate that log emission on a debug flag.
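The allowlist and scrubbing steps mentioned above can be sketched without dependencies. scrubForLog and pickForLog are hypothetical helpers, and the SENSITIVE_KEY pattern is an assumption to extend with your own argument names. Many structured loggers also offer built-in redaction (pino's redact option, for example) that can cover fixed paths.

```typescript
// Argument names treated as sensitive in this sketch; extend with your own tools' fields.
const SENSITIVE_KEY = /api[-_]?key|token|secret|password|authorization|cookie/i;

// Denylist scrubber: deep-copies `value`, replacing values under sensitive keys.
function scrubForLog(value: unknown, depth = 0): unknown {
  if (depth > 8) return "[max depth]"; // stops runaway recursion on deep or cyclic input
  if (Array.isArray(value)) return value.map((item) => scrubForLog(item, depth + 1));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[redacted]" : scrubForLog(inner, depth + 1);
    }
    return out;
  }
  return value;
}

// Allowlist: keep only fields known to be safe, so new fields are excluded by default.
function pickForLog(args: Record<string, unknown>, allowed: readonly string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(args, key)) out[key] = args[key];
  }
  return out;
}
```

An allowlist fails closed (a new argument is dropped until someone adds it), while a denylist scrubber fails open (a new secret slips through until someone adds its name), so the allowlist is the safer default for argument objects.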

For more on structured audit logging without credential exposure, see the post on MCP server observability and security logging.

Error handling and the SkillAudit grade

Here's how common error handling mistakes map to SkillAudit findings:

| Pattern | Axis | Severity | Finding | Grade impact |
|---|---|---|---|---|
| Stack trace returned in error response | Credential Exposure | HIGH | Internal path and library version exposure | D |
| Raw error.message from DB library | Credential Exposure | HIGH | Schema/query details in error response | D |
| Differential auth vs. not-found errors | Security | MED | Resource enumeration via error discrimination | C |
| Unhandled errors crash the process | Security | HIGH | Denial of service via input-triggered panic | D |
| Error logs contain raw args (potential credentials) | Credential Exposure | MED | Possible credential exposure in log output | C |
| ToolError abstraction + safe error logs | Credential Exposure | None | No findings | A |

Putting it together: a complete error handling setup

Here's the full stack assembled for a production MCP server. Three files; most servers can copy this verbatim:

// src/lib/errors.ts — error types
export class ToolError extends Error {
  constructor(
    public readonly toolName: string,
    public readonly userMessage: string,
    public readonly cause?: unknown,
  ) {
    super(userMessage);
    this.name = "ToolError";
    if (cause instanceof Error) {
      this.stack = `${this.stack}\nCaused by: ${cause.stack}`;
    }
  }
}

export class AuthError extends ToolError {
  constructor(toolName: string) {
    super(toolName, "Unauthorized");
    this.name = "AuthError";
  }
}

export class ValidationError extends ToolError {
  constructor(toolName: string, field: string, constraint: string) {
    super(toolName, `${field}: ${constraint}`);
    this.name = "ValidationError";
  }
}

// src/lib/safe-tool.ts: top-level wrapper
import { createHash } from "crypto";
import { log } from "./logger.js";
import { ToolError } from "./errors.js";

export function safeTool<T, R>(name: string, fn: (args: T) => Promise<R>) {
  return async (args: T): Promise<R> => {
    const argsHash = createHash("sha256")
      .update(JSON.stringify(args))
      .digest("hex")
      .slice(0, 12);

    try {
      const result = await fn(args);
      return result;
    } catch (err) {
      if (err instanceof ToolError) {
        log.warn({ tool: name, argsHash, type: err.name, msg: err.userMessage });
        return {
          isError: true,
          content: [{ type: "text", text: err.userMessage }],
        } as unknown as R;
      }
      log.error({ tool: name, argsHash, err }, "unhandled error");
      return {
        isError: true,
        content: [{ type: "text", text: `${name}: internal error` }],
      } as unknown as R;
    }
  };
}

// Usage in a handler:
import { safeTool } from "../lib/safe-tool.js";
import { ToolError, AuthError } from "../lib/errors.js";
import { authenticate } from "../lib/auth.js";
import { db } from "../lib/db.js";

server.tool(
  "list_documents",
  ListDocumentsSchema.shape, // raw Zod shape, as server.tool expects
  safeTool("list_documents", async (args) => {
    const caller = await authenticate(args.apiKey).catch(() => {
      throw new AuthError("list_documents");
    });

    const docs = await db.list(caller.orgId, args.limit).catch((err) => {
      throw new ToolError("list_documents", "Failed to retrieve documents", err);
    });

    return { content: [{ type: "text", text: JSON.stringify(docs) }] };
  }),
);

This setup earns full marks on the Credential Exposure axis in SkillAudit: no stack traces, no library details, no schema information, no raw error messages in responses. The audit logger uses argsHash for correlation without storing credential values. See the security-first architecture walkthrough for the complete picture.
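That guarantee is easy to regress as handlers change, so it is worth pinning with a test. A minimal sketch, using trimmed copies of this post's ToolError and safeTool (logging replaced by an in-memory array) so it runs standalone; assertNoLeak, the example handler, and the probe strings are hypothetical.

```typescript
// Trimmed copies of this post's ToolError and safeTool; the logger is
// replaced by an in-memory array so the check runs on its own.
class ToolError extends Error {
  readonly userMessage: string;
  constructor(userMessage: string) {
    super(userMessage);
    this.name = "ToolError";
    this.userMessage = userMessage;
  }
}

type ToolResult = { isError?: boolean; content: { type: "text"; text: string }[] };

// Stand-in for structured logs: the real error stays server-side
const logged: unknown[] = [];

function safeTool<T>(name: string, fn: (args: T) => Promise<ToolResult>) {
  return async (args: T): Promise<ToolResult> => {
    try {
      return await fn(args);
    } catch (err) {
      logged.push(err);
      const text = err instanceof ToolError ? err.userMessage : `${name}: internal error`;
      return { isError: true, content: [{ type: "text", text }] };
    }
  };
}

// Fails if the serialized response contains any internal detail we never want to ship
function assertNoLeak(result: ToolResult, probes: string[]): void {
  const body = JSON.stringify(result);
  for (const probe of probes) {
    if (body.includes(probe)) {
      throw new Error(`response leaks "${probe}"`);
    }
  }
}

// Example handler that fails with a driver-style message and a file path
const getIssue = safeTool("get_issue", async (_args: { id: string }) => {
  throw new Error("SQLITE_ERROR: no such table: issues at /app/src/handlers/get-issue.ts:12");
});
```

In a real suite the probe strings would be things like your source root, driver name, and table names, checked against every tool's failure paths.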

Summary

The four rules that cover 95% of MCP error handling:

  1. Validate at the boundary with Zod — return field-level messages for input errors; these are safe to expose because they're part of your public API.
  2. Abstract downstream errors — catch database, HTTP, and filesystem errors and replace them with a user-facing message; log the original as cause.
  3. Equivocate on authorization vs. not-found — return the same message for "you can't see this" and "this doesn't exist" to prevent resource enumeration.
  4. Wrap every handler in a top-level catch — no uncaught exception should leak a stack trace or crash the server process.

For related patterns, see input validation patterns, when to reject vs return an error, and security logging without credential exposure.