MCP server logging security — what to log, what never to log, and why it matters

Logging is security infrastructure. Done right, server logs let you reconstruct what happened after an incident, detect anomalous call patterns, and audit the tool-call history a user's agent generated. Done wrong, logs become a secondary credential leak vector — one of the four credential finding classes in the SkillAudit corpus is environment variable echoes in logger output, present across 7 of 101 scanned servers. The logging architecture matters as much as the credential storage architecture.

TL;DR

Safe logging for MCP servers: log structured events (tool name, call ID, duration, outcome) not raw arguments. Never log process.env, OAuth tokens, or API keys. Never log tool arguments verbatim in production — they may contain PII, user secrets, or injected payloads. Use a redacting formatter in production that scrubs known secret patterns from log lines before they're written. Separate debug trace logs (tool arguments, raw responses) from operational logs (events, errors) and keep debug traces local-only or behind access control.

Why MCP logging has different risks than conventional server logging

In a conventional web server, request logging typically captures method, path, status code, and duration. Logging the request body is unusual and often disabled by default because it captures user data. MCP servers face the same risk but in a shape that's easy to overlook: the "request body" equivalent is the tool argument the model sent, and tool arguments can contain anything the model's context window contained.

Consider a tool that reads a document and extracts structured data from it. The tool argument might be a file path — innocuous. But the model might be reading a document that was crafted to include injection payloads, credential patterns, or PII. If the server logs the document content verbatim (perhaps in a debug trace) and that log is shipped to a centralized logging service, the injected payload or the PII leaves the user's environment without their knowledge.

The three specific risks that make MCP logging different:

What to log (safe categories)

The safe logging targets for an MCP server are operational events — information that supports incident response, anomaly detection, and usage analysis without capturing sensitive content:

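// Safe structured logging
import pino from 'pino';

// With the stdio transport, stdout carries protocol messages, so send logs to
// stderr instead, for example by passing pino.destination(2) as a second argument.
const logger = pino({ level: process.env.LOG_LEVEL ?? 'info' });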

server.tool('fetch_page', FetchPageArgs, async (args, { callId }) => {
  const start = Date.now();
  try {
    const result = await fetchPage(args.url);
    logger.info({
      event: 'tool_call',
      tool: 'fetch_page',
      callId,
      duration_ms: Date.now() - start,
      outcome: 'success',
      // NOT: url: args.url (could log a user-controlled URL)
      // NOT: result: result (could log sensitive content)
    });
    return result;
  } catch (err) {
    logger.error({
      event: 'tool_call_error',
      tool: 'fetch_page',
      callId,
      duration_ms: Date.now() - start,
      error_type: err instanceof Error ? err.constructor.name : 'UnknownError',
      // NOT: error_message: err.message (might contain the URL that caused the error)
      // NOT: args: args (logs the tool argument)
    });
    throw err;
  }
});
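The same try/catch shape repeats for every tool, so one option is to factor it into a wrapper. The following is a minimal sketch, not part of pino or any MCP SDK: `withSafeLogging`, `EventLogger`, and `Handler` are hypothetical names introduced here, and the handler signature (arguments plus a call ID string) is an assumption about how your server passes context.

```typescript
// Hypothetical minimal logger interface; a structured logger such as pino
// exposes info/error methods that accept an object in this way.
interface EventLogger {
  info(event: Record<string, unknown>): void;
  error(event: Record<string, unknown>): void;
}

// Assumed handler shape: tool arguments plus a call identifier.
type Handler<A, R> = (args: A, callId: string) => Promise<R>;

// Wraps a tool handler so that only safe metadata is ever logged:
// tool name, call ID, duration, and outcome or error type.
function withSafeLogging<A, R>(
  tool: string,
  logger: EventLogger,
  handler: Handler<A, R>,
): Handler<A, R> {
  return async (args, callId) => {
    const start = Date.now();
    try {
      const result = await handler(args, callId);
      logger.info({
        event: 'tool_call',
        tool,
        callId,
        duration_ms: Date.now() - start,
        outcome: 'success',
      });
      return result;
    } catch (err) {
      logger.error({
        event: 'tool_call_error',
        tool,
        callId,
        duration_ms: Date.now() - start,
        // Only the error class name is logged; the message may echo arguments.
        error_type: err instanceof Error ? err.constructor.name : 'UnknownError',
      });
      throw err;
    }
  };
}
```

Because the wrapper never receives a reference to log `args` or the result, adding a new tool cannot accidentally reintroduce argument logging in the operational path.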

What never to log

Explicit prohibition list for production MCP server logs:

Redacting formatters

A redacting formatter is a logging middleware that scrubs known sensitive patterns from log lines before they're written to any destination. This is a defense-in-depth measure — it catches credential echoes that slipped past the "don't log secrets" coding discipline:

// pino redact configuration
const logger = pino({
  redact: {
    paths: [
      'req.headers.authorization',
      'req.headers.cookie',
      '*.apiKey',
      '*.api_key',
      '*.accessToken',
      '*.access_token',
      '*.secret',
      '*.password',
      '*.token',
    ],
    censor: '[REDACTED]'
  }
});

// For secrets embedded in free-form strings rather than at known object paths,
// scrub by pattern. The g flag matters: without it, replace() only rewrites the
// first match in each string.
const sensitivePatterns: RegExp[] = [
  /sk-[a-zA-Z0-9]{32,}/g,          // OpenAI key pattern
  /gh[pousr]_[a-zA-Z0-9]{36}/g,    // GitHub PAT patterns
  /AKIA[A-Z0-9]{16}/g,             // AWS access key
];

function redactString(s: string): string {
  return sensitivePatterns.reduce(
    (acc, pattern) => acc.replace(pattern, '[REDACTED]'),
    s
  );
}

Pino's built-in redact option handles structured object paths. For string-based redaction (catching secrets that end up in error messages or log strings), a pre-write serializer is the right approach. Neither alone is sufficient — both together catch the common slip cases.
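To apply string-based redaction to a whole log object, one approach is to walk the object recursively before it is handed to the logger. The sketch below is an illustration under assumptions: `redactDeep` and `redactText` are hypothetical helpers, the pattern list mirrors the three patterns above, and where you call it (for instance from pino's `formatters.log` option) depends on your setup.

```typescript
// Same three illustrative patterns as above; extend for the providers you use.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[a-zA-Z0-9]{32,}/g,
  /gh[pousr]_[a-zA-Z0-9]{36}/g,
  /AKIA[A-Z0-9]{16}/g,
];

// Replace every match of every pattern in a single string.
function redactText(s: string): string {
  return SECRET_PATTERNS.reduce((acc, p) => acc.replace(p, '[REDACTED]'), s);
}

// Recursively scrub strings inside plain objects and arrays.
// Limitations of this sketch: it does not guard against circular references,
// and it only visits enumerable properties, so an Error's message must be
// copied into a plain field before it can be scrubbed.
function redactDeep(value: unknown): unknown {
  if (typeof value === 'string') return redactText(value);
  if (Array.isArray(value)) return value.map((item) => redactDeep(item));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = redactDeep(inner);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}
```

Pattern matching of this kind only catches secrets with a recognizable prefix or shape, which is why it complements path-based redaction rather than replacing it.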

Separating debug traces from operational logs

Debug traces (full argument logging, response body logging, verbose tool call chains) are useful during development and when debugging production incidents. They should not be part of the default production log stream, because they capture sensitive content by design.

The right architecture: two log destinations with different retention and access controls. The operational log (events, errors, metrics — no sensitive content) ships to the centralized logging service. The debug trace log (verbose argument and response logging) goes only to local disk on the server's host, with a short retention window (24-48 hours) and access restricted to operators who need it for incident investigation.
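The two-destination split can be sketched as a small router. This is an illustration only: `Sink`, `LogKind`, and `createSplitLogger` are hypothetical names, and the sinks are left abstract so the routing rule is visible on its own.

```typescript
// Hypothetical sink abstraction. In a real deployment the operational sink
// would feed the log shipper and the debug sink would wrap a local file.
interface Sink {
  write(line: string): void;
}

type LogKind = 'operational' | 'debug_trace';

// Returns a log function that routes each record to exactly one destination.
// Passing null for debugTrace disables traces entirely.
function createSplitLogger(operational: Sink, debugTrace: Sink | null) {
  return function log(kind: LogKind, payload: Record<string, unknown>): void {
    const line = JSON.stringify({ time: new Date().toISOString(), kind, ...payload });
    if (kind === 'operational') {
      operational.write(line);
    } else if (debugTrace !== null) {
      debugTrace.write(line);
    }
    // When traces are disabled, debug records are dropped. They are never
    // rerouted to the operational sink, which ships off the host.
  };
}
```

The property worth preserving in any real implementation is the last one: a debug record has no code path to the centralized destination. Retention (the 24-48 hour window) and file permissions would be handled outside this code, for example by log rotation and host configuration.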

Gate debug logging behind an explicit environment variable (LOG_LEVEL=debug or DEBUG_TRACES=1) that's off in production by default. The presence of debug logging in a production deployment should require a conscious operator decision, not be the default behavior.
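A small sketch of that gate, assuming the two variable names mentioned above. `resolveDebugTraces` is a hypothetical helper, and the use of `NODE_ENV=production` to detect a production deployment is an assumption; substitute whatever signal your deployment uses.

```typescript
type Env = Record<string, string | undefined>;

interface TraceDecision {
  enabled: boolean;
  warning: string | null;
}

// Debug traces are off unless an operator explicitly opts in.
// Any other value (unset, empty, "0", "info") leaves them disabled.
function resolveDebugTraces(env: Env): TraceDecision {
  const enabled = env['DEBUG_TRACES'] === '1' || env['LOG_LEVEL'] === 'debug';
  // Assumption: NODE_ENV marks production. Adjust for your environment.
  const inProduction = env['NODE_ENV'] === 'production';
  return {
    enabled,
    warning:
      enabled && inProduction
        ? 'Debug traces are enabled in production; arguments and responses will be written to the local trace log.'
        : null,
  };
}
```

In a Node server this would be called once at startup with `process.env`, and the warning, if present, written to the operational log so the opt-in leaves an audit record.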

How SkillAudit detects logging issues

The SkillAudit credentials axis includes logging-specific checks. The static pass scans for console.log(process.env), console.log(...process.env), logger.info({ env: process.env }), and Python equivalents (print(os.environ), logging.info(os.environ)) in both production code paths and startup/error handlers. These are the patterns flagged as HIGH in the corpus — the anatomy post names all 13 instances across the 7 servers where this appeared.

More subtle patterns — logging the full tool argument object, logging upstream response bodies — are flagged as WARN rather than HIGH, because they're sensitive to context: logging args.tool_name is fine, logging JSON.stringify(args) in production is not.
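To make the HIGH and WARN distinction concrete, here is a rough line-based scanner for the patterns named above. It is a sketch for illustration and not SkillAudit's actual implementation: `scanSource`, `RULES`, and the regexes are assumptions, and single-line regexes will miss multi-line calls and aliased loggers.

```typescript
type Severity = 'HIGH' | 'WARN';

interface Finding {
  line: number; // 1-based line number in the scanned source
  severity: Severity;
  text: string;
}

// The negative lookahead skips single-variable reads such as
// process.env.LOG_LEVEL and os.environ.get(...); only whole-environment
// dumps are treated as HIGH.
const RULES: { severity: Severity; pattern: RegExp }[] = [
  { severity: 'HIGH', pattern: /(?:console|logger)\.\w+\(.*process\.env(?![.\[\w])/ },
  { severity: 'HIGH', pattern: /(?:print|logging\.\w+)\(.*os\.environ(?![.\[\w])/ },
  { severity: 'WARN', pattern: /(?:console|logger)\.\w+\(.*JSON\.stringify\(\s*args\s*\)/ },
];

// Returns at most one finding per line; the first matching rule wins,
// so HIGH rules take precedence over WARN.
function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split('\n').forEach((text, index) => {
    const rule = RULES.find((r) => r.pattern.test(text));
    if (rule) {
      findings.push({ line: index + 1, severity: rule.severity, text: text.trim() });
    }
  });
  return findings;
}
```

A scanner like this is useful as a pre-commit check, but a finding still needs human review: whether a WARN matters depends on whether the line runs in a production code path.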

The security testing page has the grep commands for checking these patterns in your own codebase before running a full audit.

Further reading