
MCP server logging security — what to log, what never to log, and why it matters

Logging is security infrastructure. Done right, server logs let you reconstruct what happened after an incident, detect anomalous call patterns, and audit the tool-call history a user's agent generated. Done wrong, logs become a secondary credential leak vector — one of the four credential finding classes in the SkillAudit corpus is environment variable echoes in logger output, present across 7 of 101 scanned servers. The logging architecture matters as much as the credential storage architecture.

TL;DR

Safe logging for MCP servers: log structured events (tool name, call ID, duration, outcome), not raw arguments. Never log process.env, OAuth tokens, or API keys. Never log tool arguments verbatim in production, because they may contain PII, user secrets, or injected payloads. In production, use a redacting formatter that scrubs known secret patterns from log lines before they're written. Separate debug trace logs (tool arguments, raw responses) from operational logs (events, errors), and keep debug traces local-only or behind access control.

Why MCP logging has different risks than conventional server logging

In a conventional web server, request logging typically captures method, path, status code, and duration. Logging the request body is unusual and often disabled by default because it captures user data. MCP servers face the same risk but in a shape that's easy to overlook: the "request body" equivalent is the tool argument the model sent, and tool arguments can contain anything the model's context window contained.

Consider a tool that reads a document and extracts structured data from it. The tool argument might be a file path — innocuous. But the model might be reading a document that was crafted to include injection payloads, credential patterns, or PII. If the server logs the document content verbatim (perhaps in a debug trace) and that log is shipped to a centralized logging service, the injected payload or the PII leaves the user's environment without their knowledge.

The three specific risks that make MCP logging different:

Argument content amplification: tool arguments are generated from the model's context window, which may include the full contents of documents, web pages, or other external content the model was processing. Logging arguments verbatim logs all of that content.
Credential echo via logging: a console.log(process.env) call in a startup or error handler exposes every environment variable the server can see — including all credentials it was configured with. This is a HIGH finding in the SkillAudit corpus because the log output is typically shipped to stdout, which gets captured by the agent orchestration layer and potentially stored indefinitely.
Distributed log storage: production MCP deployments typically ship logs to centralized services (CloudWatch, Datadog, Splunk). Every piece of sensitive data that reaches a log line ends up in those services' long-term storage, with their own access control surfaces that the server author doesn't control.

What to log (safe categories)

The safe logging targets for an MCP server are operational events — information that supports incident response, anomaly detection, and usage analysis without capturing sensitive content:

Tool invocation events: tool name, call ID (a random UUID per call for correlation), invocation timestamp, duration in milliseconds, outcome (success / error / timeout)
Error events: error type (class name or error code), error message (if not derived from tool arguments), stack trace (in development, not in production)
Authentication events: OAuth token issuance (token ID, not the token value), token refresh, authentication failures
Rate limiting events: calls throttled, calls rejected, upstream rate limit responses received
Health events: server startup (with version), server shutdown, dependency availability checks

// Safe structured logging
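import { randomUUID } from 'node:crypto';
import pino from 'pino';

const logger = pino({ level: process.env.LOG_LEVEL ?? 'info' });

server.tool('fetch_page', FetchPageArgs, async (args) => {
  // Random UUID per call for correlation; generated here on the assumption
  // that the handler's second parameter does not supply a callId field
  const callId = randomUUID();
  const start = Date.now();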
  try {
    const result = await fetchPage(args.url);
    logger.info({
      event: 'tool_call',
      tool: 'fetch_page',
      callId,
      duration_ms: Date.now() - start,
      outcome: 'success',
      // NOT: url: args.url (could log a user-controlled URL)
      // NOT: result: result (could log sensitive content)
    });
    return result;
  } catch (err) {
    logger.error({
      event: 'tool_call_error',
      tool: 'fetch_page',
      callId,
      duration_ms: Date.now() - start,
      error_type: err instanceof Error ? err.constructor.name : 'UnknownError',
      // NOT: error_message: err.message (might contain the URL that caused the error)
      // NOT: args: args (logs the tool argument)
    });
    throw err;
  }
});
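
The handler above builds nearly the same record in both branches. As a minimal sketch, the shared fields could be assembled by one pure helper, so that no call site can add argument or result content by accident. The names toolCallEvent and ToolCallEvent are hypothetical and not part of pino or any MCP SDK, and detecting timeouts by error name is an assumption about how the server's timeouts surface.

```typescript
// Hypothetical helper: builds the operational record for one tool call.
type Outcome = 'success' | 'error' | 'timeout';

interface ToolCallEvent {
  event: 'tool_call' | 'tool_call_error';
  tool: string;
  callId: string;
  duration_ms: number;
  outcome: Outcome;
  error_type?: string;
}

function toolCallEvent(
  tool: string,
  callId: string,
  startMs: number,
  endMs: number,
  err?: unknown,
): ToolCallEvent {
  const base = { tool, callId, duration_ms: endMs - startMs };
  if (err === undefined) {
    return { event: 'tool_call', ...base, outcome: 'success' };
  }
  // Only the class name is recorded; err.message may echo tool arguments.
  const errorType = err instanceof Error ? err.constructor.name : 'UnknownError';
  // Assumption: timeouts surface as errors named 'TimeoutError' or 'AbortError'.
  const isTimeout =
    err instanceof Error && (err.name === 'TimeoutError' || err.name === 'AbortError');
  return {
    event: 'tool_call_error',
    ...base,
    outcome: isTimeout ? 'timeout' : 'error',
    error_type: errorType,
  };
}
```

A handler would then call logger.info(toolCallEvent(...)) on success and logger.error(toolCallEvent(..., err)) in the catch block, which also covers the timeout outcome listed among the safe categories.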

What never to log

Explicit prohibition list for production MCP server logs:

process.env or os.environ — ever, in any form. Not on startup, not in error handlers, not in debug dumps. This is the credential echo pattern that drives HIGH findings.
OAuth tokens, API keys, or secrets derived from environment variables — the token value is as sensitive as the credential it's derived from. Log token IDs (the last 4 characters of a token, or a UUID you associate with the token at issuance) rather than token values.
Tool arguments verbatim in production — the argument values may contain PII, user secrets, or attacker-controlled content. If you need argument logging for debugging, gate it behind an explicit debug flag that's off by default in production, and redact before writing.
Full response bodies from upstream APIs — upstream responses may contain secrets, PII, or other sensitive data. Log response codes and sizes; don't log bodies.
Stack traces in production — stack traces reveal internal file paths, class names, and implementation details that help an attacker understand your server's structure. Log stack traces to a separate debug-only destination, or only to local console output.
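
Two items in the list above, token IDs instead of token values and response codes and sizes instead of bodies, can be sketched as small helpers. Both function names are hypothetical, the minimum-length threshold is an assumption, and character count is used as a simple stand-in for response size.

```typescript
// Hypothetical helpers; neither name comes from a library.

// Returns a short identifier for a token: a mask plus the last 4 characters.
// Tokens too short to mask safely are not echoed at all.
function tokenId(token: string): string {
  const MIN_LENGTH = 16; // assumption: below this, 4 characters reveal too much
  if (token.length < MIN_LENGTH) return '[token]';
  return `****${token.slice(-4)}`;
}

interface UpstreamSummary {
  status: number;
  body_chars: number;
}

// Records the status code and the body size, never the body itself.
function summarizeUpstream(status: number, body: string): UpstreamSummary {
  return { status, body_chars: body.length };
}
```

If several tokens can share a suffix, a UUID assigned at issuance (the other option named above) avoids collisions and reveals nothing about the token value.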

Redacting formatters

A redacting formatter is a logging middleware that scrubs known sensitive patterns from log lines before they're written to any destination. This is a defense-in-depth measure — it catches credential echoes that slipped past the "don't log secrets" coding discipline:

// pino redact configuration
const logger = pino({
  redact: {
    paths: [
      'req.headers.authorization',
      'req.headers.cookie',
      '*.apiKey',
      '*.api_key',
      '*.accessToken',
      '*.access_token',
      '*.secret',
      '*.password',
      '*.token',
    ],
    censor: '[REDACTED]'
  }
});

// For patterns that aren't known object paths, add a serializer
const sensitivePatterns = [
  // Global flag so every occurrence in a line is replaced, not just the first
  /sk-[a-zA-Z0-9]{32,}/g,   // OpenAI key pattern
  /gh[pousr]_[a-zA-Z0-9]{36}/g,  // GitHub PAT patterns
  /AKIA[A-Z0-9]{16}/g,       // AWS access key
];

function redactString(s: string): string {
  return sensitivePatterns.reduce(
    (acc, pattern) => acc.replace(pattern, '[REDACTED]'),
    s
  );
}

Pino's built-in redact option handles structured object paths. For string-based redaction (catching secrets that end up in error messages or log strings), a pre-write serializer is the right approach. Neither alone is sufficient — both together catch the common slip cases.
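
As a rough sketch of the string-based half, the function below walks a log object and applies pattern redaction to every string it finds. redactDeep is a hypothetical name, and the patterns are global-flag copies of the ones listed earlier so that every occurrence in a string is replaced. The sketch assumes plain JSON-like data: it does not handle cyclic objects, and non-enumerable fields such as an Error's message are not visited.

```typescript
// Global-flag copies of the key patterns (illustrative, not exhaustive).
const secretPatterns: RegExp[] = [
  /sk-[a-zA-Z0-9]{32,}/g,
  /gh[pousr]_[a-zA-Z0-9]{36}/g,
  /AKIA[A-Z0-9]{16}/g,
];

function scrubString(s: string): string {
  return secretPatterns.reduce((acc, p) => acc.replace(p, '[REDACTED]'), s);
}

// Hypothetical helper: returns a copy of `value` with every string scrubbed.
function redactDeep(value: unknown): unknown {
  if (typeof value === 'string') return scrubString(value);
  if (Array.isArray(value)) return value.map(redactDeep);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = redactDeep(inner);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}
```

In pino, one place a function like this could be attached is the formatters.log option, which receives the merged log object. Message strings passed as a separate argument do not go through that formatter, so they would need handling elsewhere, for example in hooks.logMethod.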

Separating debug traces from operational logs

Debug traces (full argument logging, response body logging, verbose tool call chains) are useful during development and when debugging production incidents. They should not go to the default production log destination, because they capture sensitive content by design.

The right architecture: two log destinations with different retention and access controls. The operational log (events, errors, metrics — no sensitive content) ships to the centralized logging service. The debug trace log (verbose argument and response logging) goes only to local disk on the server's host, with a short retention window (24-48 hours) and access restricted to operators who need it for incident investigation.

Gate debug logging behind an explicit environment variable (LOG_LEVEL=debug or DEBUG_TRACES=1) that's off in production by default. The presence of debug logging in a production deployment should require a conscious operator decision, not be the default behavior.
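
The gating and routing rules described above can be sketched as two small functions. This is an illustration under assumptions, not a pino configuration: the function names and the Destination labels are hypothetical, and the environment is passed in as a plain object so the logic can be checked without touching the real process environment.

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error';
type Destination = 'operational' | 'debug_trace';
type Env = Record<string, string | undefined>;

// Debug traces need an explicit opt-in; unset or any other value means off.
function debugTracesEnabled(env: Env): boolean {
  return env.DEBUG_TRACES === '1' || env.LOG_LEVEL === 'debug';
}

// Decides where a record of the given level may be written.
// Debug records never reach the operational (centralized) destination.
function destinationsFor(level: Level, env: Env): Destination[] {
  if (level === 'debug') {
    return debugTracesEnabled(env) ? ['debug_trace'] : [];
  }
  return ['operational'];
}
```

A server would call destinationsFor(level, process.env) before writing. The point of the design is that a missing or mistyped variable fails closed: debug records are dropped rather than shipped.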

How SkillAudit detects logging issues

The SkillAudit credentials axis includes logging-specific checks. The static pass scans for console.log(process.env), console.log({ ...process.env }), logger.info({ env: process.env }), and Python equivalents (print(os.environ), logging.info(os.environ)) in both production code paths and startup/error handlers. These are the patterns flagged as HIGH in the corpus; the anatomy post names all 13 instances across the 7 servers where this appeared.

More subtle patterns — logging the full tool argument object, logging upstream response bodies — are flagged as WARN rather than HIGH, because they're sensitive to context: logging args.tool_name is fine, logging JSON.stringify(args) in production is not.
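
To make the HIGH versus WARN distinction concrete, here is a deliberately simplified line classifier. It is not SkillAudit's implementation: the rule set, the regular expressions, and the name classifyLine are all assumptions chosen for illustration, and a line-based regex check will miss multi-line calls and aliased loggers.

```typescript
type Severity = 'HIGH' | 'WARN';

interface Rule {
  severity: Severity;
  pattern: RegExp;
}

// Illustrative rules only; a real scanner would need far more coverage.
const rules: Rule[] = [
  // Whole-environment dumps in JavaScript or TypeScript
  { severity: 'HIGH', pattern: /\b(console|logger)\.\w+\([^)]*process\.env\s*[,})]/ },
  // Whole-environment dumps in Python
  { severity: 'HIGH', pattern: /\b(print|logging\.\w+)\([^)]*os\.environ\s*[,)]/ },
  // Whole-argument logging: sensitivity depends on context
  { severity: 'WARN', pattern: /\b(console|logger)\.\w+\([^)]*JSON\.stringify\(\s*args\s*\)/ },
];

// Returns the severity of the first matching rule, or null if none match.
function classifyLine(line: string): Severity | null {
  for (const rule of rules) {
    if (rule.pattern.test(line)) return rule.severity;
  }
  return null;
}
```

Reading a single variable, such as process.env.LOG_LEVEL, is intentionally not matched by these rules; only dumps of the whole environment object are.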

The security testing page has the grep commands for checking these patterns in your own codebase before running a full audit.