MCP server observability: what to log, what not to log, and how to use logs for security detection
Logs are the only forensic instrument you have when an MCP server is misused. But most MCP servers either log too little to reconstruct an incident, or log credentials and prompt contents that make a breach worse. This guide walks through the complete event taxonomy, the no-log list, and the detection rules that turn a log stream into a real-time security sensor.
Why observability is a security requirement, not a nice-to-have
When someone asks "did anything bad happen to this MCP server?" the only honest answer comes from logs. Without structured, tamper-evident event records, you cannot answer three questions that matter:
- Was this tool invoked by an authorised session, or by a prompt injection attack? Prompt injection often leaves a fingerprint — a tool call made from an unusual context, at an unusual time, with arguments that no human would produce.
- What data left the server? If a data-exfiltration attempt happened, the response payload size is your first indicator. Logs without response sizes are blind to slow exfil.
- When did it start? Incident response needs a timestamp accurate to the second, not a developer's memory of "sometime last Tuesday."
The MCP threat model makes this more urgent than most server categories. The consumer of your tool is an LLM — an entity that can be instructed via prompt injection to behave in ways no human user would. A compromised agent session can invoke your tool thousands of times in a loop, exfiltrate structured data incrementally, or probe your tool's error handling to map internal state. Standard web access logs (IP, path, status code) are almost useless for detecting this pattern. You need application-level structured events.
SkillAudit finding: In our analysis of MCP servers submitted to the Anthropic Skills Directory, 61% had no application-level logging at all — only the default process stdout. Of those with logging, 34% were logging raw argument objects that frequently contained API keys and session tokens passed by the calling agent.
The event taxonomy: what to log on every tool call
Every tool invocation should produce a single structured JSON event: one line, one object, one JSON.stringify call. The alternative (free-text log messages scattered across handler code) is difficult to parse in a SIEM and makes it easy to include sensitive data by accident.
A complete tool invocation event has three layers: the envelope (session and request identity), the call metadata (which tool, what arguments, how long it took), and the outcome (success, error, or rate limit). Here's the reference implementation:
// logger.js: structured event emitter for MCP tool handlers
const crypto = require('crypto');

function logToolEvent(envelope) {
  const event = {
    ts: new Date().toISOString(),                 // ISO-8601, always UTC
    session_id: envelope.sessionId,               // from MCP transport context
    request_id: envelope.requestId || crypto.randomUUID(), // per-call nonce for dedup
    tool: envelope.toolName,
    arg_keys: Object.keys(envelope.args || {}),   // argument NAMES only
    arg_count: Object.keys(envelope.args || {}).length,
    // arg_values: NEVER LOG (see the "What not to log" section)
    latency_ms: envelope.latencyMs,
    outcome: envelope.outcome,                    // "success" | "error" | "rate_limited"
    error_code: envelope.errorCode,               // e.g. "SSRF_BLOCKED", "SCHEMA_INVALID"
    response_bytes: envelope.responseBytes,       // size of the returned content
    caller_ip: envelope.callerIp,                 // transport-level if available
    server_version: process.env.SERVER_VERSION || 'unknown',
  };

  // Strip undefined fields so JSON is clean
  const clean = Object.fromEntries(
    Object.entries(event).filter(([, v]) => v !== undefined)
  );

  // Write to stdout as newline-delimited JSON (NDJSON)
  // Collector (Loki, Elastic, CloudWatch) parses this natively
  process.stdout.write(JSON.stringify(clean) + '\n');
}

module.exports = { logToolEvent };
The critical choice here is arg_keys, not arg_values. You need to know that a tool was called with a file_path argument, because that is part of the event signature. The log does not need to record that file_path was /etc/shadow. Rejected traversal attempts still surface in aggregate through outcome and error_code, so you can detect the pattern without storing the attempted paths themselves (which can be PII- and compliance-relevant in some jurisdictions).
The four event categories every MCP server must emit
Tool invocation events
One event per tool call, recording timestamp, outcome, and latency. The backbone of any incident reconstruction.
Tool invocation events are the most important log type. They should be emitted on every call — success, error, or rate limit — without exception. Omitting error events is a common mistake: error events are often the most forensically valuable because they show what an attacker probed.
Emit one event on call completion (not start). The completion event captures outcome and latency in a single record, which simplifies aggregation. If you need start events for latency tracing in a distributed system, emit an extra event with outcome set to "started" when the call begins, and emit the normal completion event later.
{
  "ts": "2026-06-11T09:14:23.441Z",
  "session_id": "ses_abc123",
  "request_id": "req_9f7d2a",
  "tool": "read_file",
  "arg_keys": ["path", "encoding"],
  "arg_count": 2,
  "latency_ms": 14,
  "outcome": "success",
  "response_bytes": 4096
}
Authentication and authorisation events
Session open, session close, permission denied. Critical for establishing the boundary of a session during incident review.
Log when a session is established, when it ends (whether by explicit close or timeout), and whenever an authorisation check fails. The authorisation denial log is particularly important: it shows what a session tried to do and was refused, which is often the most useful attack-detection signal.
{
  "ts": "2026-06-11T09:14:18.002Z",
  "event": "session_opened",
  "session_id": "ses_abc123",
  "transport": "stdio",
  "client_version": "claude-code-1.5.2"
}
{
  "ts": "2026-06-11T09:14:55.891Z",
  "event": "permission_denied",
  "session_id": "ses_abc123",
  "tool": "execute_shell",
  "reason": "tool_not_in_allowlist"
}
The permission_denied event is something most MCP servers don't emit at all — they simply return an error to the caller. Adding a server-side log here is a two-line change that dramatically improves your attack detection surface.
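As a rough illustration of how small that change can be, here is a minimal sketch of an allowlist check that logs the denial before refusing the call. The names `ALLOWED_TOOLS`, `logAuthEvent`, and `checkToolAllowed` are hypothetical and not part of any MCP SDK; the event shape simply mirrors the JSON examples above.

```javascript
// auth-events.js: hypothetical helper for session and authorisation events.
// ALLOWED_TOOLS, logAuthEvent and checkToolAllowed are illustrative names.
const ALLOWED_TOOLS = new Set(['read_file', 'list_directory']);

function logAuthEvent(event, fields, write = (line) => process.stdout.write(line)) {
  // Same NDJSON shape as the examples above: timestamp, event type, then details.
  const record = { ts: new Date().toISOString(), event, ...fields };
  write(JSON.stringify(record) + '\n');
  return record;
}

// Returns true if the tool may run. Otherwise logs permission_denied and returns false.
function checkToolAllowed(sessionId, toolName, write) {
  if (ALLOWED_TOOLS.has(toolName)) return true;
  logAuthEvent(
    'permission_denied',
    { session_id: sessionId, tool: toolName, reason: 'tool_not_in_allowlist' },
    write
  );
  return false;
}
```

The optional `write` parameter exists only so the sketch can be exercised without capturing stdout; in a real server the default is usually enough.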
External call events
Every outbound HTTP request, database query, or file system operation your tool makes. Required for SSRF detection and data-exfil auditing.
If your tool makes outbound calls on behalf of the agent — HTTP requests to external APIs, SQL queries, S3 operations — log them as child events linked to the parent request_id. This creates a call chain you can replay during incident review.
For HTTP calls, log the method, host, and path but not the full URL with query parameters (which may contain tokens). Log the response status code and body size, but not the response body itself.
{
  "ts": "2026-06-11T09:14:23.438Z",
  "event": "external_http",
  "parent_request_id": "req_9f7d2a",
  "method": "GET",
  "host": "api.github.com",
  "path": "/repos/owner/repo/contents/src",
  "status": 200,
  "response_bytes": 1842,
  "latency_ms": 210
}
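One way to produce an event like this without leaking tokens is to parse the URL and keep only the hostname and path. The sketch below assumes a hypothetical `buildExternalHttpEvent` helper; it relies only on the WHATWG `URL` class that Node.js provides as a global.

```javascript
// external-events.js: hypothetical helper that builds an external_http event.
function buildExternalHttpEvent({ parentRequestId, method, url, status, responseBytes, latencyMs }) {
  const parsed = new URL(url); // WHATWG URL parser, global in Node.js
  return {
    ts: new Date().toISOString(),
    event: 'external_http',
    parent_request_id: parentRequestId,
    method,
    host: parsed.hostname, // hostname only: no port, no user:password
    path: parsed.pathname, // query string and fragment are deliberately dropped
    status,
    response_bytes: responseBytes,
    latency_ms: latencyMs,
  };
}
```

Because the event is built from `hostname` and `pathname` rather than from the raw URL string, credentials embedded in the query string or userinfo never reach the log line.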
SSRF detection works by comparing the host field against your known-good allowlist. Any external call to an RFC-1918 address or a cloud metadata endpoint (169.254.169.254, fd00:ec2::254) is a critical alert: log it with outcome: "error" and error_code: "SSRF_BLOCKED", and emit a separate security_alert event.
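The address check itself can be sketched as follows. This is a deliberately incomplete classifier under stated assumptions: it expects an already-resolved IP literal, the function names and the class labels other than "rfc1918" are hypothetical, and it does not handle IPv4-mapped IPv6 addresses or other edge cases a production filter would need.

```javascript
const net = require('net');

// Classify a *resolved* IP address. Returns a destination_class string for
// blocked ranges, or null when the address is not in one of the listed ranges.
function classifyDestination(ip) {
  const version = net.isIP(ip); // 4, 6, or 0 when the input is not an IP literal
  if (version === 4) {
    const [a, b] = ip.split('.').map(Number);
    if (ip === '169.254.169.254') return 'cloud_metadata';
    if (a === 10) return 'rfc1918';
    if (a === 172 && b >= 16 && b <= 31) return 'rfc1918';
    if (a === 192 && b === 168) return 'rfc1918';
    if (a === 127) return 'loopback';
    if (a === 169 && b === 254) return 'link_local';
    return null;
  }
  if (version === 6) {
    const lower = ip.toLowerCase();
    if (lower === 'fd00:ec2::254') return 'cloud_metadata';
    if (lower === '::1') return 'loopback';
    // fc00::/7 (unique local addresses): test the top 7 bits of the first group.
    const firstGroup = parseInt(lower.split(':')[0], 16);
    if ((firstGroup & 0xfe00) === 0xfc00) return 'unique_local';
    return null;
  }
  return 'unparseable'; // fail closed: a hostname should be resolved before this check
}

// Build the security_alert event for a blocked destination.
function buildSsrfAlert({ sessionId, parentRequestId, destinationClass }) {
  return {
    ts: new Date().toISOString(),
    event: 'security_alert',
    severity: 'critical',
    rule: 'ssrf_private_ip_blocked',
    session_id: sessionId,
    parent_request_id: parentRequestId,
    detail: 'outbound request to private IP range blocked',
    destination_class: destinationClass,
  };
}
```

Classifying the resolved address rather than the hostname matters because an allowlisted-looking hostname can resolve to a private address.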
Security alert events
Discrete events for policy violations: SSRF block, schema validation fail, rate limit hit, suspicious argument pattern. These are the events your on-call alert fires on.
Security alert events are separate from tool invocation events because they have a different lifecycle: they need to reach an alert channel (PagerDuty, Slack webhook, SIEM) on a different pathway than operational logs. Mixing them means a log volume spike can delay alert delivery.
{
  "ts": "2026-06-11T09:15:01.003Z",
  "event": "security_alert",
  "severity": "critical",
  "rule": "ssrf_private_ip_blocked",
  "session_id": "ses_abc123",
  "parent_request_id": "req_9f7d2b",
  "detail": "outbound request to private IP range blocked",
  "destination_class": "rfc1918"
}
The destination IP is omitted from the standard log. Include a destination_ip field only in the high-security alert stream.
Use a distinct "event": "security_alert" field so log collectors can route these events to a higher-priority stream. Every other event type can be shipped to cold storage after 30 days; security alerts should be retained for the full compliance window (typically 1–3 years depending on your standard).
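Routing on that field can be as simple as the sketch below. `routeLine` and the `sinks` object are hypothetical names; in practice this logic usually lives in the log collector's configuration rather than in application code, so treat it as an illustration of the rule, not a recommended deployment.

```javascript
// route-events.js: hypothetical router that splits one NDJSON stream in two.
// sinks.alerts and sinks.operational are caller-supplied functions.
function routeLine(line, sinks) {
  let event;
  try {
    event = JSON.parse(line);
  } catch {
    // Unparseable lines are kept, but never treated as alerts.
    sinks.operational(line);
    return 'operational';
  }
  if (event && event.event === 'security_alert') {
    sinks.alerts(line);
    return 'alerts';
  }
  sinks.operational(line);
  return 'operational';
}
```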
What not to log: the no-log list
Logging the wrong data makes a breach worse. If your logs contain credentials or prompt contents, then a log aggregation system compromise (Elastic, Loki, S3 bucket) is a full credential breach. The no-log list below is not optional — each item has caused real incidents in production systems.
Argument values
Tool arguments can contain API keys, personal data, file contents, or prompt injections. Log argument names and count only — never values.
Response bodies
The tool's return value may contain data the server fetched on behalf of the user. Logging it duplicates a dataset that has its own access controls elsewhere.
Environment variables
Never log process.env snapshots, config dumps, or startup diagnostics that include env vars. API keys almost always live in env vars.
HTTP request/response headers
Authorization, Cookie, and X-Api-Key headers are credentials. Log method, host, path, and status only.
Prompt or message content
The text the LLM sent to your tool is often sensitive user data. Logging it creates a shadow copy outside whatever data retention controls govern the primary conversation.
Full error stack traces in production
Stack traces leak file paths, internal module names, and sometimes variable values. Log error codes and message prefixes; ship full traces to a separate, access-controlled error tracker.
The test for whether something belongs on the no-log list: if a stranger read this log line, would they gain access to anything, or learn anything about the user that the user didn't consent to share? If yes, it should not be in the log.
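One way to enforce the no-log list mechanically is an allowlist applied just before serialisation, so that anything not explicitly approved is dropped. The sketch below is an assumption-laden example: `LOGGABLE_FIELDS` and `toLoggable` are hypothetical names, and the field list is taken from the event examples earlier in this guide.

```javascript
// Field names that appear in the event examples in this guide.
const LOGGABLE_FIELDS = new Set([
  'ts', 'event', 'session_id', 'request_id', 'parent_request_id', 'tool',
  'arg_keys', 'arg_count', 'latency_ms', 'outcome', 'error_code',
  'response_bytes', 'caller_ip', 'server_version', 'method', 'host', 'path',
  'status', 'severity', 'rule', 'reason', 'detail', 'destination_class',
  'transport', 'client_version',
]);

// Keep only allowlisted fields holding primitives or arrays of strings.
// Nested objects (args, headers, env, results) are always dropped.
function toLoggable(candidate) {
  const out = {};
  for (const [key, value] of Object.entries(candidate)) {
    if (!LOGGABLE_FIELDS.has(key)) continue; // unknown field: drop it
    const kind = typeof value;
    if (value === null || kind === 'string' || kind === 'number' || kind === 'boolean') {
      out[key] = value;
    } else if (Array.isArray(value) && value.every((v) => typeof v === 'string')) {
      out[key] = value;
    }
  }
  return out;
}
```

An allowlist fails safe when someone adds a new field carelessly, which a denylist of known-bad keys does not. It cannot stop a secret being written into an approved string field such as detail, so it complements the rules above rather than replacing them.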
Structured logging in practice: a complete handler wrapper
The cleanest implementation wraps every tool handler at the registration layer so logging is opt-out rather than opt-in. This prevents individual handlers from forgetting to emit events:
// tool-registry.js: wraps every registered handler with logging
const crypto = require('crypto');
const { logToolEvent } = require('./logger');

function registerTool(server, toolName, schema, handler) {
  server.tool(toolName, schema, async (args, context) => {
    const requestId = crypto.randomUUID();
    const start = Date.now();
    let outcome = 'success';
    let errorCode;
    let responseBytes = 0;
    try {
      const result = await handler(args, context);
      // Measure result size without storing it (undefined results count as 0 bytes)
      responseBytes = Buffer.byteLength(JSON.stringify(result) ?? '');
      return result;
    } catch (err) {
      outcome = 'error';
      errorCode = err?.code || 'UNKNOWN_ERROR';
      // Return a sanitised tool error: message only, no stack trace to the caller
      return { isError: true, content: [{ type: 'text', text: err?.message ?? 'Tool failed' }] };
    } finally {
      logToolEvent({
        sessionId: context?.sessionId,
        requestId,
        toolName,
        args, // logger reads only Object.keys(args)
        latencyMs: Date.now() - start,
        outcome,
        errorCode,
        responseBytes,
        callerIp: context?.remoteAddress,
      });
    }
  });
}

module.exports = { registerTool };
The finally block guarantees the event is emitted whether the handler succeeds or throws. The responseBytes calculation happens before the return in the try block so it's available when logging.
Notice the error path: the wrapper catches the failure and returns a sanitised result with isError: true and no stack trace, since MCP reports tool execution errors inside the result object and a thrown plain object carries no usable message. The internal error object (with full context) is available to your error tracker via err, but the caller only sees err.message. This is the structured error pattern described in a prior post, and the logging wrapper is the natural place to enforce it consistently.
Detection rules: turning logs into a security sensor
Raw logs only become a security sensor when you add detection rules. A rule is a pattern applied to the log stream that triggers an alert or incident when matched. Here are the highest-signal rules for MCP servers, ordered by severity:
| Rule name | Signal | Threshold | Severity |
|---|---|---|---|
| SSRF private IP | Outbound HTTP to RFC-1918 or cloud metadata IP | Any single occurrence | Critical |
| Credential pattern in log line | Log line matches /\b(sk-\|ghp_\|xoxb-\|AKIA)[A-Za-z0-9]{16,}/ | Any single occurrence | Critical |
| Abnormal call rate | Single session: >20 tool calls in 60s | Any occurrence | High |
| Permission denial spike | Same session: >5 permission_denied events in 60s | Per session | High |
| Large response exfil | Single tool response: response_bytes > 512KB | Any single occurrence | High |
| Error storm | Same session: >10 outcome: "error" in 60s | Per session | Medium |
| Off-hours access | Tool call outside 06:00–23:00 in org timezone | Configurable | Low |
| New tool first-call | Tool not seen in prior 7 days of logs is called | Per deployment | Low |
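The three per-session rate rules in the table share one mechanism: count matching events per session inside a sliding 60-second window. A minimal in-memory sketch follows, assuming events arrive roughly in timestamp order; `createSessionRule` and the rule names are hypothetical, and a real deployment would more likely express these as queries in the SIEM.

```javascript
// Sliding-window counter for per-session rules. Thresholds mirror the table.
function createSessionRule({ name, severity, matches, limit, windowMs = 60_000 }) {
  const hits = new Map(); // session_id -> timestamps (ms) of recent matching events
  return function check(event) {
    if (!event.session_id || !matches(event)) return null;
    const now = Date.parse(event.ts);
    if (Number.isNaN(now)) return null; // ignore events without a valid timestamp
    const recent = (hits.get(event.session_id) || []).filter((t) => now - t < windowMs);
    recent.push(now);
    hits.set(event.session_id, recent);
    if (recent.length <= limit) return null;
    return { rule: name, severity, session_id: event.session_id, count: recent.length };
  };
}

const abnormalCallRate = createSessionRule({
  name: 'abnormal_call_rate',
  severity: 'high',
  matches: (e) => Boolean(e.tool) && e.outcome !== undefined, // tool invocation events
  limit: 20,
});

const permissionDenialSpike = createSessionRule({
  name: 'permission_denial_spike',
  severity: 'high',
  matches: (e) => e.event === 'permission_denied',
  limit: 5,
});

const errorStorm = createSessionRule({
  name: 'error_storm',
  severity: 'medium',
  matches: (e) => e.outcome === 'error',
  limit: 10,
});
```

This sketch never evicts idle sessions from the map, so a long-running process would need a periodic cleanup pass.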
The credential-pattern-in-log-line rule is both a detection rule and a canary: if it fires, it means a logging statement somewhere is capturing argument values it shouldn't. The fix is to audit the handler that produced the log line and move the credential to a no-log path — but the alert also tells you the credential was in a log file, so you need to rotate it.
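The credential canary can be sketched in a few lines using the pattern from the table. The function name `scanLogLine` and the rule label are hypothetical, and the regex is a coarse heuristic that will miss many token formats, so it should not be mistaken for a complete secret scanner.

```javascript
// Pattern taken from the rule table above. A coarse heuristic only.
const CREDENTIAL_PATTERN = /\b(sk-|ghp_|xoxb-|AKIA)[A-Za-z0-9]{16,}/;

// Returns a security_alert event if the line looks like it contains a credential.
function scanLogLine(line) {
  const match = CREDENTIAL_PATTERN.exec(line);
  if (!match) return null;
  return {
    ts: new Date().toISOString(),
    event: 'security_alert',
    severity: 'critical',
    rule: 'credential_pattern_in_log_line',
    // Report only the prefix and offset so the alert does not copy the secret.
    credential_prefix: match[1],
    offset: match.index,
  };
}
```

Reporting the prefix and offset instead of the matched text keeps the alert channel from becoming a second place where the credential is stored.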
Retention, shipping, and log integrity
Log usefulness degrades fast without a shipping and retention strategy. Logs that live only on the process's local disk are lost if the process crashes, the host is terminated, or an attacker deletes them. A minimal production setup:
- Emit NDJSON to stdout. The container runtime or systemd captures this at the host level, outside your process's reach.
- Ship to a managed log store. Loki, CloudWatch Logs, Elastic Cloud, or Datadog — any of these is sufficient. The key property is that log data is off the origin host within seconds of being emitted.
- Separate operational logs from security alerts. Route "event": "security_alert" lines to a high-priority stream or alert channel. Operational logs can go to cold storage; security events should trigger real-time notification.
- Set a minimum retention of 90 days. Most security incidents are discovered weeks or months after the initiating event. Thirty-day retention means you often can't reconstruct the full timeline.
For servers handling team or enterprise data, add write-once log integrity. AWS CloudTrail and GCP Cloud Audit Logs provide this natively. For self-hosted stacks, write an HMAC chain: each log batch includes a hash of the previous batch. An attacker who deletes or modifies log entries breaks the chain, which is itself detectable.
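For the self-hosted case, the chain can be sketched with Node's built-in crypto module. The helper names (`signBatch`, `appendBatch`, `verifyChain`) and the all-zero genesis value are assumptions made for illustration; key storage and rotation are out of scope here.

```javascript
const crypto = require('crypto');

const GENESIS = '0'.repeat(64); // placeholder "previous MAC" for the first batch

// Each batch's MAC covers the previous batch's MAC plus this batch's lines.
function signBatch(key, prevMac, lines) {
  const mac = crypto.createHmac('sha256', key);
  mac.update(prevMac);
  for (const line of lines) mac.update(line + '\n');
  return mac.digest('hex');
}

function appendBatch(chain, key, lines) {
  const prevMac = chain.length ? chain[chain.length - 1].mac : GENESIS;
  const entry = { lines: [...lines], prev_mac: prevMac, mac: signBatch(key, prevMac, lines) };
  chain.push(entry);
  return entry;
}

// Returns -1 when the chain is intact, else the index of the first broken batch.
function verifyChain(chain, key) {
  let prevMac = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const { lines, prev_mac: storedPrev, mac } = chain[i];
    const actual = Buffer.from(mac, 'hex');
    const expected = Buffer.from(signBatch(key, prevMac, lines), 'hex');
    const ok =
      storedPrev === prevMac &&
      actual.length === expected.length &&
      crypto.timingSafeEqual(actual, expected);
    if (!ok) return i;
    prevMac = mac;
  }
  return -1;
}
```

The chain detects edits and deletions in the middle of the log. Detecting truncation at the end requires storing the latest MAC somewhere the attacker cannot reach, and the signing key must live off the host that writes the logs.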
Compliance note: GDPR Article 30 requires a record of processing activities; SOC 2 CC7.2 requires monitoring for security events. Structured audit logs that capture session identity, tool name, and outcome without capturing argument values can support both requirements: you have a record of what happened without a secondary copy of user data.
How SkillAudit grades logging hygiene
SkillAudit's static analysis pass checks three things in the Credential Exposure and Maintenance axes that relate directly to logging:
1. Is any console.log or logger.* call passing an argument object directly? Patterns like console.log('args:', args), logger.info({ args }), or JSON.stringify(context) in a log call are flagged as potential credential-exposure findings. Argument values may contain tokens; logging them raw is a credential-safety violation.
2. Are there any log statements in the codebase at all? A server with zero logging statements earns a Maintenance finding. No logs means no audit trail, no incident reconstruction, and no way to verify compliance claims. Audit logging is listed as a required control in the SkillAudit methodology.
3. Are error objects logged with full stack traces? console.error(err) and logger.error(err) where err is an Error object will log the full stack trace including internal paths. This is a low-severity finding but contributes to the information-leakage score.
The grading impact:
- A — Structured NDJSON logging, no argument values logged, security alert events present, retention note in README
- B — Structured logging present, no argument values logged, but no dedicated security alert events
- C — Text logging only (not machine-parseable), or argument objects partially logged
- D — No logging at all, or argument values confirmed in log output during dynamic test
- F — Credentials confirmed in log output during dynamic test (credential exposure automatic F in that axis)
Three quick wins for an existing MCP server
If you have a deployed server and want to improve its observability posture without a full rewrite:
Quick win 1: Audit your log statements for argument values. Search your codebase for console.log, logger.info, logger.debug, and any call that passes args, params, body, req, or context directly. Replace with Object.keys(args) for the names-only pattern. This takes 30 minutes and eliminates the highest-severity logging finding.
Quick win 2: Add a permission-denied log event. Find every place in your code where you return a permission error or a "tool not available" response. Add a single logToolEvent({ outcome: 'permission_denied', toolName, sessionId }) call before the return. This is typically a change of only a few lines and immediately enables detection rule #4 (permission denial spike) from the table above.
Quick win 3: Add response byte logging. In your tool handler, after the result is computed, add responseBytes: Buffer.byteLength(JSON.stringify(result)) to your log event. This single field enables the large-response exfil detection rule and costs little for typical payloads, although it does serialise the result one extra time.
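For quick win 1, a rough line-by-line scan can produce a first list of suspects. The sketch below is a regex heuristic written for this guide, not SkillAudit's actual analysis: `findRiskyLogCalls` is a hypothetical name, it will produce false positives (for example object keys named body), and it misses log calls that span multiple lines.

```javascript
// Heuristic only: flags log calls whose argument list mentions a whole
// request-like object rather than a property of it.
const LOG_CALL = /\b(?:console\.(?:log|info|debug|warn|error)|logger\.\w+)\s*\(([^)]*)\)/g;
const STRING_LITERAL = /(['"`])(?:\\.|(?!\1).)*\1/g;
const RISKY_IDENTIFIER =
  /(?<!Object\.keys\(\s*)\b(args|params|body|req|context)\b(?!\s*[.\[?])/;

function findRiskyLogCalls(source) {
  const findings = [];
  source.split('\n').forEach((text, index) => {
    for (const call of text.matchAll(LOG_CALL)) {
      // Blank out string literals so a message like 'args:' is not flagged itself.
      const argList = call[1].replace(STRING_LITERAL, '""');
      const hit = RISKY_IDENTIFIER.exec(argList);
      if (hit) {
        findings.push({ line: index + 1, identifier: hit[1], snippet: call[0].trim() });
      }
    }
  });
  return findings;
}
```

Treat the output as a review queue rather than a verdict: each finding still needs a human to confirm whether the logged value can carry credentials.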
Summary
Observability is the security control that makes all other security controls auditable. An SSRF block is good; an SSRF block that emits a security_alert event that fires your on-call alert is a security control you can actually rely on. The structured logging framework above — one NDJSON event per tool invocation, argument names without values, dedicated security alert events shipped off-host within seconds — is achievable in a single focused afternoon for most MCP servers.
The no-log list is equally important. Logging too much, in the wrong format, creates a secondary credential store with weaker access controls than your primary secrets manager. The audit question isn't just "do you have logs?" — it's "are your logs safe to ship to a third-party aggregator without rotating all your credentials first?"
Run a free SkillAudit scan on your server's GitHub URL to see what our static analysis finds in your logging paths. The credential-exposure axis catches the most common logging mistakes before they reach a log aggregator — and long before a security reviewer does.