MCP server monitoring and alerting security — audit logs, SIEM integration, and runtime anomaly detection

An MCP server that runs without security monitoring is a blind spot in your LLM tool chain. You cannot detect a prompt-injection attack that caused your server to exfiltrate data, because you have no record of the outbound call. You cannot detect a credential misuse incident, because you have no log of which tool calls used the credential and when. Five security monitoring patterns (structured audit logging, a separate security event stream, tool call anomaly detection, SIEM log forwarding, and credential use alerting thresholds) give you visibility into what your MCP server is doing at runtime.

Quick reference

Structured audit logs first: Before you can alert, you need events. Every tool call should produce a structured log entry: timestamp, tool name, session ID, outcome (success/error/blocked/rate-limited), upstream status code, latency. Never log tool arguments or tool results; they may contain PII or confidential data.
Separate security events from operational logs: Security events (SSRF block, rate limit hit, CORS rejection, auth failure) should be logged to a separate stream from operational metrics (request latency, cache hits). This makes them searchable and alertable without log noise from normal operation.
Anomaly detection for agentic loops: An agentic loop, where an LLM calls the same tool repeatedly in a tight cycle, generates a distinctive call pattern. A per-session, per-tool call rate threshold (e.g., 20 calls to the same tool within 60 seconds) is a useful first-level anomaly signal.
Credential use outside expected context is a high-value alert: If your MCP server uses a credential during off-hours (weekend at 3am), calls an API endpoint you've never seen it call before, or shows a sudden spike in API call volume, these are credential misuse signals worth waking someone up for.
Log forwarding to SIEM enables correlation: MCP server security events forwarded to a SIEM (Splunk, Elastic, Datadog) can be correlated with cloud provider audit logs, endpoint detection events, and network flow records. A prompt-injection exfiltration attempt leaves evidence across multiple log sources simultaneously.

1. Structured audit logging

The foundation of MCP server security monitoring is a structured audit log that records every tool invocation at the appropriate level of detail — enough to reconstruct what happened, without logging sensitive data.

interface AuditEvent {
  timestamp: string           // ISO 8601
  sessionId: string           // per-session identifier (not user identity)
  toolName: string            // tool that was called
  outcome: 'success' | 'error' | 'blocked' | 'rate_limited'
  durationMs: number          // tool execution time
  upstreamStatusCode?: number // HTTP status from upstream API
  errorCode?: string          // structured error code, not message text
  // NOTE: never include toolArguments — may contain PII
  // NOTE: never include toolResult — may contain confidential data
}

// Middleware pattern: wrap all tool handlers with audit logging.
// getSessionId, auditLog, RateLimitError, and BlockedError are assumed to be defined elsewhere.
function withAudit<T>(
  toolName: string,
  handler: (args: T) => Promise<unknown>
): (args: T) => Promise<unknown> {
  return async (args: T) => {
    const start = Date.now()
    const sessionId = getSessionId()
    try {
      const result = await handler(args)
      auditLog({ timestamp: new Date().toISOString(), sessionId, toolName,
        outcome: 'success', durationMs: Date.now() - start })
      return result
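    } catch (err) {
      const outcome = err instanceof RateLimitError ? 'rate_limited'
                    : err instanceof BlockedError   ? 'blocked'
                    : 'error'
      // `err` is `unknown` under strict TypeScript, so read `code` defensively
      const errorCode = typeof (err as { code?: unknown })?.code === 'string'
        ? (err as { code: string }).code
        : undefined
      auditLog({ timestamp: new Date().toISOString(), sessionId, toolName,
        outcome, durationMs: Date.now() - start, errorCode })
      throw err
    }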
  }
}
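
The wrapper above relies on helpers it does not define. A minimal sketch of what they could look like follows, assuming a Node.js runtime. `sessionStore`, `setAuditSink`, `AuditRecord`, and the error `code` values are hypothetical names chosen for illustration, not part of any MCP SDK.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Hypothetical record shape: the AuditEvent fields the wrapper actually fills in.
interface AuditRecord {
  timestamp: string
  sessionId: string
  toolName: string
  outcome: 'success' | 'error' | 'blocked' | 'rate_limited'
  durationMs: number
  errorCode?: string
}

// Error classes a tool handler can throw so the wrapper can classify the outcome.
class RateLimitError extends Error {
  readonly code = 'RATE_LIMITED'
}
class BlockedError extends Error {
  readonly code = 'BLOCKED'
}

// Session context that follows the request across awaits without being
// threaded through every function signature.
const sessionStore = new AsyncLocalStorage<{ sessionId: string }>()

function getSessionId(): string {
  return sessionStore.getStore()?.sessionId ?? 'unknown'
}

// The sink is swappable so the same code can write to stdout, a file, or a test buffer.
type AuditSink = (line: string) => void
let auditSink: AuditSink = (line) => console.log(line)

function setAuditSink(sink: AuditSink): void {
  auditSink = sink
}

// One JSON object per line keeps the log easy to parse downstream.
function auditLog(event: AuditRecord): void {
  auditSink(JSON.stringify(event))
}
```

To use it, the transport layer would wrap each incoming request in `sessionStore.run({ sessionId }, () => handleRequest(...))`, so that `getSessionId()` resolves correctly inside any tool handler that request triggers.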

2. Separate security event stream

Security-relevant events — SSRF blocks, rate limit hits, validation failures, credential errors — are qualitatively different from operational metrics. They need to be searchable independently and need different retention and alert policies. The pattern is to write security events to a dedicated stream alongside (not instead of) operational logs.

type SecurityEventType =
  | 'ssrf_blocked'           // outbound call to disallowed hostname
  | 'private_ip_blocked'     // outbound call resolved to private IP
  | 'rate_limit_exceeded'    // session or org call budget exhausted
  | 'validation_failure'     // tool argument failed schema validation
  | 'auth_failure'           // upstream API returned 401/403
  | 'loop_detected'          // same tool called >threshold times in window
  | 'unexpected_endpoint'    // call to upstream URL not in baseline

interface SecurityEvent {
  type: SecurityEventType
  severity: 'info' | 'warn' | 'high' | 'critical'
  sessionId: string
  toolName: string
  detail: string           // human-readable, no PII
  timestamp: string
}

// Emit to a dedicated channel, separate from operational stdout
function emitSecurityEvent(event: SecurityEvent): void {
  // One JSON line on stderr, tagged so a log shipper can route it;
  // swap for a structured log file, UDP syslog, or HTTP sink as needed
  process.stderr.write(JSON.stringify({ ...event, _stream: 'security' }) + '\n')
  // If severity is high or critical: also send to alerting webhook
  // (notifyOncall is assumed to be defined elsewhere)
  if (event.severity === 'high' || event.severity === 'critical') {
    notifyOncall(event)
  }
}
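
As an illustration of where such events originate, the sketch below shows an outbound URL check that produces `ssrf_blocked` and `private_ip_blocked` events. It is a simplified example under stated assumptions: `ALLOWED_HOSTS`, `checkOutbound`, and `OutboundBlockEvent` are hypothetical names, the allowlist entry is a placeholder, and only literal IPv4 addresses are checked. A production check would also resolve DNS and cover IPv6.

```typescript
// Hypothetical event shape: a subset of the SecurityEvent fields used in this section.
interface OutboundBlockEvent {
  type: 'ssrf_blocked' | 'private_ip_blocked'
  severity: 'high'
  sessionId: string
  toolName: string
  detail: string
  timestamp: string
}

// Assumption: the server only ever needs to reach these hosts.
const ALLOWED_HOSTS = new Set<string>(['api.example.com'])

// Matches literal private, loopback, and link-local IPv4 addresses. No DNS resolution.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host)
  if (!m) return false
  const a = Number(m[1])
  const b = Number(m[2])
  return a === 10 || a === 127
    || (a === 172 && b >= 16 && b <= 31)
    || (a === 192 && b === 168)
    || (a === 169 && b === 254)
}

// Returns true if the call may proceed; otherwise emits a security event and returns false.
function checkOutbound(
  rawUrl: string,
  sessionId: string,
  toolName: string,
  emit: (event: OutboundBlockEvent) => void,
): boolean {
  let host: string
  try {
    host = new URL(rawUrl).hostname
  } catch {
    host = '(unparseable URL)'
  }
  const type = isPrivateIPv4(host) ? 'private_ip_blocked'
             : ALLOWED_HOSTS.has(host) ? null
             : 'ssrf_blocked'
  if (type === null) return true
  // Log the hostname only: paths and query strings may carry the exfiltrated data itself.
  emit({
    type, severity: 'high', sessionId, toolName,
    detail: `Blocked outbound call to host ${host}`,
    timestamp: new Date().toISOString(),
  })
  return false
}
```

Passing `emit` as a parameter keeps the check testable. In the server it would be the `emitSecurityEvent` function shown above.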

3. Tool call anomaly detection

Agentic loops — where an LLM session keeps calling the same tool repeatedly — are one of the most detectable security-adjacent anomalies in MCP servers. They can indicate a prompt-injection attack that put the LLM into a loop, a runaway agent task, or a deliberate denial-of-service attempt against your upstream API. Detecting them in real time lets you terminate the session before the loop causes damage.

class ToolCallAnomalyDetector {
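  // Per-session, per-tool call counts in a fixed window that restarts once it
  // expires (not a true rolling window). Entries are never evicted here, so
  // delete a session's keys when the session ends.
  private windows = new Map<string, { count: number; windowStart: number }>()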
  private readonly WINDOW_MS = 60_000     // 1-minute window
  private readonly LOOP_THRESHOLD = 20   // calls/window = likely loop
  private readonly BURST_THRESHOLD = 10  // calls/window = monitor

  record(sessionId: string, toolName: string): void {
    const key = `${sessionId}:${toolName}`
    const now = Date.now()
    const window = this.windows.get(key)

    if (!window || now - window.windowStart > this.WINDOW_MS) {
      this.windows.set(key, { count: 1, windowStart: now })
      return
    }

    window.count++

    if (window.count === this.LOOP_THRESHOLD) {
      emitSecurityEvent({
        type: 'loop_detected',
        severity: 'high',
        sessionId,
        toolName,
        detail: `${window.count} calls to ${toolName} in ${this.WINDOW_MS / 1000}s window`,
        timestamp: new Date().toISOString(),
      })
    } else if (window.count === this.BURST_THRESHOLD) {
      emitSecurityEvent({
        type: 'loop_detected',
        severity: 'warn',
        sessionId,
        toolName,
        detail: `Burst: ${window.count} calls to ${toolName} in ${this.WINDOW_MS / 1000}s window`,
        timestamp: new Date().toISOString(),
      })
    }
  }
}
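
A fixed window can miss a burst that straddles a window boundary. One possible alternative is a sliding window that stores recent call timestamps per session and tool. The sketch below is illustrative only: `SlidingWindowDetector` and `endSession` are hypothetical names, and the clock is injected purely so the behaviour can be tested.

```typescript
class SlidingWindowDetector {
  // Timestamps (ms) of recent calls, keyed by `${sessionId}:${toolName}`
  private calls = new Map<string, number[]>()

  constructor(
    private readonly windowMs = 60_000,
    private readonly threshold = 20,
    private readonly now: () => number = Date.now,
  ) {}

  // Records a call and returns true while the key is at or above the threshold
  // within the trailing window. The caller decides whether to alert or terminate.
  record(sessionId: string, toolName: string): boolean {
    const key = `${sessionId}:${toolName}`
    const t = this.now()
    const recent = (this.calls.get(key) ?? []).filter((ts) => t - ts < this.windowMs)
    recent.push(t)
    this.calls.set(key, recent)
    return recent.length >= this.threshold
  }

  // Frees memory for a finished session so the map does not grow without bound.
  endSession(sessionId: string): void {
    const prefix = `${sessionId}:`
    for (const key of Array.from(this.calls.keys())) {
      if (key.startsWith(prefix)) this.calls.delete(key)
    }
  }
}
```

The trade-off is memory: this keeps up to one timestamp per call inside the window, whereas the fixed-window counter keeps two numbers per key.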

4. SIEM log forwarding

For team deployments, forwarding MCP server security events to a SIEM enables correlation with other security signals. A prompt-injection attack that causes data exfiltration will typically appear in three places at once: the MCP server's outbound call log, the cloud provider's network flow records, and the upstream API's access log. A SIEM is the practical place to correlate these sources in near real time.

## Forwarding MCP server security logs to Splunk via HTTP Event Collector (HEC):

## docker-compose.yml
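services:
  mcp-server:
    image: myorg/mcp-server:latest
    depends_on:
      - fluentd
    logging:
      driver: "fluentd"
      options:
        # The Docker daemon on the host opens this connection, so point it at
        # the published port; the compose service name is not resolvable there
        fluentd-address: "localhost:24224"
        tag: "mcp.security"

  fluentd:
    # The stock image does not bundle the Splunk HEC output plugin; build an
    # image that installs fluent-plugin-splunk-hec
    image: fluent/fluentd:v1.16
    ports:
      - "24224:24224"
    environment:
      - SPLUNK_HEC_TOKEN
    volumes:
      - ./fluentd.conf:/fluentd/etc/fluent.conf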

## fluentd.conf — route _stream:security events to Splunk HEC
<source>
  @type forward
  port 24224
</source>
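# The Docker fluentd driver delivers each line as a string in the "log" field,
# so parse it as JSON before filtering on _stream
<filter mcp.**>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>
<filter mcp.**>
  @type grep
  <regexp>
    key _stream
    pattern /security/
  </regexp>
</filter>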
<match mcp.**>
  @type splunk_hec
  hec_host splunk.internal
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  index mcp_security
</match>

## For Datadog: use the Datadog Agent log collection with a custom source tag
## For Elastic: use Filebeat with a custom index template
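
Where a log shipper is not available, events can also be posted to Splunk HEC directly from the server process. The sketch below is one way to do that, not a definitive client: `toHecEnvelope`, `toHecBody`, `sendToHec`, and the `PostFn` type are hypothetical names. The `/services/collector/event` path and the `Authorization: Splunk <token>` header follow the HEC protocol, while the `mcp_security` index is carried over from the configuration above.

```typescript
// HEC event envelope: metadata fields at the top level, payload under `event`.
interface HecEnvelope {
  time: number                     // epoch seconds
  sourcetype: string
  index: string
  event: Record<string, unknown>
}

// Minimal fetch-like signature so the sender can be tested without a network.
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>

function toHecEnvelope(
  event: { timestamp: string } & Record<string, unknown>,
  index = 'mcp_security',
): HecEnvelope {
  return {
    time: Date.parse(event.timestamp) / 1000,
    sourcetype: '_json',
    index,
    event,
  }
}

// HEC accepts a batch as several JSON objects in one request body.
function toHecBody(envelopes: HecEnvelope[]): string {
  return envelopes.map((e) => JSON.stringify(e)).join('\n')
}

async function sendToHec(
  envelopes: HecEnvelope[],
  opts: { baseUrl: string; token: string },
  post: PostFn,
): Promise<void> {
  const res = await post(`${opts.baseUrl}/services/collector/event`, {
    method: 'POST',
    headers: { Authorization: `Splunk ${opts.token}` },
    body: toHecBody(envelopes),
  })
  // Surface delivery failures: a silently dropped security event defeats the purpose.
  if (!res.ok) throw new Error(`HEC responded with status ${res.status}`)
}
```

On Node 18 or later, the global `fetch` can be passed as the `post` argument, for example `sendToHec(batch, { baseUrl: 'https://splunk.internal:8088', token }, fetch)`. Batching and retry on failure are left out of this sketch.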

5. Credential use alerting thresholds

When an MCP server's credential is used in an unexpected pattern — off-hours, unusual volume, new API endpoint — these are indicators of credential compromise or misuse. Alerting on these patterns requires a baseline of normal usage. The simplest baseline is time-of-day and volume; more sophisticated baselines include API endpoint usage profiles.

class CredentialUseMonitor {
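  // Cumulative call counts per hour of day; doubles as the volume baseline
  private hourlyCallCounts = new Array<number>(24).fill(0)
  private knownEndpoints = new Set<string>()          // baseline API endpoints called

  recordCall(endpoint: string): void {
    const hour = new Date().getHours()                // server-local time
    const isOffHours = hour < 7 || hour > 20
    const isNewEndpoint = !this.knownEndpoints.has(endpoint)
    // Skip the volume check until the baseline has data: with an empty
    // baseline P95 is 0, and every call after the first would alert
    const p95 = this.getP95Calls()
    const isHighVolume = p95 > 0 && this.hourlyCallCounts[hour] > 3 * p95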

    if (isNewEndpoint) {
      emitSecurityEvent({
        type: 'unexpected_endpoint',
        severity: isOffHours ? 'high' : 'warn',
        sessionId: 'system',
        toolName: 'credential-monitor',
        detail: `First call to endpoint: ${new URL(endpoint).pathname}`,
        timestamp: new Date().toISOString(),
      })
      this.knownEndpoints.add(endpoint)
    }

    if (isHighVolume) {
      emitSecurityEvent({
        type: 'rate_limit_exceeded',
        severity: 'high',
        sessionId: 'system',
        toolName: 'credential-monitor',
        detail: `Anomalous call volume: ${this.hourlyCallCounts[hour]} calls this hour (P95: ${this.getP95Calls()})`,
        timestamp: new Date().toISOString(),
      })
    }

    this.hourlyCallCounts[hour]++
  }

  private getP95Calls(): number {
    const sorted = [...this.hourlyCallCounts].sort((a, b) => a - b)
    return sorted[Math.floor(sorted.length * 0.95)]
  }
}
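
The monitor above uses one array as both the running counter and the baseline. A variant that keeps completed hours as history and compares only the current hour against them could look like the sketch below. `HourlyVolumeMonitor`, `percentile`, and `isOffHours` are hypothetical names, the 3x multiplier and the 07:00 to 20:59 working hours come from the code above, and evaluating the clock in UTC is an assumption made here for determinism.

```typescript
const HOUR_MS = 3_600_000

// Nearest-rank percentile over a non-empty array.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil(p * sorted.length))
  return sorted[rank - 1]
}

// Weekends and 21:00 to 06:59 count as off-hours (UTC here; use your team's timezone).
function isOffHours(at: Date): boolean {
  const day = at.getUTCDay() // 0 = Sunday, 6 = Saturday
  const hour = at.getUTCHours()
  return day === 0 || day === 6 || hour < 7 || hour > 20
}

class HourlyVolumeMonitor {
  private history: number[] = [] // completed hourly totals, oldest first
  private currentHour: number
  private currentCount = 0
  private alerted = false

  constructor(
    private readonly minSamples = 24,
    private readonly multiplier = 3,
    private readonly maxHistory = 24 * 14,
    private readonly now: () => number = Date.now,
  ) {
    this.currentHour = Math.floor(this.now() / HOUR_MS)
  }

  // Records one call; returns true once per hour, on the call that first
  // exceeds multiplier x P95 of the completed hours.
  record(): boolean {
    const hour = Math.floor(this.now() / HOUR_MS)
    if (hour !== this.currentHour) this.rollTo(hour)
    this.currentCount++
    if (this.alerted || this.history.length < this.minSamples) return false
    const p95 = percentile(this.history, 0.95)
    if (this.currentCount > this.multiplier * Math.max(p95, 1)) {
      this.alerted = true
      return true
    }
    return false
  }

  // Moves the finished hour into history, filling fully idle hours with zeros.
  private rollTo(hour: number): void {
    const idleHours = Math.min(Math.max(hour - this.currentHour - 1, 0), this.maxHistory)
    this.history.push(this.currentCount)
    for (let i = 0; i < idleHours; i++) this.history.push(0)
    this.history = this.history.slice(-this.maxHistory)
    this.currentHour = hour
    this.currentCount = 0
    this.alerted = false
  }
}
```

Alerting once per hour keeps a sustained spike from flooding the on-call channel. `isOffHours(new Date())` can then raise the severity of whatever event the caller emits.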

What SkillAudit checks for monitoring

SkillAudit's Maintenance sub-score includes a monitoring readiness check that looks for evidence of security observability in the codebase:

Structured logging: Presence of a structured log library (pino, winston, bunyan) configured with JSON output — INFO signal, not a finding
Security event emission: Code patterns that write security-relevant events to a separate stream or with a security-specific prefix — positive signal
No logging of sensitive fields: Patterns that would log toolArguments, response.body, or environment variables — MEDIUM finding on the Credential Exposure sub-score
Exception handling: Unhandled promise rejections and uncaught exceptions indicate missing error observability — WARN finding
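
One way to stay clear of the sensitive-field finding is to build log entries from an allowlist of fields, so that nothing else can reach the log by accident. The following is a minimal sketch of that idea. `LOGGABLE_FIELDS` and `toLoggable` are hypothetical names, and the field list simply mirrors the audit event fields from section 1.

```typescript
// Only these keys are ever copied into a log entry; everything else is dropped.
const LOGGABLE_FIELDS = [
  'timestamp', 'sessionId', 'toolName', 'outcome',
  'durationMs', 'upstreamStatusCode', 'errorCode',
] as const

function toLoggable(raw: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const field of LOGGABLE_FIELDS) {
    if (field in raw) out[field] = raw[field]
  }
  return out
}
```

An allowlist fails closed: a field added to the handler context later stays out of the logs until someone adds it deliberately.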

For the audit log requirements that SOC 2 and GDPR impose on MCP server operators, see the audit trail and compliance guide. For the incident response procedures that security events feed into, see the MCP incident response playbook.

Check your server's observability posture

SkillAudit's Maintenance scan checks for logging, error handling, and security observability patterns. See where your server stands.
