MCP server monitoring and alerting security — audit logs, SIEM integration, and runtime anomaly detection
An MCP server that runs without security monitoring is a blind spot in your LLM tool chain. You cannot detect a prompt-injection attack that caused your server to exfiltrate data, because you have no record of the outbound call. You cannot detect a credential misuse incident, because you have no log of which tool calls used the credential and when. Five security monitoring patterns give you visibility into what your MCP server is doing at runtime: structured audit logging, a separate security event stream, tool call anomaly detection, SIEM log forwarding, and credential use alerting thresholds.
1. Structured audit logging
The foundation of MCP server security monitoring is a structured audit log that records every tool invocation at the appropriate level of detail — enough to reconstruct what happened, without logging sensitive data.
interface AuditEvent {
timestamp: string // ISO 8601
sessionId: string // per-session identifier (not user identity)
toolName: string // tool that was called
outcome: 'success' | 'error' | 'blocked' | 'rate_limited'
durationMs: number // tool execution time
upstreamStatusCode?: number // HTTP status from upstream API
errorCode?: string // structured error code, not message text
// NOTE: never include toolArguments — may contain PII
// NOTE: never include toolResult — may contain confidential data
}
// Middleware pattern: wrap all tool handlers with audit logging
function withAudit<T>(
toolName: string,
handler: (args: T) => Promise<unknown>
): (args: T) => Promise<unknown> {
return async (args: T) => {
const start = Date.now()
const sessionId = getSessionId()
try {
const result = await handler(args)
auditLog({ timestamp: new Date().toISOString(), sessionId, toolName,
outcome: 'success', durationMs: Date.now() - start })
return result
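} catch (err) {
const outcome = err instanceof RateLimitError ? 'rate_limited'
: err instanceof BlockedError ? 'blocked'
: 'error'
// `err` is `unknown` under strict TypeScript, so read `code` defensively
const errorCode = typeof (err as { code?: unknown })?.code === 'string'
? (err as { code: string }).code
: undefined
auditLog({ timestamp: new Date().toISOString(), sessionId, toolName,
outcome, durationMs: Date.now() - start, errorCode })
throw err
}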
}
}
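The wrapper above calls an `auditLog` function that the article does not define. The following is a minimal sketch of one way to write it, assuming a JSON-lines sink and a fixed allowlist of fields so that `toolArguments` or `toolResult` cannot reach the log even if a caller passes them by mistake. The names `makeAuditLog`, `serializeAuditEvent`, and `AUDIT_FIELDS` are hypothetical.

```typescript
// Hypothetical audit sink. Helper names and the allowlist are illustrative assumptions.
type Outcome = 'success' | 'error' | 'blocked' | 'rate_limited'

interface AuditEvent {
  timestamp: string
  sessionId: string
  toolName: string
  outcome: Outcome
  durationMs: number
  upstreamStatusCode?: number
  errorCode?: string
}

// Only these keys are ever serialized; any other property is dropped.
const AUDIT_FIELDS = [
  'timestamp', 'sessionId', 'toolName', 'outcome',
  'durationMs', 'upstreamStatusCode', 'errorCode',
] as const

type Sink = (line: string) => void

function serializeAuditEvent(event: AuditEvent): string {
  const source = event as unknown as Record<string, unknown>
  const safe: Record<string, unknown> = {}
  for (const key of AUDIT_FIELDS) {
    if (source[key] !== undefined) safe[key] = source[key]
  }
  return JSON.stringify(safe)
}

// The sink is injectable so tests can capture lines; the default writes to stderr.
function makeAuditLog(sink: Sink = (line) => console.error(line)) {
  return (event: AuditEvent): void => sink(serializeAuditEvent(event))
}
```

Copying fields from an allowlist, rather than deleting known-sensitive ones, means a field added to the event later stays out of the log until someone opts it in.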
2. Separate security event stream
Security-relevant events — SSRF blocks, rate limit hits, validation failures, credential errors — are qualitatively different from operational metrics. They need to be searchable independently and need different retention and alert policies. The pattern is to write security events to a dedicated stream alongside (not instead of) operational logs.
type SecurityEventType =
| 'ssrf_blocked' // outbound call to disallowed hostname
| 'private_ip_blocked' // outbound call resolved to private IP
| 'rate_limit_exceeded' // session or org call budget exhausted
| 'validation_failure' // tool argument failed schema validation
| 'auth_failure' // upstream API returned 401/403
| 'loop_detected' // same tool called >threshold times in window
| 'unexpected_endpoint' // call to upstream URL not in baseline
interface SecurityEvent {
type: SecurityEventType
severity: 'info' | 'warn' | 'high' | 'critical'
sessionId: string
toolName: string
detail: string // human-readable, no PII
timestamp: string
}
// Emit to a dedicated channel — separate from operational stdout
function emitSecurityEvent(event: SecurityEvent): void {
// Write to a structured log file, UDP syslog, or HTTP sink
process.stderr.write(JSON.stringify({ ...event, _stream: 'security' }) + '\n')
// If severity is high or critical: also send to alerting webhook
if (event.severity === 'high' || event.severity === 'critical') {
notifyOncall(event)
}
}
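To show where such an event might originate, here is a hedged sketch of an outbound-call check that emits `ssrf_blocked` when a tool tries to reach a host outside an allowlist. `checkOutbound`, the allowlist shape, and the injected `emit` callback are assumptions for illustration; a real server would also need the resolved-IP check behind `private_ip_blocked`.

```typescript
// Sketch only: checkOutbound and its parameters are hypothetical helpers.
interface SecurityEvent {
  type: 'ssrf_blocked' | 'private_ip_blocked' | 'rate_limit_exceeded'
    | 'validation_failure' | 'auth_failure' | 'loop_detected' | 'unexpected_endpoint'
  severity: 'info' | 'warn' | 'high' | 'critical'
  sessionId: string
  toolName: string
  detail: string
  timestamp: string
}

type Emit = (event: SecurityEvent) => void

function checkOutbound(
  rawUrl: string,
  allowedHosts: ReadonlySet<string>,
  ctx: { sessionId: string; toolName: string },
  emit: Emit,
): boolean {
  let host: string
  try {
    host = new URL(rawUrl).hostname.toLowerCase()
  } catch {
    host = '(unparseable)' // malformed URLs are treated as disallowed
  }
  if (allowedHosts.has(host)) return true

  emit({
    type: 'ssrf_blocked',
    severity: 'high',
    sessionId: ctx.sessionId,
    toolName: ctx.toolName,
    // Log the hostname only: paths and query strings can carry tokens or PII.
    detail: `Outbound call to disallowed host: ${host}`,
    timestamp: new Date().toISOString(),
  })
  return false
}
```

Passing `emit` as a parameter keeps the check testable and lets the same function feed `emitSecurityEvent` in production.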
3. Tool call anomaly detection
Agentic loops — where an LLM session keeps calling the same tool repeatedly — are one of the most detectable security-adjacent anomalies in MCP servers. They can indicate a prompt-injection attack that put the LLM into a loop, a runaway agent task, or a deliberate denial-of-service attempt against your upstream API. Detecting them in real time lets you terminate the session before the loop causes damage.
class ToolCallAnomalyDetector {
// Per-session, per-tool call counts in a fixed window that resets once WINDOW_MS has elapsed
private windows = new Map<string, { count: number; windowStart: number }>()
private readonly WINDOW_MS = 60_000 // 1-minute window
private readonly LOOP_THRESHOLD = 20 // calls/window = likely loop
private readonly BURST_THRESHOLD = 10 // calls/window = monitor
record(sessionId: string, toolName: string): void {
const key = `${sessionId}:${toolName}`
const now = Date.now()
const window = this.windows.get(key)
if (!window || now - window.windowStart > this.WINDOW_MS) {
this.windows.set(key, { count: 1, windowStart: now })
return
}
window.count++
if (window.count === this.LOOP_THRESHOLD) {
emitSecurityEvent({
type: 'loop_detected',
severity: 'high',
sessionId,
toolName,
detail: `${window.count} calls to ${toolName} in ${this.WINDOW_MS / 1000}s window`,
timestamp: new Date().toISOString(),
})
} else if (window.count === this.BURST_THRESHOLD) {
emitSecurityEvent({
type: 'loop_detected',
severity: 'warn',
sessionId,
toolName,
detail: `Burst: ${window.count} calls to ${toolName} in ${this.WINDOW_MS / 1000}s window`,
timestamp: new Date().toISOString(),
})
}
}
}
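The detector above only reports; it does not stop the session. As one possible way to act on the signal, the sketch below keeps call timestamps in a true sliding window and refuses further calls once the limit is reached. The class name `SlidingWindowGuard`, its defaults, and the injectable clock are assumptions, not part of the original design.

```typescript
// Hypothetical guard: names, thresholds, and the injectable clock are assumptions.
class SlidingWindowGuard {
  private calls = new Map<string, number[]>() // key -> timestamps (ms) of allowed calls

  constructor(
    private readonly limit = 20,
    private readonly windowMs = 60_000,
    private readonly now: () => number = Date.now,
  ) {}

  // Returns true if the call may proceed, false once the limit is reached.
  allow(sessionId: string, toolName: string): boolean {
    const key = `${sessionId}:${toolName}`
    const t = this.now()
    // Keep only timestamps that are still inside the window.
    const recent = (this.calls.get(key) ?? []).filter((ts) => t - ts < this.windowMs)
    if (recent.length >= this.limit) {
      this.calls.set(key, recent)
      return false
    }
    recent.push(t)
    this.calls.set(key, recent)
    return true
  }

  // Drop state when a session ends so the map does not grow without bound.
  endSession(sessionId: string): void {
    const prefix = `${sessionId}:`
    this.calls.forEach((_timestamps, key) => {
      if (key.startsWith(prefix)) this.calls.delete(key)
    })
  }
}
```

A tool wrapper could call `allow()` before invoking the handler and throw a `BlockedError` when it returns false, so the refusal also shows up in the audit log as a `blocked` outcome. Rejected calls are not recorded, so a session regains capacity as its earlier calls age out of the window.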
4. SIEM log forwarding
For team deployments, forwarding MCP server security events to a SIEM enables correlation with other security signals. A prompt-injection attack that causes data exfiltration will typically leave traces in three places: the MCP server's outbound call log, the cloud provider's network flow records, and the upstream API's access log. A SIEM is the practical place to correlate these sources in near real time.
## Forwarding MCP server security logs to Splunk via HTTP Event Collector (HEC):
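## docker-compose.yml
services:
  mcp-server:
    image: myorg/mcp-server:latest
    depends_on:
      - fluentd
    logging:
      driver: "fluentd"
      options:
        # The logging driver connects from the Docker daemon on the host,
        # so the address must be host-reachable, not a Compose service name.
        fluentd-address: "localhost:24224"
        tag: "mcp.security"
  fluentd:
    image: fluent/fluentd:v1.16
    ports:
      - "24224:24224"
    volumes:
      - ./fluentd.conf:/fluentd/etc/fluent.conf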
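## fluentd.conf: route _stream:security events to Splunk HEC
<source>
  @type forward
  port 24224
</source>
# The Docker fluentd driver delivers each line as a string in the "log" field,
# so parse it as JSON before filtering on _stream.
<filter mcp.**>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>
<filter mcp.**>
  @type grep
  <regexp>
    key _stream
    pattern /security/
  </regexp>
</filter>
# splunk_hec comes from fluent-plugin-splunk-hec, which must be installed in the fluentd image
<match mcp.**>
  @type splunk_hec
  hec_host splunk.internal
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  index mcp_security
</match>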
## For Datadog: use the Datadog Agent log collection with a custom source tag
## For Elastic: use Filebeat with a custom index template
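Where running a log forwarder is not practical, a server can post events to Splunk's HTTP Event Collector directly. The sketch below builds the request for a single event; the host, port, and index mirror the fluentd configuration above, while `HecConfig`, `buildHecRequest`, `sendToHec`, and the `mcp:security` sourcetype are assumed names. It relies on the global `fetch` available in Node 18 and later.

```typescript
// Sketch: send one security event to Splunk HEC. Helper names and sourcetype are assumptions.
interface HecConfig {
  host: string
  port: number
  token: string
  index: string
}

interface HecRequest {
  url: string
  headers: Record<string, string>
  body: string
}

type TimestampedEvent = { timestamp: string } & Record<string, unknown>

// Pure function so the payload can be inspected without a network call.
function buildHecRequest(config: HecConfig, event: TimestampedEvent): HecRequest {
  return {
    url: `https://${config.host}:${config.port}/services/collector/event`,
    headers: {
      Authorization: `Splunk ${config.token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      time: Date.parse(event.timestamp) / 1000, // HEC expects epoch seconds
      sourcetype: 'mcp:security',               // hypothetical sourcetype
      index: config.index,
      event,
    }),
  }
}

async function sendToHec(config: HecConfig, event: TimestampedEvent): Promise<void> {
  const req = buildHecRequest(config, event)
  const res = await fetch(req.url, { method: 'POST', headers: req.headers, body: req.body })
  if (!res.ok) throw new Error(`HEC responded with HTTP ${res.status}`)
}
```

A direct sender like this gives up the buffering and retry behavior a forwarder provides, so it suits low-volume deployments better than busy ones.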
5. Credential use alerting thresholds
An MCP server credential that is used in an unexpected pattern (off-hours, unusual volume, a new API endpoint) is an indicator of credential compromise or misuse. Alerting on these patterns requires a baseline of normal usage. The simplest baseline is time-of-day and volume; more sophisticated baselines include API endpoint usage profiles.
class CredentialUseMonitor {
private hourlyCallCounts = new Array(24).fill(0) // baseline call volume per hour
private knownEndpoints = new Set<string>() // baseline API endpoints called
recordCall(endpoint: string): void {
const hour = new Date().getHours()
const isOffHours = hour < 7 || hour > 20
const isNewEndpoint = !this.knownEndpoints.has(endpoint)
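const p95 = this.getP95Calls()
// Require a non-zero baseline; with a P95 of 0, every call after the first in an hour would alert
const isHighVolume = p95 > 0 && this.hourlyCallCounts[hour] > 3 * p95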
if (isNewEndpoint) {
emitSecurityEvent({
type: 'unexpected_endpoint',
severity: isOffHours ? 'high' : 'warn',
sessionId: 'system',
toolName: 'credential-monitor',
detail: `First call to endpoint: ${new URL(endpoint).pathname}`,
timestamp: new Date().toISOString(),
})
this.knownEndpoints.add(endpoint)
}
if (isHighVolume) {
emitSecurityEvent({
type: 'rate_limit_exceeded',
severity: 'high',
sessionId: 'system',
toolName: 'credential-monitor',
detail: `Anomalous call volume: ${this.hourlyCallCounts[hour]} calls this hour (P95: ${this.getP95Calls()})`,
timestamp: new Date().toISOString(),
})
}
this.hourlyCallCounts[hour]++
}
private getP95Calls(): number {
const sorted = [...this.hourlyCallCounts].sort((a, b) => a - b)
return sorted[Math.floor(sorted.length * 0.95)]
}
}
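The monitor above uses one array as both the baseline and the live counter, so the two drift together. A possible refinement, sketched under stated assumptions, keeps finished hours as history per hour of day and compares only the current hour's count against that history, with a learning period before any alert fires. `HourlyVolumeBaseline`, `minSamples`, and `multiplier` are hypothetical names and defaults; hours are bucketed in UTC.

```typescript
// Hypothetical baseline: class name, minSamples, and multiplier are illustrative assumptions.
const HOUR_MS = 3_600_000

interface VolumeCheck {
  anomalous: boolean
  count: number        // calls seen so far in the current hour
  p95: number | null   // null while the baseline is still learning
}

class HourlyVolumeBaseline {
  // history[h] holds call counts of previously completed hours with UTC hour-of-day h
  private history: number[][] = Array.from({ length: 24 }, () => [] as number[])
  private currentHour = -1 // epoch hour of the bucket being filled
  private currentCount = 0

  constructor(private readonly minSamples = 7, private readonly multiplier = 3) {}

  record(nowMs: number = Date.now()): VolumeCheck {
    const epochHour = Math.floor(nowMs / HOUR_MS)
    if (epochHour !== this.currentHour) {
      // The bucket rolled over: archive the finished hour, then start a new one.
      if (this.currentHour >= 0) {
        this.history[this.currentHour % 24].push(this.currentCount)
      }
      this.currentHour = epochHour
      this.currentCount = 0
    }
    this.currentCount++

    const samples = this.history[epochHour % 24]
    if (samples.length < this.minSamples) {
      return { anomalous: false, count: this.currentCount, p95: null } // still learning
    }
    const sorted = [...samples].sort((a, b) => a - b)
    const p95 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))]
    return { anomalous: this.currentCount > this.multiplier * p95, count: this.currentCount, p95 }
  }
}
```

One limitation of this sketch is that hours with no calls at all are never archived, so the baseline slightly overstates typical volume for quiet periods. The caller would emit a security event when `anomalous` first becomes true for an hour rather than on every subsequent call.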
What SkillAudit checks for monitoring
SkillAudit's Maintenance sub-score includes a monitoring readiness check that looks for evidence of security observability in the codebase:
- Structured logging: Presence of a structured log library (pino, winston, bunyan) configured with JSON output. INFO signal, not a finding.
- Security event emission: Code patterns that write security-relevant events to a separate stream or with a security-specific prefix. Positive signal.
- No logging of sensitive fields: Patterns that would log toolArguments, response.body, or environment variables. MEDIUM finding on the Credential Exposure sub-score.
- Exception handling: Unhandled promise rejections and uncaught exceptions indicate missing error observability. WARN finding.
For the audit log requirements that SOC 2 and GDPR impose on MCP server operators, see the audit trail and compliance guide. For the incident response procedures that security events feed into, see the MCP incident response playbook.