
MCP server behavioral anomaly detection: ML-based baseline modeling for agentic tool access

A compromised AI agent does not look like a crashed process or a connection timeout. It looks like normal tool calls with subtly wrong arguments, normal tool names in abnormal sequences, or familiar patterns at hours when no humans are working. Behavioral anomaly detection builds a baseline of what normal MCP tool access looks like, then alerts on deviations from that baseline that are consistent with compromise.

Why traditional security monitoring misses agentic threats

Traditional security monitoring was designed for human users and service-to-service API calls. The threat models are: brute-force authentication (many failed logins), bulk data exfiltration (large response sizes), and known-bad IP addresses or signatures. MCP server threats from compromised agents look different:

A prompt injection attack causes the agent to call a legitimate, allowed tool with arguments that exfiltrate data to an attacker-controlled URL: no authentication failure, normal response size, no signature match.
An agent session hijacked with stolen credentials calls tools at a normal rate but accesses resources belonging to a different user than the session identity suggests: no rate anomaly, no IP anomaly.
A slow data exfiltration attack uses an agent to make a slightly above-normal number of API calls over several days, staying under rate-limit thresholds. Traditional user and entity behavior analytics (UEBA) would not trigger on daily call volume alone.

These scenarios require behavioral analysis that understands the semantic content of tool calls, not just their volume and authentication status.

Building a behavioral baseline for MCP tool access

A useful baseline captures four dimensions of normal behavior:

Tool call sequences. For a given user role, which sequences of tools are normally called together? A clinical AI that processes lab results might normally call get_patient_demographics → get_latest_labs → summarize_findings. If a session suddenly calls get_patient_demographics → list_all_patients → bulk_export_records, that sequence is anomalous regardless of whether each individual tool call is authorized.
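One simple way to approximate a sequence baseline is to count adjacent tool pairs (bigrams) in known-good sessions and flag any pair that has never been observed. The sketch below is illustrative only: the function names are hypothetical, the tool names are taken from the example above, and a real deployment would likely keep one baseline per role and tolerate rare but legitimate transitions.

```python
from collections import Counter

def bigrams(session):
    """Adjacent (previous_tool, next_tool) pairs in one session."""
    return list(zip(session, session[1:]))

def fit_transition_baseline(sessions):
    """Count how often each tool-to-tool transition occurs in normal sessions."""
    counts = Counter()
    for session in sessions:
        counts.update(bigrams(session))
    return counts

def unseen_transitions(baseline, session):
    """Return the transitions in `session` that never appeared in the baseline."""
    return [pair for pair in bigrams(session) if baseline[pair] == 0]

# Hypothetical training data: the normal clinical workflow, repeated.
normal_sessions = [
    ["get_patient_demographics", "get_latest_labs", "summarize_findings"],
] * 20
baseline = fit_transition_baseline(normal_sessions)

suspicious = ["get_patient_demographics", "list_all_patients", "bulk_export_records"]
flagged = unseen_transitions(baseline, suspicious)
```

Here `flagged` contains both transitions of the suspicious session, even though each tool on its own might be authorized for the role.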

Argument distributions. Tool arguments have statistical distributions under normal usage. Patient ID length is stable because IDs follow a fixed medical record number (MRN) format. Query date ranges are bounded (for example, never more than a year). Search strings have a vocabulary distribution. An argument that is 10x longer than any normal argument, or that contains URL-like substrings in a field that normally contains names, is anomalous.
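The two checks named above (length far beyond anything seen, and URL-like content in a field that never contained URLs) can be sketched per argument field as follows. This is a minimal illustration, not a complete detector: the function names, the regular expression, and the sample values are assumptions made for the example.

```python
import re

# Deliberately loose pattern; an assumption for this sketch, not a full URL parser.
URL_LIKE = re.compile(r"https?://|www\.", re.IGNORECASE)

def fit_argument_baseline(values):
    """Summarize normal values for one string-typed argument field."""
    return {
        "max_len": max(len(v) for v in values),
        "saw_url": any(URL_LIKE.search(v) for v in values),
    }

def argument_flags(baseline, value, length_factor=10):
    """Return the reasons, if any, why `value` looks unlike the baseline."""
    flags = []
    if len(value) > length_factor * baseline["max_len"]:
        flags.append("length")
    if URL_LIKE.search(value) and not baseline["saw_url"]:
        flags.append("url_like")
    return flags

# Hypothetical baseline for a patient-name search field.
name_baseline = fit_argument_baseline(["Ana Lopez", "J. Smith", "Wei Chen"])
```

With this baseline, `argument_flags(name_baseline, "Smith http://attacker.example/c")` reports `url_like`, while an ordinary name produces no flags.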

Temporal patterns. Human-initiated AI workflows follow human working hours. Agent automation may run 24/7, but irregular bursts at 3am in a normally 9-to-5 deployment are worth flagging. Rate patterns within sessions also matter: a normal conversational agent's tool call rate is bounded by the LLM's generation speed, so a session making 1,000 tool calls per second is not being driven by the LLM at all, which suggests something else is using the agent's credentials directly.
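Both temporal signals can be approximated with very little machinery: an hour-of-day histogram for "is this a normal time?" and a sliding one-second window for "is this rate plausible for an LLM-driven session?". The sketch below assumes timezone-consistent `datetime` timestamps; the function names and the 1% threshold are illustrative choices, not recommendations.

```python
from collections import Counter
from datetime import datetime, timedelta

def fit_hour_histogram(timestamps):
    """Share of baseline tool calls falling in each hour of the day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    return {hour: counts[hour] / total for hour in range(24)}

def is_rare_hour(histogram, ts, min_share=0.01):
    """True if fewer than `min_share` of baseline calls happened in this hour."""
    return histogram[ts.hour] < min_share

def peak_calls_per_second(timestamps):
    """Largest number of calls inside any one-second sliding window."""
    ordered = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ordered)):
        # Shrink the window until it spans less than one second.
        while ordered[end] - ordered[start] >= timedelta(seconds=1):
            start += 1
        best = max(best, end - start + 1)
    return best

# Hypothetical baseline: one call per working hour (9:00 to 17:00) for 30 days.
day0 = datetime(2024, 1, 1)
baseline_calls = [
    day0 + timedelta(days=d, hours=h) for d in range(30) for h in range(9, 18)
]
hour_hist = fit_hour_histogram(baseline_calls)
```

A call at 3am is then rare under `hour_hist`, and a burst of 50 calls spaced one millisecond apart yields a peak rate of 50 calls per second, far above what token-by-token generation would normally produce.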

Resource access graph. Which resources does a given user normally access? A cardiologist's agent should access cardiology patient records. Access to oncology records, pediatric records, or records from a different facility is anomalous — even if the agent's role technically has access to those resources.
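A first approximation of the resource access graph is a per-user set of resource groups seen during the baseline window. The sketch below assumes each access event can be reduced to a (user, department, facility) tuple; that event shape, the identifiers, and the function names are hypothetical.

```python
from collections import defaultdict

def fit_access_baseline(events):
    """Map each user to the set of (department, facility) pairs they accessed."""
    seen = defaultdict(set)
    for user, department, facility in events:
        seen[user].add((department, facility))
    return dict(seen)

def is_unfamiliar_access(baseline, user, department, facility):
    """True if this user has never touched this department/facility pair before."""
    return (department, facility) not in baseline.get(user, set())

# Hypothetical baseline events for one cardiologist's agent.
access_baseline = fit_access_baseline([
    ("dr_rivera", "cardiology", "main_campus"),
    ("dr_rivera", "cardiology", "main_campus"),
    ("dr_rivera", "cardiology", "north_clinic"),
])
```

This check is independent of authorization: `is_unfamiliar_access(access_baseline, "dr_rivera", "oncology", "main_campus")` is true even if the role is permitted to read oncology records. Users with no baseline at all are treated as unfamiliar everywhere, which may need a grace period in practice.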

ML approaches for anomaly detection

Isolation Forest is the practical starting point for MCP server behavioral anomaly detection. It is unsupervised, so it requires no labeled anomaly data. It works on numeric features, so it can combine argument sizes and temporal features with tool names once those categorical values are encoded numerically. It produces a single anomaly score per event that is easy to rank and threshold, although the score alone does not say which feature caused it. It is effective at detecting point anomalies: single tool calls that are individually unusual.
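To show the idea behind Isolation Forest without depending on any ML library, here is a deliberately simplified pure-Python version: random trees split on a random feature at a random value, and points that are isolated after few splits score close to 1. It is a teaching sketch under stated assumptions (three hand-picked numeric features, an ordinal tool-name encoding, synthetic data, at least two rows); a production system would more likely use a maintained library implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """Expected depth of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, row, depth=0):
    """Depth at which `row` lands, adjusted for the size of the leaf it reaches."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    child = left if row[dim] < split else right
    return path_length(child, row, depth + 1)

def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Build `n_trees` trees, each on a random subsample (needs >= 2 rows)."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(max(size, 2)))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size

def anomaly_score(forest, row):
    """Score in (0, 1]; values near 1 mean the row was isolated quickly."""
    trees, size = forest
    mean_depth = sum(path_length(t, row) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(size))

# Hypothetical features per call: (argument length, hour of day, tool index).
# The ordinal tool index is a shortcut; one-hot encoding is usually preferable.
TOOLS = {"get_patient_demographics": 0, "get_latest_labs": 1, "summarize_findings": 2}
data_rng = random.Random(1)
normal_calls = [
    (data_rng.randint(8, 20), data_rng.randint(9, 17), data_rng.randrange(3))
    for _ in range(200)
]
suspicious_call = (5000, 3, TOOLS["get_latest_labs"])  # huge argument at 3am
forest = fit_forest(normal_calls + [suspicious_call])
```

Scoring `suspicious_call` with `anomaly_score(forest, ...)` gives a higher value than any of the synthetic normal calls, because a single split on argument length or hour is usually enough to isolate it.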

LSTM autoencoders are better suited for sequence anomalies: sequences in which each tool call is individually normal but the ordering or combination is unusual. Train an LSTM autoencoder on normal tool call sequences; sequences that it reconstructs with high error are flagged as anomalous. This makes it a good fit for detecting prompt injection attacks that cause the agent to call tools in unusual sequences.

Simple statistical baselines (mean ± 3 standard deviations on argument sizes, time-of-day histograms, tool call rate percentiles) often outperform complex ML approaches in low-data regimes and are far easier to tune and explain to a security operations team. Start with statistical baselines before investing in ML infrastructure.
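The statistical baselines mentioned above need nothing beyond the standard library. The sketch below shows a mean ± 3 standard deviation check on argument sizes and a percentile cutoff on per-session call rates; the sample numbers and function names are made up for illustration.

```python
import statistics

def fit_size_baseline(sizes):
    """Mean and sample standard deviation of argument sizes seen in normal use."""
    return statistics.mean(sizes), statistics.stdev(sizes)

def outside_k_sigma(baseline, size, k=3.0):
    """True if `size` is more than k standard deviations from the baseline mean."""
    mean, std = baseline
    return abs(size - mean) > k * std

def rate_percentile(rates, pct=99):
    """The pct-th percentile of observed per-session tool call rates."""
    # quantiles(n=100) returns 99 cut points; index pct-1 is the pct-th percentile.
    return statistics.quantiles(rates, n=100)[pct - 1]

# Hypothetical argument sizes in bytes from normal traffic.
size_baseline = fit_size_baseline([100, 110, 90, 105, 95, 100, 102, 98])

# Hypothetical calls-per-minute figures from 100 normal sessions.
p99_rate = rate_percentile(list(range(1, 101)))
```

A size of 130 bytes falls outside the three-sigma band for this baseline, while 115 does not. Every threshold here is a single number that can be shown to an analyst, which is much of the appeal over a learned model.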

Alert integration and tuning

Behavioral anomaly detection is only useful if the alerts are actionable. Key tuning principles for MCP server anomaly alerts:

Baseline the agent, not the user. Different AI agents have different behavioral profiles even when operated by the same user. Baseline per agent-type, not per user identity.
Context-aware thresholds. A bulk export tool called during a scheduled ETL job is not anomalous; the same call during an interactive session is. Use session metadata (interactive vs. scheduled, user role, time of day) to set context-aware thresholds.
Start with high-confidence signals. Initial deployment should alert only on the highest-confidence signals (tool calls that have never appeared in baseline for this user/role, argument sizes 100x above normal, cross-user resource access patterns). Tune precision before expanding coverage.
Close the loop. Record confirmed true-positive incidents as labeled anomalies, and keep them out of the data used to model normal behavior. Over time, these labels make it possible to train supervised detection on top of the unsupervised baseline.
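The context-aware threshold principle can be expressed as a lookup keyed on the tool and the session mode. The sketch below is a minimal illustration: the limits, the default, and the `should_alert` helper are hypothetical, and a real rule set would likely also key on user role and time of day.

```python
# Hypothetical limits: maximum calls per session before an alert is raised.
LIMITS = {
    ("bulk_export_records", "scheduled"): 50,   # expected during ETL jobs
    ("bulk_export_records", "interactive"): 0,  # any interactive use alerts
}
DEFAULT_LIMIT = 20  # assumption: fallback for tools without a specific rule

def should_alert(tool, session_mode, calls_in_session):
    """Compare a session's call count against the limit for its context."""
    limit = LIMITS.get((tool, session_mode), DEFAULT_LIMIT)
    return calls_in_session > limit

etl_alert = should_alert("bulk_export_records", "scheduled", 1)
chat_alert = should_alert("bulk_export_records", "interactive", 1)
```

The same single call to `bulk_export_records` is ignored in the scheduled job (`etl_alert` is false) and alerted on in the interactive session (`chat_alert` is true).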

SkillAudit's role in behavioral security

Behavioral anomaly detection is a runtime control. SkillAudit's static analysis is a pre-deployment control. They work at different layers: SkillAudit identifies the code-level vulnerabilities (missing input validation, prompt injection surface area) that make anomalous behavior possible; behavioral monitoring detects when those vulnerabilities are being exploited in production. A clean SkillAudit report reduces the attack surface that behavioral detection needs to cover; behavioral detection catches the exploitation of vulnerabilities that static analysis missed.

Pre-deployment security comes before runtime monitoring. Fix static analysis findings first: behavioral anomaly detection is more effective when the attack surface is smaller.
