
MCP server data exfiltration prevention — detecting and stopping bulk data theft via tool calls

A conventional web application resists bulk exfiltration almost as a side effect of its design: rate limiting, CAPTCHAs, and the pace of a human operator mean that bulk data theft takes hours and generates visible anomalies. An MCP server with a database read tool has no such friction. A prompt injection payload that instructs the LLM orchestrator to dump all rows via repeated tool calls can exfiltrate a million-row database in minutes, before any alert reaches a human. Response payload limits, per-session data budgets, and structured audit logging turn exfiltration from a silent crime into a detectable event.

The machine-speed exfiltration model

Data exfiltration through MCP tools differs from web scraping in two ways. First, the LLM orchestrator can issue tool calls continuously, without the browser-rendering overhead that slows conventional scrapers. Requests arrive as fast as the model can generate arguments, which can mean hundreds per minute. Second, the data lands directly in the LLM's context window, where it can be formatted, summarized, or re-transmitted via a subsequent tool call (email send, HTTP POST, file write) without leaving any human-readable log of the transfer unless the server records one.

The attack path for a prompt injection-driven exfiltration is: injected payload instructs the model to enumerate users → model calls list_users(offset=0, limit=1000) → model calls list_users(offset=1000, limit=1000) → ... → model calls a send_email or write_file tool with the accumulated data. Without server-side limits, this completes in the background of what looks like a normal session.

Response payload size limits

The first control is a hard cap on the bytes returned by any single tool call. An MCP tool that returns up to 1 MB of JSON per call can be called 1,000 times to exfiltrate 1 GB. A tool that returns at most 50 KB forces at least 20,000 calls to reach the same total, and a per-session quota will trip long before 20,000 calls complete.
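The arithmetic above can be checked with a few lines of TypeScript. This is a minimal sketch: the helper name `callsToExfiltrate` is hypothetical, and it assumes decimal units (1 GB = 1,000,000,000 bytes) and that every response is filled to the cap.

```typescript
// Hypothetical helper: the minimum number of tool calls needed to move
// `targetBytes` when every response is capped at `capBytes`.
function callsToExfiltrate(targetBytes: number, capBytes: number): number {
  if (capBytes <= 0) {
    throw new RangeError('capBytes must be positive');
  }
  return Math.ceil(targetBytes / capBytes);
}

const ONE_GB = 1_000_000_000; // decimal gigabyte (assumption)

console.log(callsToExfiltrate(ONE_GB, 1_000_000)); // 1000 calls at 1 MB per call
console.log(callsToExfiltrate(ONE_GB, 50_000));    // 20000 calls at 50 KB per call
```

Lowering the cap does not stop a patient attacker on its own. It raises the number of calls required, which gives the per-session budget and the audit log more chances to catch the pattern.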

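// db and log are assumed to be the application's database client and logger.
const MAX_RESPONSE_BYTES = 50_000; // 50 KB per tool call
const MAX_ROWS_PER_CALL = 100;

export async function list_users(args: {
  offset: number;
  limit: number;
}) {
  // Clamp client-supplied paging values; never pass them through as-is
  const safeLimit = Math.max(1, Math.min(Math.floor(args.limit), MAX_ROWS_PER_CALL));
  const safeOffset = Math.max(0, Math.floor(args.offset));

  const rows = await db.users.findMany({
    skip: safeOffset,
    take: safeLimit,
    select: {
      id: true, name: true, email: true
      // Exclude: passwordHash, apiKeys, paymentInfo
    }
  });

  // Measure the serialized size in bytes, not in string length
  const size = Buffer.byteLength(JSON.stringify({ rows, total: rows.length }));

  if (size > MAX_RESPONSE_BYTES) {
    // Log the event as an anomaly, then drop whole rows until the payload
    // fits. Slicing the JSON string itself would produce invalid JSON.
    log.warn({ tool: 'list_users', args, size }, 'Response truncated');
    let kept = rows;
    while (
      kept.length > 0 &&
      Buffer.byteLength(JSON.stringify({ rows: kept, total: kept.length, truncated: true })) > MAX_RESPONSE_BYTES
    ) {
      kept = kept.slice(0, -1);
    }
    return { rows: kept, total: kept.length, truncated: true };
  }

  // total is the number of rows in this page, not the size of the table
  return { rows, total: rows.length };
}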

Per-session data budget

A session-level data quota tracks total bytes returned to the LLM context across all tool calls in the session. When the quota is exhausted, further data-reading tool calls return an error. This converts a sustained exfiltration attempt (which would fly under per-call limits) into a detectable event:

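import { promises as fs } from 'fs';
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';

const SESSION_DATA_BUDGET_BYTES = 5_000_000; // 5 MB per session

class SessionQuota {
  private bytesUsed = 0;

  // Record bytes about to be returned; throw once the session is over budget
  charge(bytes: number): void {
    this.bytesUsed += bytes;
    if (this.bytesUsed > SESSION_DATA_BUDGET_BYTES) {
      log.warn({ bytesUsed: this.bytesUsed }, 'Session data budget exceeded');
      // A quota refusal is a policy decision, not a server fault
      throw new McpError(
        ErrorCode.InvalidRequest,
        'Session data limit reached. Start a new session to continue.'
      );
    }
  }

  get used(): number { return this.bytesUsed; }
}

// One quota instance per session: call this once when the session starts
// and pass the same context to every tool call in that session
export function createToolContext(sessionId: string) {
  return { quota: new SessionQuota(), sessionId };
}

type ToolContext = ReturnType<typeof createToolContext>;

// Tool uses quota. safePath is assumed to be the server's path-validation
// helper, and log the application logger.
export async function read_file(args: { path: string }, ctx: ToolContext) {
  const content = await fs.readFile(safePath(args.path), 'utf8');
  // charge() throws before the content is returned if the budget is exceeded
  ctx.quota.charge(Buffer.byteLength(content));
  return content;
}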
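The quota only works if every tool call in a session charges the same counter. One way to guarantee that is a registry keyed by session ID. The sketch below is self-contained and illustrative: `SessionBudgets`, `BudgetExceededError`, and the in-memory `Map` are assumptions, not part of any MCP SDK, and a multi-process deployment would need a shared store instead.

```typescript
// Hypothetical per-session budget registry (in-memory, single process).
const BUDGET_BYTES = 5_000_000; // 5 MB per session, matching the example above

class BudgetExceededError extends Error {}

class SessionBudgets {
  private used = new Map<string, number>();

  // Adds `bytes` to the session's running total and throws once the total
  // exceeds the budget. The overage is still recorded, so logs show how far
  // past the limit the session tried to go.
  charge(sessionId: string, bytes: number): number {
    const total = (this.used.get(sessionId) ?? 0) + bytes;
    this.used.set(sessionId, total);
    if (total > BUDGET_BYTES) {
      throw new BudgetExceededError(`session ${sessionId} used ${total} bytes`);
    }
    return total;
  }

  // Call when the session closes so the map does not grow without bound.
  end(sessionId: string): void {
    this.used.delete(sessionId);
  }
}

const budgets = new SessionBudgets();
budgets.charge('s1', 4_000_000); // within budget
budgets.charge('s2', 4_000_000); // separate session, separate counter
try {
  budgets.charge('s1', 2_000_000); // s1 is now at 6 MB
} catch (err) {
  console.log('s1 blocked');
}
```

Keying by session ID keeps one noisy session from consuming another session's budget, and the explicit `end` call is a reminder that the counters need a lifecycle.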

Column-level access control — exclude sensitive fields

The most direct exfiltration control is to never return sensitive fields at all. A list_users tool has no reason to expose password hashes, API keys, or payment information in its output. Use explicit field selection, and never use SELECT * in tools that return data to the LLM context:

// HIGH finding: SELECT * returns all columns, including secrets
const unsafeRows = await db.query('SELECT * FROM users WHERE active = 1');

// Correct: explicit column list that excludes all sensitive fields
const rows = await db.users.findMany({
  select: {
    id: true,
    displayName: true,
    email: true,
    createdAt: true
    // passwordHash: NOT included
    // apiKeyHash: NOT included
    // stripeCustomerId: NOT included
  }
});
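Explicit selection at the query layer can be backed by a second allowlist applied just before the response is returned, so a later query change cannot leak a new column by accident. The following is a minimal sketch under stated assumptions: `pickAllowed` and `USER_PUBLIC_FIELDS` are hypothetical names, and the field list simply mirrors the example above.

```typescript
// Hypothetical output allowlist, applied right before data leaves the tool.
const USER_PUBLIC_FIELDS: readonly string[] = ['id', 'displayName', 'email', 'createdAt'];

// Copies only allowlisted keys. Anything not named in `allowed` is dropped,
// including columns added to the table after this code was written.
function pickAllowed(
  row: Record<string, unknown>,
  allowed: readonly string[]
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(row, key)) {
      out[key] = row[key];
    }
  }
  return out;
}

// Example: a row that accidentally carries secret columns
const leakyRow = {
  id: 7,
  displayName: 'Ada',
  email: 'ada@example.com',
  createdAt: '2024-01-01T00:00:00Z',
  passwordHash: 'not-a-real-hash',
  apiKeyHash: 'not-a-real-key',
};

console.log(Object.keys(pickAllowed(leakyRow, USER_PUBLIC_FIELDS)));
// [ 'id', 'displayName', 'email', 'createdAt' ]
```

An allowlist fails closed: a newly added column stays hidden until someone deliberately adds it, whereas a denylist exposes it until someone remembers to block it.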

Audit logging for detection

Even with size limits and quotas in place, exfiltration attempts should generate audit log entries that operations can alert on. Structure the log for anomaly detection queries:

interface ToolAuditEntry {
  sessionId: string;
  tool: string;
  args: Record<string, unknown>; // sanitized — no secrets
  responseBytes: number;
  sessionTotalBytes: number;
  durationMs: number;
  ts: string; // ISO 8601
}

function auditLog(entry: ToolAuditEntry): void {
  // Structured JSON — queryable by sessionId + tool + responseBytes
  process.stdout.write(JSON.stringify(entry) + '\n');
}

// Detection query (pseudocode; adapt to Loki, Splunk, or CloudWatch Logs Insights syntax):
// stats sum(responseBytes) as totalBytes by sessionId
// | filter totalBytes > 2000000
// | sort totalBytes desc
// Sessions that extracted more than 2 MB are candidates for review.
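The same aggregation can be run in-process, for example in a test or a small log-processing job. The sketch below assumes audit entries have already been parsed from the JSON lines. `AuditRecord` and `flagHeavySessions` are hypothetical names, and the 2 MB threshold is the one used in the query above.

```typescript
// Minimal subset of the audit entry needed for per-session aggregation.
interface AuditRecord {
  sessionId: string;
  tool: string;
  responseBytes: number;
}

interface SessionTotal {
  sessionId: string;
  totalBytes: number;
}

// Sums responseBytes per session and returns the sessions whose total
// exceeds `thresholdBytes`, largest first.
function flagHeavySessions(entries: AuditRecord[], thresholdBytes: number): SessionTotal[] {
  const totals = new Map<string, number>();
  for (const entry of entries) {
    totals.set(entry.sessionId, (totals.get(entry.sessionId) ?? 0) + entry.responseBytes);
  }

  const flagged: SessionTotal[] = [];
  totals.forEach((totalBytes, sessionId) => {
    if (totalBytes > thresholdBytes) {
      flagged.push({ sessionId, totalBytes });
    }
  });
  return flagged.sort((a, b) => b.totalBytes - a.totalBytes);
}

const sample: AuditRecord[] = [
  { sessionId: 's1', tool: 'list_users', responseBytes: 1_500_000 },
  { sessionId: 's2', tool: 'read_file', responseBytes: 500_000 },
  { sessionId: 's1', tool: 'list_users', responseBytes: 1_000_000 },
];

console.log(flagHeavySessions(sample, 2_000_000));
// [ { sessionId: 's1', totalBytes: 2500000 } ]
```

Grouping by session rather than by tool catches an attacker who spreads reads across several tools to stay under any single tool's radar.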

SkillAudit detection

HIGH: SELECT * in a tool that returns data to the LLM — includes all sensitive columns without field selection.
HIGH: No server-side cap on the limit or take parameters — client can request unlimited rows in a single call.
MEDIUM: No response payload size cap — large responses allowed without truncation or audit.
MEDIUM: No per-session data budget — sustained exfiltration across many calls is undetected.
MEDIUM: No audit logging of tool calls with response size — exfiltration leaves no queryable trace.
LOW: Sensitive fields (passwordHash, apiKeyHash, stripeCustomerId) present in tool schema even if conditionally excluded at runtime.

For more on the ambient credential problem that makes data exfiltration through MCP servers so impactful, see the ambient token problem post. Run a SkillAudit scan to find specific tool schemas that expose oversized responses or sensitive field access.