MCP server concurrent request security — race conditions, shared state corruption, per-caller semaphores

LLM agents make tool calls at machine speed — potentially dozens of concurrent requests per second. MCP servers that weren't designed for high concurrency expose race conditions between authorization and execution (TOCTOU), shared mutable state that can be corrupted across concurrent requests, rate limiters that can be bypassed by racing concurrent calls, and resource pools that can be exhausted by a single caller's parallel requests. These vulnerabilities often don't appear in sequential testing and only emerge under load.

1. TOCTOU — authorization check and data access aren't atomic

Time-of-check-to-time-of-use (TOCTOU) races occur when a security check runs in one async operation and the guarded action runs in a separate async operation — with a window between them where the condition can change:

// VULNERABLE: check (isOwner) and act (delete) are separate async operations
// A race condition between two concurrent calls can bypass the ownership check
server.tool("delete_file", { fileId: z.string() }, async ({ fileId }, ctx) => {
  // CHECK: verify caller owns the file
  const file = await db.files.findById(fileId);
  if (!file || file.ownerId !== ctx.userId) {
    throw new ForbiddenError("Not your file");
  }

  // --- WINDOW: file ownership could change between here and the next line ---
  // Another concurrent request could transfer file ownership to a different user
  // before this delete executes, allowing the original owner to delete a file
  // they no longer own

  // ACT: delete the file (no longer checking ownership)
  await db.files.delete(fileId);
  return { content: [{ type: "text", text: "Deleted" }] };
});

// SECURE: atomic check-and-act using database-level conditional update
// The WHERE clause enforces the ownership check at the database level atomically
server.tool("delete_file", { fileId: z.string() }, async ({ fileId }, ctx) => {
  // Atomic: only deletes if the file exists AND belongs to ctx.userId
  // If ownership changed between the "check" and "act", this DELETE finds 0 rows
  const result = await db.query(
    "DELETE FROM files WHERE id = $1 AND owner_id = $2 RETURNING id",
    [fileId, ctx.userId]
  );

  if (result.rowCount === 0) {
    // Either file doesn't exist, or caller doesn't own it — same error either way
    throw new ForbiddenError("File not found or access denied");
  }

  return { content: [{ type: "text", text: "Deleted" }] };
});
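
The interleaving described above can be reproduced without a database. The following is a minimal sketch, assuming a hypothetical in-memory `FakeFileStore` whose methods each await one timer tick to imitate I/O; `checkThenDelete`, `deleteIfOwner`, and `raceDemo` are illustrative names, not part of any MCP SDK or database driver.

```typescript
type FileRow = { id: string; ownerId: string };

// Hypothetical stand-in for a database. Each method awaits one timer tick so that
// other requests can interleave between calls, as they would with real I/O.
class FakeFileStore {
  private rows = new Map<string, FileRow>();
  private tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

  insert(row: FileRow): void {
    this.rows.set(row.id, { ...row });
  }

  has(id: string): boolean {
    return this.rows.has(id);
  }

  async findById(id: string): Promise<FileRow | undefined> {
    await this.tick();
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  async transfer(id: string, newOwner: string): Promise<void> {
    await this.tick();
    const row = this.rows.get(id);
    if (row) row.ownerId = newOwner;
  }

  async delete(id: string): Promise<void> {
    await this.tick();
    this.rows.delete(id);
  }

  // Ownership check and delete happen in one synchronous step after the tick,
  // mirroring DELETE ... WHERE id = $1 AND owner_id = $2.
  async deleteIfOwner(id: string, ownerId: string): Promise<boolean> {
    await this.tick();
    const row = this.rows.get(id);
    if (!row || row.ownerId !== ownerId) return false;
    this.rows.delete(id);
    return true;
  }
}

// Check-then-act: ownership is verified, then the delete runs in a later step.
async function checkThenDelete(store: FakeFileStore, id: string, userId: string): Promise<boolean> {
  const file = await store.findById(id);
  if (!file || file.ownerId !== userId) return false;
  await store.delete(id); // ownership is not re-checked here
  return true;
}

async function raceDemo() {
  // Vulnerable path: alice's delete races a transfer of the same file to bob.
  const a = new FakeFileStore();
  a.insert({ id: "f1", ownerId: "alice" });
  const [vulnerableDeleted] = await Promise.all([
    checkThenDelete(a, "f1", "alice"),
    a.transfer("f1", "bob"),
  ]);

  // Atomic path: once the transfer has landed, alice's conditional delete matches nothing.
  const b = new FakeFileStore();
  b.insert({ id: "f1", ownerId: "alice" });
  await b.transfer("f1", "bob");
  const atomicDeleted = await b.deleteIfOwner("f1", "alice");

  return {
    vulnerableDeleted,
    fileGoneAfterVulnerable: !a.has("f1"),
    atomicDeleted,
    fileKeptAfterAtomic: b.has("f1"),
  };
}
```

In the vulnerable path the ownership check passes before the transfer lands and the delete runs after it, so a file that now belongs to bob is removed. The conditional delete has no such gap: it either runs while alice still owns the file or matches nothing.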

2. Shared mutable session state — concurrent writes corrupt session data

MCP servers that store session data in shared in-memory objects or non-atomic cache operations can have concurrent tool calls from the same session race on mutable state:

// VULNERABLE: shared in-memory session map with non-atomic read-modify-write
const sessions = new Map<string, SessionData>();

server.tool("increment_usage", { amount: z.number() }, async ({ amount }, ctx) => {
  const session = sessions.get(ctx.sessionId);
  if (!session) throw new Error("Session not found");

  // READ: two concurrent calls (each with amount = 1) both read session.usage = 10
  const currentUsage = session.usage;

  // RACE WINDOW: Node.js runs handlers on a single thread, so the race exists only
  // because this await yields to the event loop between the read and the write.
  // checkQuota is a hypothetical async step; any database or network call has the same effect.
  await checkQuota(ctx.sessionId, currentUsage + amount);

  // WRITE: both calls store 10 + 1 = 11, so one update is lost (correct value: 12)
  session.usage = currentUsage + amount;
  sessions.set(ctx.sessionId, session);
  return { content: [{ type: "text", text: `Usage: ${session.usage}` }] };
});

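// SECURE: use atomic increment operations
// Option A: Redis INCRBY (atomic by design)
import { createClient } from "redis";

const redis = createClient();
await redis.connect();  // node-redis v4+ needs an explicit connect before issuing commands

// Require a positive integer: a negative amount would let the caller lower their own
// usage, and INCRBY rejects non-integer values
server.tool("increment_usage", { amount: z.number().int().positive() }, async ({ amount }, ctx) => {
  const key = `session:${ctx.sessionId}:usage`;
  // INCRBY is atomic, so concurrent calls cannot lose updates
  const newUsage = await redis.incrBy(key, amount);
  // Refresh the expiry to match the session lifetime
  await redis.expire(key, 4 * 60 * 60);  // 4 hours
  return { content: [{ type: "text", text: `Usage: ${newUsage}` }] };
});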

// Option B: in-memory with per-session mutex (for non-Redis deployments)
import { Mutex } from "async-mutex";

const sessionMutexes = new Map<string, Mutex>();
const sessionData = new Map<string, SessionData>();

function getSessionMutex(sessionId: string): Mutex {
  if (!sessionMutexes.has(sessionId)) {
    sessionMutexes.set(sessionId, new Mutex());
  }
  return sessionMutexes.get(sessionId)!;
}

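server.tool("increment_usage", { amount: z.number() }, async ({ amount }, ctx) => {
  const mutex = getSessionMutex(ctx.sessionId);
  // runExclusive queues callers, so the whole read-modify-write (including any awaits
  // inside it) runs for one call at a time per session
  return mutex.runExclusive(async () => {
    const session = sessionData.get(ctx.sessionId);
    if (!session) throw new Error("Session not found");
    session.usage += amount;
    sessionData.set(ctx.sessionId, session);
    return { content: [{ type: "text", text: `Usage: ${session.usage}` }] };
  });
});
// Delete the entry from sessionMutexes when the session ends; otherwise the map grows without bound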
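
The lost update and a lock-based fix can both be observed in a few lines. This is a sketch under stated assumptions: `usage`, `yieldToEventLoop`, `serialize`, and the two `addUsage*` functions are hypothetical names, and the per-key promise chain is a dependency-free stand-in for a mutex library, not the `async-mutex` implementation.

```typescript
const usage = new Map<string, number>();

// Stands in for any I/O performed between the read and the write.
const yieldToEventLoop = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Racy: the await between the read and the write lets a second call read the same value.
async function addUsageRacy(sessionId: string, amount: number): Promise<number> {
  const current = usage.get(sessionId) ?? 0;
  await yieldToEventLoop();
  usage.set(sessionId, current + amount);
  return current + amount;
}

// Per-key promise chain: each task starts only after the previous task for that key settles.
const chains = new Map<string, Promise<unknown>>();

function serialize<T>(key: string, task: () => Promise<T>): Promise<T> {
  const previous = chains.get(key) ?? Promise.resolve();
  const run = previous.then(task, task); // run the task even if the previous one failed
  const tail = run.catch(() => undefined); // the stored tail never rejects
  chains.set(key, tail);
  // Drop the chain once it is idle so the map does not grow without bound.
  tail.then(() => {
    if (chains.get(key) === tail) chains.delete(key);
  });
  return run;
}

function addUsageSerialized(sessionId: string, amount: number): Promise<number> {
  return serialize(sessionId, () => addUsageRacy(sessionId, amount));
}

async function lostUpdateDemo(): Promise<{ racy: number; serialized: number }> {
  usage.set("racy", 10);
  await Promise.all([addUsageRacy("racy", 1), addUsageRacy("racy", 1)]);

  usage.set("safe", 10);
  await Promise.all([addUsageSerialized("safe", 1), addUsageSerialized("safe", 1)]);

  // racy ends at 11 (one update lost); serialized ends at 12
  return { racy: usage.get("racy")!, serialized: usage.get("safe")! };
}
```

The same handler body is used in both cases; only the serialization around it differs. That matches how this class of bug is usually fixed in practice: the read-modify-write stays as is and the caller guarantees exclusivity per session.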

3. Rate limit bypass via concurrent request racing

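Rate limiters that implement a read-then-update pattern (read the current count, compare it with the limit, then write the new count) can be bypassed by a burst of simultaneous requests: every request in the burst passes the check before any update lands. The window exists whenever the counter lives in a shared store such as Redis or a database, or whenever an await separates the check from the update: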

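// VULNERABLE: non-atomic rate limit check, susceptible to concurrent bypass
// Note: inside a single Node.js process this synchronous function runs to completion,
// so calls cannot interleave here. The bypass appears when the same check-then-write
// logic runs against a shared store (GET, compare, SET with awaits in between). A
// per-process Map also lets each worker process grant the full limit on its own.
const rateLimits = new Map<string, { count: number; resetAt: number }>();

function checkRateLimit(callerId: string, limit: number): void {
  const now = Date.now();
  const entry = rateLimits.get(callerId) ?? { count: 0, resetAt: now + 60_000 };

  if (now > entry.resetAt) {
    entry.count = 0;
    entry.resetAt = now + 60_000;
  }

  // RACE (shared-store version): 10 concurrent requests all read count=9 (below limit=10)
  if (entry.count >= limit) throw new Error("Rate limit exceeded");

  // All 10 then write count=10, so 10 requests pass where only 1 should
  entry.count++;
  rateLimits.set(callerId, entry);
}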

// SECURE: atomic rate limit using a Redis Lua script (increments and checks in one step)
import Redis from "ioredis";  // the positional eval() call below follows the ioredis signature

const RATE_LIMIT_SCRIPT = `
  local key = KEYS[1]
  local limit = tonumber(ARGV[1])
  local window = tonumber(ARGV[2])

  local count = redis.call('INCR', key)
  if count == 1 then
    redis.call('EXPIRE', key, window)
  end
  if count > limit then
    return -1
  end
  return limit - count
`;

async function atomicRateLimit(
  redis: Redis,
  callerId: string,
  limit: number,
  windowSeconds: number
): Promise<void> {
  const key = `ratelimit:${callerId}`;
  const remaining = await redis.eval(
    RATE_LIMIT_SCRIPT,
    1,  // numkeys
    key,
    String(limit),
    String(windowSeconds)
  ) as number;

  if (remaining === -1) {
    throw new Error(`Rate limit exceeded: max ${limit} requests per ${windowSeconds}s`);
  }
}
// INCR is atomic in Redis; concurrent calls serialize at the Redis level
// Redis runs the whole Lua script as one unit, so INCR + EXPIRE + comparison cannot interleave with other commands
// This is a fixed-window counter: the window opens on the first request and resets when the key expires
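
Because this bypass only shows up under concurrency, a burst test is a useful complement to sequential tests. The sketch below assumes a hypothetical `FakeCounterStore` with async `get`, `set`, and `incr` methods (the last one imitating an atomic INCR); it is not a Redis client, and `allowCheckThenSet`, `allowIncrementFirst`, and `countAllowed` are illustrative names.

```typescript
// Hypothetical shared counter store. Every method awaits one timer tick to imitate a
// network round trip; incr() updates the value in a single step, like Redis INCR.
class FakeCounterStore {
  private counts = new Map<string, number>();
  private hop = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

  async get(key: string): Promise<number> {
    await this.hop();
    return this.counts.get(key) ?? 0;
  }

  async set(key: string, value: number): Promise<void> {
    await this.hop();
    this.counts.set(key, value);
  }

  async incr(key: string): Promise<number> {
    await this.hop();
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Non-atomic: read, compare, then write in a separate round trip.
async function allowCheckThenSet(store: FakeCounterStore, key: string, limit: number): Promise<boolean> {
  const count = await store.get(key);
  if (count >= limit) return false;
  await store.set(key, count + 1);
  return true;
}

// Atomic: increment first, then judge the value the store returned.
async function allowIncrementFirst(store: FakeCounterStore, key: string, limit: number): Promise<boolean> {
  const count = await store.incr(key);
  return count <= limit;
}

// Fire `burst` calls at once and count how many were allowed.
async function countAllowed(allow: () => Promise<boolean>, burst: number): Promise<number> {
  const results = await Promise.all(Array.from({ length: burst }, () => allow()));
  return results.filter(Boolean).length;
}

async function burstDemo(): Promise<{ racy: number; atomic: number }> {
  const store = new FakeCounterStore();
  const limit = 10;
  const racy = await countAllowed(() => allowCheckThenSet(store, "racy", limit), 50);
  const atomic = await countAllowed(() => allowIncrementFirst(store, "atomic", limit), 50);
  return { racy, atomic };
}
```

With a limit of 10 and a burst of 50, the increment-first limiter admits exactly 10 calls, while the check-then-set limiter admits more than the limit because every read completes before any write lands. A test of this shape can be pointed at a real limiter by swapping the fake store for the production client.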

4. Per-caller concurrency limits — preventing one caller from monopolizing resources

Without a per-caller concurrency limit, a single LLM agent session can send thousands of concurrent tool calls, exhausting database connection pools or thread pools before rate limits (which count requests per time window) trigger:

// SECURE: per-caller semaphore limiting concurrent in-flight requests
import Semaphore from "semaphore-async-await";

const callerSemaphores = new Map<string, Semaphore>();
const MAX_CONCURRENT_PER_CALLER = 10;

function getCallerSemaphore(callerId: string): Semaphore {
  let semaphore = callerSemaphores.get(callerId);
  if (!semaphore) {
    semaphore = new Semaphore(MAX_CONCURRENT_PER_CALLER);
    callerSemaphores.set(callerId, semaphore);
  }
  return semaphore;
}

// Middleware: acquire a permit before processing any tool call
async function withConcurrencyLimit<T>(
  callerId: string,
  fn: () => Promise<T>
): Promise<T> {
  const semaphore = getCallerSemaphore(callerId);

  // Non-blocking: tryAcquire() takes a permit if one is free and returns false otherwise,
  // so the check and the acquire happen in a single step
  if (!semaphore.tryAcquire()) {
    throw new Error(
      `Concurrency limit reached: max ${MAX_CONCURRENT_PER_CALLER} concurrent requests`
    );
  }

  try {
    return await fn();
  } finally {
    semaphore.release();  // Runs even if fn throws
    // Clean up only once the caller has nothing in flight. A fixed timer could delete a
    // semaphore that still has holders and hand the caller a fresh set of permits.
    if (semaphore.getPermits() === MAX_CONCURRENT_PER_CALLER) {
      callerSemaphores.delete(callerId);
    }
  }
}

// Apply to all tool handlers; install this wrapper before any tool is registered
const originalTool = server.tool.bind(server);
server.tool = function(name, schema, handler) {
  return originalTool(name, schema, async (args, ctx) => {
    return withConcurrencyLimit(ctx.callerId, () => handler(args, ctx));
  });
};
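
For deployments that prefer no extra dependency, the same reject-when-full behavior can be sketched with a plain in-flight counter per caller. `CallerLimiter` and `limiterDemo` are hypothetical names for illustration; the counter is only safe because the check and the increment run with no await between them inside one Node.js process.

```typescript
class CallerLimiter {
  private inFlight = new Map<string, number>();
  private maxPerCaller: number;

  constructor(maxPerCaller: number) {
    this.maxPerCaller = maxPerCaller;
  }

  async run<T>(callerId: string, fn: () => Promise<T>): Promise<T> {
    const current = this.inFlight.get(callerId) ?? 0;
    if (current >= this.maxPerCaller) {
      throw new Error(`Concurrency limit reached: max ${this.maxPerCaller} concurrent requests`);
    }
    // Check and increment happen synchronously, so no other call can slip in between.
    this.inFlight.set(callerId, current + 1);
    try {
      return await fn();
    } finally {
      // Release the slot even if fn throws; forget callers with nothing in flight.
      const remaining = (this.inFlight.get(callerId) ?? 1) - 1;
      if (remaining === 0) this.inFlight.delete(callerId);
      else this.inFlight.set(callerId, remaining);
    }
  }

  activeCallers(): number {
    return this.inFlight.size;
  }
}

async function limiterDemo(): Promise<{ statuses: string[]; activeCallersAfter: number }> {
  const limiter = new CallerLimiter(2);
  const slow = () => new Promise<string>((resolve) => setTimeout(() => resolve("done"), 10));
  const outcome = (p: Promise<unknown>) => p.then(() => "fulfilled", () => "rejected");

  const statuses = await Promise.all(
    [
      limiter.run("agent-1", slow),
      limiter.run("agent-1", slow),
      limiter.run("agent-1", slow), // third concurrent call from the same caller
      limiter.run("agent-2", slow), // other callers are unaffected
    ].map(outcome)
  );

  return { statuses, activeCallersAfter: limiter.activeCallers() };
}
```

With a limit of 2, the third simultaneous call from `agent-1` is rejected immediately while `agent-2` proceeds, and the map is empty again once all calls settle. Rejecting instead of queueing keeps a flooding caller from building an unbounded backlog; a queueing variant would need its own cap on waiting requests.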

SkillAudit findings for concurrent request handling

CRITICAL −22 TOCTOU race between authorization check and resource operation — concurrent requests can bypass ownership checks
HIGH −18 Non-atomic rate limiting (read-check-decrement) — concurrent requests bypass per-caller request limits
HIGH −15 Shared mutable session state with non-atomic read-modify-write — concurrent tool calls from same session corrupt session data
MEDIUM −10 No per-caller concurrency limit — single agent session can exhaust database connection pool via parallel tool calls

Run a SkillAudit scan to detect TOCTOU patterns, non-atomic rate limiters, and shared mutable state in tool handlers. See also race condition security and TOCTOU attacks for related patterns.