MCP server circuit breaker security — fail-open vs fail-closed cascades
Circuit breakers are typically discussed as a reliability pattern — they prevent a failing downstream dependency from taking down the whole service. In MCP servers they're also a security control. When a circuit opens (downstream is failing), the server must decide: fail open (allow requests through without the downstream check) or fail closed (block all requests until the downstream recovers). That decision determines whether a targeted attack on your auth provider, rate limit store, or SSRF blocklist service can be used to disable your security controls entirely.
Why circuit breaker state matters for security
Consider an MCP server that validates API keys against a remote auth service. If the auth service is the only place key validity is checked, an attacker who can cause the auth service to become unreachable (DDoS, DNS poisoning, network partition) controls whether your server accepts all requests or no requests — depending on how the circuit is configured.
A fail-open circuit in front of an auth check turns a denial-of-service attack into an authentication bypass: the attacker doesn't need to forge a valid key, only to take down the key validation service. The exposure is greatest for production MCP servers that rely on shared cloud auth infrastructure, because the availability of that dependency is outside the server operator's control.
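In practice the bypass often does not look like a circuit breaker at all, just a catch block. The following is a minimal sketch, assuming a hypothetical `verify` callback that stands in for the remote auth call (here replaced by `verifyDuringOutage`, which simulates the outage). All function names are illustrative and not taken from any particular MCP SDK.

```javascript
// Hypothetical stand-in for a remote auth client; it simulates an outage.
async function verifyDuringOutage(_apiKey) {
  throw new Error('auth service unreachable');
}

// Anti-pattern: the catch block turns an outage into "authenticated".
async function isAuthorizedFailOpen(apiKey, verify) {
  try {
    const { valid } = await verify(apiKey);
    return valid === true;
  } catch (err) {
    return true; // fail-open: any key passes while the auth service is down
  }
}

// Fail-closed: an outage is treated the same as an invalid key.
async function isAuthorizedFailClosed(apiKey, verify) {
  try {
    const { valid } = await verify(apiKey);
    return valid === true;
  } catch (err) {
    return false;
  }
}
```

With the auth service down, `isAuthorizedFailOpen('forged-key', verifyDuringOutage)` resolves to `true`, while the fail-closed variant resolves to `false` for the same input.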
Fail-closed circuit breaker implementation
// Circuit breaker with fail-closed for security-critical dependencies
const { createHash } = require('node:crypto');

const STATES = { CLOSED: 'closed', OPEN: 'open', HALF_OPEN: 'half_open' };

class SecurityCircuitBreaker {
  constructor(opts = {}) {
    this.state = STATES.CLOSED;
    this.failureCount = 0;
    this.failureThreshold = opts.failureThreshold ?? 5;
    this.recoveryTimeout = opts.recoveryTimeout ?? 30_000; // 30s before half-open probe
    this.openedAt = null;
    this.failClosed = opts.failClosed ?? true; // SECURITY-CRITICAL SERVICES: default true
    // Cache last known good responses for fail-closed with stale data.
    // Entries are keyed per request: a single shared slot would hand one
    // caller's successful result to any other caller during an outage.
    this.lastGood = new Map(); // cacheKey -> { result, cachedAt }
    this.maxStaleMs = opts.maxStaleMs ?? 300_000; // 5-minute stale window
    this.maxCacheEntries = opts.maxCacheEntries ?? 10_000; // bound memory use
  }

  // cacheKey identifies what fn is asking (e.g. a hash of the token being verified).
  // Without a cacheKey nothing is cached, so an outage always denies.
  async call(fn, cacheKey) {
    if (this.state === STATES.OPEN) {
      const elapsed = Date.now() - this.openedAt;
      if (elapsed < this.recoveryTimeout) {
        return this._handleOpen(cacheKey);
      }
      // Transition to half-open: this request acts as the probe
      this.state = STATES.HALF_OPEN;
    }
    try {
      const result = await fn();
      this._onSuccess(result, cacheKey);
      return result;
    } catch (err) {
      return this._onFailure(err, cacheKey);
    }
  }

  _onSuccess(result, cacheKey) {
    this.failureCount = 0;
    this.state = STATES.CLOSED;
    if (cacheKey === undefined) return;
    this.lastGood.delete(cacheKey); // re-insert so Map order tracks recency
    this.lastGood.set(cacheKey, { result, cachedAt: Date.now() });
    if (this.lastGood.size > this.maxCacheEntries) {
      // Evict the oldest entry (Map iterates in insertion order)
      this.lastGood.delete(this.lastGood.keys().next().value);
    }
  }

  _onFailure(err, cacheKey) {
    this.failureCount++;
    if (this.failureCount >= this.failureThreshold) {
      this.state = STATES.OPEN;
      this.openedAt = Date.now();
      process.stderr.write(JSON.stringify({
        event: 'CIRCUIT_OPENED',
        failClosed: this.failClosed,
        ts: new Date().toISOString(),
      }) + '\n');
    }
    return this._handleOpen(cacheKey);
  }

  _handleOpen(cacheKey) {
    if (!this.failClosed) {
      // Non-security dependency: allow through with degraded behavior
      return { degraded: true };
    }
    // Security-critical: try the stale cache for this exact key first
    const cached = cacheKey === undefined ? undefined : this.lastGood.get(cacheKey);
    if (cached && Date.now() - cached.cachedAt < this.maxStaleMs) {
      // Return the stale result recorded for the same key
      return { ...cached.result, fromStaleCache: true };
    }
    // No usable cache: deny the request
    throw new Error('Security dependency unavailable: request denied');
  }
}

// Usage: auth service gets fail-closed circuit
const authCircuit = new SecurityCircuitBreaker({ failClosed: true, maxStaleMs: 300_000 });

async function verifyToken(token) {
  // Key the cache by a hash of the token so raw credentials are not kept as Map keys
  const cacheKey = createHash('sha256').update(token).digest('hex');
  try {
    const result = await authCircuit.call(() => authService.verify(token), cacheKey);
    if (result.fromStaleCache) {
      // Cached verdict for this token, at most 5 minutes old. A token revoked
      // during the outage keeps working until the entry expires, so log the hit.
      process.stderr.write(JSON.stringify({ event: 'AUTH_STALE_CACHE_HIT', ts: new Date().toISOString() }) + '\n');
    }
    return result.valid === true;
  } catch (err) {
    // Circuit open with no usable cache: reject the request
    return false;
  }
}
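Not every dependency needs the same setting. As a rough sketch, the choice can be made explicit in one place and passed to the breaker constructor as its `failClosed` and `maxStaleMs` options. The dependency names and values below are illustrative assumptions, not recommendations for any specific deployment, and `policyFor` is a hypothetical helper.

```javascript
// Hypothetical policy table: names and values are illustrative only.
const DEPENDENCY_POLICY = Object.freeze({
  auth:          { failClosed: true,  maxStaleMs: 300_000 },   // short stale window
  rateLimit:     { failClosed: true,  maxStaleMs: 0 },         // no stale data accepted
  ssrfBlocklist: { failClosed: true,  maxStaleMs: 3_600_000 }, // blocklists change slowly
  metrics:       { failClosed: false, maxStaleMs: 0 },         // not a security control
});

// Strictest setting, used for any dependency that has no explicit entry.
const DEFAULT_POLICY = Object.freeze({ failClosed: true, maxStaleMs: 0 });

// Look up a policy by name. Unknown names fall back to fail-closed, so a newly
// added dependency is never fail-open by omission.
function policyFor(name) {
  return Object.prototype.hasOwnProperty.call(DEPENDENCY_POLICY, name)
    ? DEPENDENCY_POLICY[name]
    : DEFAULT_POLICY;
}
```

A breaker for the auth service would then be built as `new SecurityCircuitBreaker(policyFor('auth'))`. The point of the design is that fail-open becomes a deliberate, reviewable entry in the table and not a default.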
Half-open probe attack surface
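The half-open state is the most dangerous for security-critical circuits. The circuit is meant to allow one probe request through to test whether the downstream has recovered. In the base implementation above, that probe is whichever real request arrives first after the recovery timeout, and every request that arrives before the probe completes is also sent downstream. If the probe is a real user request carrying real credentials or performing a real action, an attacker who caused the outage knows roughly when the recovery timeout expires and can time a request to land in the half-open window, while the downstream may still be degraded or under their influence.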
// Guard: half-open probe should use a synthetic health-check, not a real request
class SecureCircuitBreaker extends SecurityCircuitBreaker {
  constructor(opts = {}) {
    super(opts);
    if (typeof opts.healthCheck !== 'function') {
      throw new TypeError('opts.healthCheck must be a function returning a Promise');
    }
    this.healthCheckFn = opts.healthCheck; // synthetic ping, not a real operation
    this.halfOpenProbeInFlight = false;
  }

  async call(fn, ...rest) {
    if (this.state === STATES.OPEN) {
      const elapsed = Date.now() - this.openedAt;
      if (elapsed >= this.recoveryTimeout && !this.halfOpenProbeInFlight) {
        // Run health check in background so a real request is never the probe
        this.halfOpenProbeInFlight = true;
        Promise.resolve()
          .then(() => this.healthCheckFn()) // also catches a synchronous throw
          .then(() => { this.state = STATES.HALF_OPEN; })
          .catch(() => { this.openedAt = Date.now(); }) // reset timer
          .finally(() => { this.halfOpenProbeInFlight = false; });
      }
      // While probe is in flight or circuit is still open: deny
      return this._handleOpen(...rest);
    }
    // Closed, or half-open after a passed health check: use the parent's call path
    return super.call(fn, ...rest);
  }
}
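The function passed as `opts.healthCheck` should carry no credentials and should not hang if the downstream is still unreachable. One possible shape is sketched below. `makeHealthCheck`, the endpoint URL, and the injectable `fetchImpl` parameter are assumptions made for illustration; the default relies on the global `fetch` available in Node 18 and later.

```javascript
// Builds a health-check function suitable for opts.healthCheck.
// `url` is a hypothetical read-only health endpoint on the auth service.
// `fetchImpl` is injectable for testing and defaults to the global fetch.
function makeHealthCheck(url, { timeoutMs = 2_000, fetchImpl = globalThis.fetch } = {}) {
  return async function healthCheck() {
    const controller = new AbortController();
    // Abort the probe if the downstream does not answer in time.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      // No credentials and no user data: the probe exposes nothing if intercepted.
      const res = await fetchImpl(url, { method: 'GET', signal: controller.signal });
      if (!res.ok) {
        throw new Error(`health check returned HTTP ${res.status}`);
      }
    } finally {
      clearTimeout(timer);
    }
  };
}
```

It could be wired in as `new SecureCircuitBreaker({ healthCheck: makeHealthCheck('https://auth.example.internal/healthz') })`, where the URL is a placeholder. A rejected promise, whether from a timeout, a network error, or a non-2xx status, keeps the circuit open and restarts the recovery timer.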
Cascade failure blast radius
When a circuit opens, every subsequent request hits the fail-closed path. If that path itself does significant work (database lookups, secondary auth calls), a cascade of open circuits becomes a secondary resource-exhaustion vector: the attacker who took down one dependency now makes every denied request expensive. The fail-closed path should be minimal: a cache lookup and a deny response, not another round-trip.
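A minimal deny path might look like the sketch below: one in-memory lookup and a precomputed error object, with no I/O. The function name, the cache entry shape, and the error code are assumptions. MCP messages are JSON-RPC 2.0, and -32000 falls in the range that specification reserves for implementation-defined server errors, but the specific code is a choice made here for illustration.

```javascript
// Built once at startup so the deny path does not rebuild it per request.
const DENY_ERROR = Object.freeze({
  code: -32000, // assumed code in the JSON-RPC implementation-defined range
  message: 'Security dependency unavailable',
});

// staleCache: Map of cacheKey -> { valid: boolean, cachedAt: number }
// (hypothetical shape). `now` is injectable to keep the function testable.
function failClosedDecision(staleCache, cacheKey, maxStaleMs, now = Date.now()) {
  const entry = staleCache.get(cacheKey); // O(1), in-memory, no network or disk
  if (entry && now - entry.cachedAt < maxStaleMs) {
    // A fresh-enough cached verdict decides the request either way.
    return { allow: entry.valid === true, fromStaleCache: true };
  }
  // No entry, or entry too old: deny without any further work.
  return { allow: false, error: DENY_ERROR };
}
```

Because the function never awaits anything, the cost of a denied request stays constant regardless of how many circuits are open at once.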
SkillAudit detection
SkillAudit's Security axis checks for circuit breaker patterns in MCP servers that make external calls: whether the circuit breaker default is fail-open or fail-closed for calls that gate security decisions, and whether the circuit state affects authentication or input validation paths. Servers with try/catch blocks around auth calls that silently swallow errors (effectively fail-open) are flagged under the Authentication Bypass risk category.
Run a SkillAudit scan to identify which external-dependency calls in your MCP server are wired to fail-open and would be security-degraded under a targeted availability attack.