MCP server graceful shutdown security — SIGTERM handling, in-flight request completion, credential cleanup

Graceful shutdown is usually discussed as an availability concern: finish in-flight requests before exiting so callers don't see unexplained errors. In MCP servers, it is also a security concern. An abrupt process exit leaves credentials cached in memory uncleared, database connections open and possibly holding locks, and buffered audit log entries unwritten. Each of these is a security gap.

What an abrupt SIGTERM leaves behind

The most common pattern in Node.js MCP servers is no SIGTERM handler at all, or one that immediately calls process.exit(0):

// Dangerous: no graceful shutdown — abrupt exit on SIGTERM
process.on('SIGTERM', () => {
  process.exit(0);  // credentials in memory, audit log unflushed, DB connections leaked
});

// Also dangerous: no handler at all
// Node.js exits immediately on SIGTERM with no handler registered

The security consequences:

Credentials in memory: if the server cached a database connection string, API key, or vault lease in a module-level variable, the plaintext stays in process memory right up to exit, where it can be captured in a core dump or written to swap. A process that clears its credential variables before exiting narrows that exposure; one that exits abruptly does not.
Audit log buffer unflushed: if the audit log client uses an in-memory buffer (common for batching writes to reduce I/O), an abrupt exit drops the buffer. Tool calls that happened seconds before shutdown have no audit record. For SOC 2 and GDPR purposes, this is a gap in the record of processing.
In-flight tool calls dropped mid-operation: a database write that was partway through a transaction when SIGTERM arrived is left for the database to roll back once the connection drops. More critically, the caller sees neither success nor failure, so the LLM agent may retry the call. If any part of the operation had already taken effect (a committed earlier step, or a call to an external API), the retry executes it twice.
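
To make the audit-buffer risk concrete, here is a minimal sketch of a batching audit logger. The names `BufferedAuditLog`, `AuditEntry`, and `Sink` are hypothetical and not taken from any MCP SDK or logging library; the only point is that whatever still sits in `pending` is lost unless `flush()` is awaited before the process exits.

```typescript
// Hypothetical batching audit logger (illustrative names, not a real library).
type AuditEntry = { tool: string; at: number };
type Sink = (batch: AuditEntry[]) => Promise<void>;

class BufferedAuditLog {
  private pending: AuditEntry[] = [];
  private readonly sink: Sink;
  private readonly batchSize: number;

  constructor(sink: Sink, batchSize = 100) {
    this.sink = sink;
    this.batchSize = batchSize;
  }

  // Buffer the entry; write to the sink only once the batch is full.
  async record(entry: AuditEntry): Promise<void> {
    this.pending.push(entry);
    if (this.pending.length >= this.batchSize) await this.flush();
  }

  // Write everything buffered so far. Must be awaited in the shutdown path.
  async flush(): Promise<void> {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    try {
      await this.sink(batch);
    } catch (err) {
      // Put the batch back so a failed write does not silently drop records.
      this.pending = batch.concat(this.pending);
      throw err;
    }
  }

  // Number of entries that would be lost on an abrupt exit right now.
  get unflushed(): number {
    return this.pending.length;
  }
}
```

With a batch size of 100, a server that handled 40 tool calls since the last write has 40 entries that exist only in memory; a `process.exit(0)` at that moment discards all of them.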

Reference graceful shutdown implementation

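// Credential store — centralized so we can zero it on shutdown.
// Secrets are held in Buffers: JavaScript strings are immutable, so assigning
// a new string of zeros would leave the original bytes untouched in memory.
const credentialStore = {
  dbConnectionString: Buffer.from(process.env.DATABASE_URL ?? ''),
  apiKey: Buffer.from(process.env.API_KEY ?? ''),

  clear() {
    // Overwrite the bytes in place
    this.dbConnectionString.fill(0);
    this.apiKey.fill(0);
    // Drop the copies that Node.js keeps in process.env
    delete process.env.DATABASE_URL;
    delete process.env.API_KEY;
    // Best effort only: any string produced with toString() for a driver
    // call is a separate copy whose lifetime the garbage collector controls
  }
};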

// Track in-flight tool calls
let inFlightCount = 0;
const inFlightDone = () => new Promise<void>(resolve => {
  if (inFlightCount === 0) return resolve();
  const interval = setInterval(() => {
    if (inFlightCount === 0) { clearInterval(interval); resolve(); }
  }, 50);
});

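// Graceful shutdown handler
let shuttingDown = false;

async function shutdown(signal: string) {
  if (shuttingDown) return;  // ignore repeated signals (e.g. SIGINT after SIGTERM)
  shuttingDown = true;
  console.log(`${signal} received, starting graceful shutdown`);
  let exitCode = 0;

  try {
    // 1. Stop accepting new tool calls
    server.stopAccepting();

    // 2. Wait for in-flight tool calls to complete (max 10s)
    await Promise.race([
      inFlightDone(),
      new Promise(resolve => setTimeout(resolve, 10_000))
    ]);

    // 3. Flush audit log buffer
    await auditLog.flush();

    // 4. Close database connections
    await db.end();

    // 5. Flush OpenTelemetry spans
    await tracerProvider.shutdown();
  } catch (err) {
    // A failed step must not leave the process hanging or skip credential cleanup
    console.error('Graceful shutdown step failed:', err);
    exitCode = 1;
  } finally {
    // 6. Zero credential variables last, once nothing else needs them
    credentialStore.clear();
  }

  console.log('Graceful shutdown complete');
  process.exit(exitCode);
}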

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT',  () => shutdown('SIGINT'));
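
The reference implementation reads `inFlightCount` but does not show where it is maintained at that point. One possible way to keep the counting in a single place, sketched here with a hypothetical `InFlightTracker` class, is to wrap every tool handler invocation and wake waiters when the count reaches zero instead of polling. It also reports whether the drain finished or timed out, which is worth logging.

```typescript
// Hypothetical helper: counts in-flight calls and wakes waiters at zero.
class InFlightTracker {
  private count = 0;
  private waiters: Array<() => void> = [];

  // Wrap a tool handler invocation so it is counted for its whole lifetime,
  // including when it throws.
  async track<T>(fn: () => Promise<T>): Promise<T> {
    this.count++;
    try {
      return await fn();
    } finally {
      this.count--;
      if (this.count === 0) {
        const toWake = this.waiters;
        this.waiters = [];
        for (const wake of toWake) wake();
      }
    }
  }

  // Resolves true once nothing is in flight, or false if timeoutMs elapses first.
  drained(timeoutMs: number): Promise<boolean> {
    if (this.count === 0) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

In a shutdown handler, `await tracker.drained(10_000)` would stand in for the `Promise.race` step, and a `false` result tells you that some calls were cut short and may need follow-up.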

The in-flight timeout tradeoff

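The 10-second timeout in the example above is a deliberate tradeoff. Some tool calls may be running long database queries or slow API calls. Setting the timeout too long delays deployment rollouts and risks outlasting the orchestrator's own grace period (Kubernetes, for example, defaults to 30 seconds), at which point the orchestrator sends SIGKILL. SIGKILL cannot be handled, so none of the cleanup runs, which is worse than a clean exit with a few calls cut short. Setting the timeout too short drops in-flight operations that would otherwise have finished.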
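
As a rough sketch of how to size the timeout, the function below derives it from observed tool latencies and caps it at the orchestrator's grace period minus a reserve for the flush and close steps. The function names, the nearest-rank percentile method, and the default headroom factor of 2 are assumptions made for illustration, not recommendations from any standard.

```typescript
// Nearest-rank percentile over observed tool latencies (milliseconds).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const idx = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[idx] as number;
}

// Drain timeout: some headroom over P99, but never more than the grace
// period minus the time reserved for audit flush, DB close, and so on.
function drainTimeoutMs(
  samplesMs: number[],
  gracePeriodMs: number,
  cleanupReserveMs: number,
  headroom = 2,
): number {
  const wanted = percentile(samplesMs, 99) * headroom;
  const ceiling = Math.max(0, gracePeriodMs - cleanupReserveMs);
  return Math.min(wanted, ceiling);
}
```

If the cap is what determines the result, waiting alone cannot cover the slowest calls within the grace period, and explicit cancellation becomes necessary.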

The right value depends on your tool's P99 latency. If your slowest tool typically completes in under 2 seconds, a 5-second timeout is generous. If you have tools that run 30-second batch operations, you need a way to cancel them explicitly (cancellation tokens or AbortController) rather than just waiting.

// Cancellation-aware tool call wrapper
const abortControllers = new Map<string, AbortController>();

server.tool('longRunningQuery', {
  handler: async (args, { callId }) => {
    const ac = new AbortController();
    abortControllers.set(callId, ac);
    inFlightCount++;

    try {
      const result = await db.query(args.sql, { signal: ac.signal });
      return result;
    } finally {
      inFlightCount--;
      abortControllers.delete(callId);
    }
  }
});

// During shutdown: cancel all in-flight before waiting
async function shutdown(signal: string) {
  server.stopAccepting();

  // Cancel all in-flight operations before waiting
  for (const ac of abortControllers.values()) {
    ac.abort(new Error(`Server shutting down: ${signal}`));
  }

  // Aborted calls should settle quickly, but keep the wait bounded in case
  // a handler ignores its signal
  await Promise.race([
    inFlightDone(),
    new Promise(resolve => setTimeout(resolve, 10_000))
  ]);
  await auditLog.flush();
  await db.end();
  credentialStore.clear();
  process.exit(0);
}
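
Cancellation only helps if the work actually observes the signal. For operations that do not accept an `AbortSignal` themselves, one approach is to check it at safe boundaries. The sketch below uses a hypothetical `processBatches` function to illustrate the idea: it stops between batches, so no single unit of work is left half-applied.

```typescript
// Hypothetical batch job that checks the AbortSignal between units of work.
async function processBatches(
  batches: number[][],
  signal: AbortSignal,
  work: (batch: number[]) => Promise<void>,
): Promise<number> {
  let done = 0;
  for (const batch of batches) {
    // Stop at a batch boundary; rethrow the reason passed to abort().
    if (signal.aborted) {
      throw signal.reason ?? new Error("aborted at batch boundary");
    }
    await work(batch);
    done++;
  }
  return done;
}
```

The batch that is running when the abort arrives still completes, so the drain timeout needs to cover at least one batch's duration.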

Vault lease revocation on shutdown

MCP servers that use HashiCorp Vault or a similar secrets manager with short-lived leases should revoke their leases explicitly on graceful shutdown. A lease that is not revoked stays valid until its TTL expires, so the credential remains usable for that window, however small, after the server itself has gone away:

async function shutdown(signal: string) {
  // ... stop accepting, wait for in-flight ...

  // Revoke Vault leases before closing the Vault client
  if (vault.currentLeaseId) {
    try {
      await vault.revokeLease(vault.currentLeaseId);
    } catch (e) {
      // Log but don't block shutdown — the TTL-based expiry is the fallback
      console.error('Vault lease revocation failed:', e instanceof Error ? e.message : e);
    }
  }

  await auditLog.flush();
  await db.end();
  credentialStore.clear();
  process.exit(0);
}
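
The try/catch above handles a revocation that fails, but not one that hangs, for instance when the secrets manager is unreachable and the request never returns. A minimal sketch of a deadline wrapper for any single cleanup step follows; `withDeadline` is a hypothetical helper, not part of a Vault client.

```typescript
// Run one cleanup step with a deadline so a hung dependency cannot stall
// shutdown. Failures and timeouts are logged and reported as undefined.
async function withDeadline<T>(
  step: string,
  run: () => Promise<T>,
  ms: number,
): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${step} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([run(), timeout]);
  } catch (e) {
    console.error(`${step} failed:`, e instanceof Error ? e.message : e);
    return undefined;
  } finally {
    // Clear the timer so it cannot keep the event loop alive or fire late.
    clearTimeout(timer);
  }
}
```

Under these assumptions, the revocation step would become something like `await withDeadline('Vault lease revocation', () => vault.revokeLease(vault.currentLeaseId), 2_000)`, with TTL-based expiry remaining the fallback if the deadline passes.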

What SkillAudit checks

SkillAudit's static analysis examines shutdown handling in MCP server code and flags:

Absence of a SIGTERM handler (process exits immediately, no cleanup)
SIGTERM handlers that call process.exit() synchronously without waiting for in-flight operations
Audit log clients with buffered writes and no explicit flush() call in the shutdown path
Module-level credential variables with no clearing logic in the shutdown handler
Database connection pools with no explicit end() or destroy() in shutdown

Missing graceful shutdown is a common finding in community MCP servers and contributes to the Security sub-score. Run a free audit at skillaudit.dev.