MCP Server Security — Credential Bootstrapping

MCP server secret zero — bootstrapping credentials without hardcoding them

Every containerised MCP server faces the same bootstrap paradox: to fetch secrets from a secrets manager, the server needs a credential, but how does that first credential get in without being hardcoded? This is the secret zero problem. The wrong answers (Dockerfile ENV, --build-arg, CI-mounted .env) leave the credential in image metadata, Docker layer history, or CI workspace files long after it has been "removed." The right answers (IMDSv2 instance identity, SPIFFE/SPIRE SVIDs, and Vault agent init containers) give the workload a platform-attested identity that it exchanges for short-lived secrets, with no hardcoded value anywhere in the chain.

Pattern 1: Instance identity with AWS IMDSv2

When an MCP server runs on an EC2 instance, it can use the Instance Metadata Service (IMDS) to obtain temporary IAM credentials bound to the instance's attached role, with no stored secret required. (ECS tasks with a task role receive their credentials from a separate container credentials endpoint rather than from IMDS, so the code below applies to EC2 only.) The critical detail is the version: IMDSv1 answers any GET to http://169.254.169.254/latest/meta-data/iam/security-credentials/ with no authentication, which makes it trivially exploitable via SSRF. IMDSv2 requires a session token obtained via a PUT request with a TTL header, so when IMDSv2 is enforced, an SSRF-driven GET to the metadata endpoint without the token returns a 401.

The Node.js flow (obtain a session token, discover the role name, then fetch that role's credentials) uses the global fetch available in Node 18+:

// bootstrap/imdsv2.ts
// Step 1: acquire a session token (valid for up to 21600 seconds / 6 hours)
async function getImdsToken(ttlSeconds = 21600): Promise<string> {
  const res = await fetch(
    'http://169.254.169.254/latest/api/token',
    {
      method: 'PUT',
      headers: { 'X-aws-ec2-metadata-token-ttl-seconds': String(ttlSeconds) },
    }
  );
  if (!res.ok) throw new Error(`IMDSv2 token request failed: ${res.status}`);
  return res.text();
}

// Step 2: discover the attached IAM role name
async function getRoleName(token: string): Promise<string> {
  const res = await fetch(
    'http://169.254.169.254/latest/meta-data/iam/security-credentials/',
    { headers: { 'X-aws-ec2-metadata-token': token } }
  );
  if (!res.ok) throw new Error(`IMDS role lookup failed: ${res.status}`);
  return (await res.text()).trim();
}

// Step 3: exchange the role name for temporary STS credentials
async function getRoleCredentials(token: string, roleName: string) {
  const res = await fetch(
    `http://169.254.169.254/latest/meta-data/iam/security-credentials/${roleName}`,
    { headers: { 'X-aws-ec2-metadata-token': token } }
  );
  if (!res.ok) throw new Error(`IMDS credential fetch failed: ${res.status}`);
  const creds = await res.json() as {
    AccessKeyId: string;
    SecretAccessKey: string;
    Token: string;
    Expiration: string;
  };
  return creds;
}

// Step 4: use the temporary credentials to call Secrets Manager
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

export async function bootstrapFromImds(): Promise<string> {
  const token   = await getImdsToken();
  const role    = await getRoleName(token);
  const creds   = await getRoleCredentials(token, role);

  const sm = new SecretsManagerClient({
    region: process.env.AWS_REGION ?? 'us-east-1',
    credentials: {
      accessKeyId:     creds.AccessKeyId,
      secretAccessKey: creds.SecretAccessKey,
      sessionToken:    creds.Token,
      expiration:      new Date(creds.Expiration),
    },
  });

  const result = await sm.send(
    new GetSecretValueCommand({ SecretId: 'mcp-server/api-key' })
  );
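  // SecretString is undefined for binary secrets; fail loudly instead of asserting.
  if (result.SecretString === undefined) {
    throw new Error('Secret mcp-server/api-key has no SecretString value');
  }
  return result.SecretString;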
}

Disable IMDSv1 at the AWS account level with an SCP that denies ec2:RunInstances unless the ec2:MetadataHttpTokens condition key equals required. At the instance level, pass --metadata-options HttpTokens=required to aws ec2 run-instances. The AWS SDK v3 handles IMDSv2 automatically when you use fromInstanceMetadata() from @aws-sdk/credential-providers. Prefer that over the raw fetch above in production; the raw version is shown here to make the token exchange explicit.
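The raw bootstrapFromImds() above fetches credentials once, but those credentials are temporary: the Expiration field returned in Step 3 says when they stop working. If the raw approach is kept in a long-running server, it would need a check along these lines before reusing cached credentials. This is a minimal sketch; needsRefresh and the five-minute margin are illustrative assumptions, not part of the AWS SDK.

```typescript
// Hypothetical helper (not an AWS SDK API): decides whether temporary
// credentials fetched from IMDS should be re-fetched before reuse.
interface ImdsCredentialExpiry {
  Expiration: string; // ISO 8601 timestamp, as returned by IMDS
}

// Assumed safety margin; tune it to how long your requests can take.
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

function needsRefresh(
  creds: ImdsCredentialExpiry,
  now: Date = new Date(),
  marginMs: number = REFRESH_MARGIN_MS,
): boolean {
  const expiresAt = Date.parse(creds.Expiration);
  // Treat an unparseable timestamp as expired so the caller re-fetches.
  if (Number.isNaN(expiresAt)) return true;
  // Refresh once the remaining lifetime drops to the margin or below.
  return expiresAt - now.getTime() <= marginMs;
}
```

A caller would run needsRefresh(creds) before each use and repeat Steps 1 to 3 when it returns true. The SDK's fromInstanceMetadata() provider makes this bookkeeping unnecessary, which is one more reason to prefer it.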

Pattern 2: Workload identity without IMDS — SPIFFE/SPIRE SVIDs

On Kubernetes (or any platform where a SPIRE agent runs on each node), the MCP server can obtain a cryptographic identity with no bootstrap secret at all. The SPIRE agent typically runs as a DaemonSet pod. It attests its own node to the SPIRE server with a node attestor (on Kubernetes, usually a projected service account token), identifies each calling workload with a workload attestor (for example, by querying the kubelet API), and then issues X.509 SVIDs to workloads on that node over a Unix domain socket, assumed here to be mounted at /var/run/spire/agent.sock. The SVID is short-lived (typically 1 hour) and auto-rotated by the agent.

The Node.js client uses the @spiffe/spiffe-js package, which wraps the Workload API gRPC interface:

// bootstrap/spiffe.ts
import { WorkloadApiClient, X509Source } from '@spiffe/spiffe-js';

export async function bootstrapFromSpiffe(): Promise<string> {
  // Connect to the SPIRE agent socket; no credentials are needed.
  // The agent identifies the caller from the socket's peer credentials and
  // only issues SVIDs whose registration-entry selectors match the workload.
  const source = await X509Source.create({
    socketPath: process.env.SPIFFE_ENDPOINT_SOCKET
      ?? 'unix:///var/run/spire/agent.sock',
  });

  // The SVID is an X.509 certificate chain with a SPIFFE URI in the SAN:
  // spiffe://example.org/ns/prod/sa/mcp-server
  const svid = await source.getX509SVID();
  console.log('SPIFFE ID:', svid.spiffeId);

  // Use the SVID to authenticate to Vault via the cert auth method.
  // Vault validates the SVID against the configured SPIFFE trust bundle.
  const vaultRes = await fetch('https://vault.internal:8200/v1/auth/cert/login', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name: 'mcp-server' }),
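    // NOTE: simplified. As written, this request presents no client
    // certificate, so Vault's cert auth would reject it. Global fetch has no
    // cert/key options; supply the SVID cert and key through a TLS-aware
    // dispatcher (undici Agent with `connect: { cert, key }`) or node:https.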
  });

  if (!vaultRes.ok) {
    throw new Error(`Vault cert auth failed: ${vaultRes.status}`);
  }

  const { auth } = await vaultRes.json() as { auth: { client_token: string } };

  // Fetch the actual application secret using the short-lived Vault token
  const secretRes = await fetch(
    'https://vault.internal:8200/v1/secret/data/mcp-server/api-key',
    { headers: { 'X-Vault-Token': auth.client_token } }
  );

  if (!secretRes.ok) {
    throw new Error(`Vault secret read failed: ${secretRes.status}`);
  }

  const { data } = await secretRes.json() as {
    data: { data: { value: string } }
  };

  // Close the X509Source once the bootstrap is complete so the process
  // does not keep the agent socket open.
  await source.close();

  return data.data.value;
}

The SPIRE agent handles SVID rotation transparently. If the MCP server re-authenticates to Vault over its lifetime (e.g., for dynamic secrets with leases), keep the X509Source open and call source.getX509SVID() before each Vault authentication rather than caching the SVID, which may have expired. The SVID lifetime is configured on the SPIRE server (default_x509_svid_ttl, named default_svid_ttl in older releases) or per registration entry, not on the agent; keep it lower than the Vault token TTL so the certificate is always fresh when it is used to authenticate.
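The SPIFFE ID logged in bootstrapFromSpiffe() carries the workload's identity in its path. Before trusting an SVID, or when debugging a rejected login, it can help to split the ID into its parts and compare them with what the deployment expects. The sketch below assumes the spiffe://<trust-domain>/ns/<namespace>/sa/<service-account> layout shown in the code comment above; that layout is a registration convention, not something SPIFFE mandates, and parseK8sSpiffeId is a hypothetical helper rather than part of any SPIFFE library.

```typescript
interface K8sSpiffeId {
  trustDomain: string;
  namespace: string;
  serviceAccount: string;
}

// Assumed layout: spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>
const K8S_SPIFFE_ID = /^spiffe:\/\/([^/]+)\/ns\/([^/]+)\/sa\/([^/]+)$/;

// Splits a SPIFFE ID into its parts, or throws if it does not follow
// the assumed Kubernetes-style layout.
function parseK8sSpiffeId(id: string): K8sSpiffeId {
  const match = K8S_SPIFFE_ID.exec(id);
  if (!match) throw new Error(`Unexpected SPIFFE ID layout: ${id}`);
  const [, trustDomain, namespace, serviceAccount] = match;
  return { trustDomain, namespace, serviceAccount };
}

// Example: refuse to continue if the identity is not the one we expect.
function isExpectedWorkload(id: string, expected: K8sSpiffeId): boolean {
  const actual = parseK8sSpiffeId(id);
  return (
    actual.trustDomain === expected.trustDomain &&
    actual.namespace === expected.namespace &&
    actual.serviceAccount === expected.serviceAccount
  );
}
```

For the example ID above, isExpectedWorkload would be called with { trustDomain: 'example.org', namespace: 'prod', serviceAccount: 'mcp-server' }. Deployments that register workloads under a different path scheme would need a different pattern.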

Pattern 3: Vault agent init container for Kubernetes

The init container pattern decouples the MCP server completely from the Vault API. A vault-agent init container runs before the MCP server container, authenticates to Vault using the pod's Kubernetes service account token (Vault's kubernetes auth method validates the token by calling the cluster's TokenReview API), writes rendered secrets to a shared emptyDir volume, then exits. The MCP server container starts after the init container exits successfully and reads the secret from the file. It never calls Vault and has no Vault token.

# kubernetes/mcp-server-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mcp-server
  # Vault Agent Injector annotations are an alternative to this setup;
  # an explicit init container is used here for clarity.
spec:
  serviceAccountName: mcp-server  # bound to Vault role via k8s auth

  volumes:
    - name: secrets-vol
      emptyDir:
        medium: Memory   # tmpfs — not written to node disk
    - name: vault-agent-config
      configMap:
        name: vault-agent-config

  initContainers:
    - name: vault-agent-init
      image: hashicorp/vault:1.17
      command:
        - vault
        - agent
        - -config=/vault/config/agent.hcl
        - -exit-after-auth   # exit once secrets are written; don't run as sidecar
      env:
        - name: VAULT_ADDR
          value: "https://vault.internal:8200"
      volumeMounts:
        - name: secrets-vol
          mountPath: /vault/secrets
        - name: vault-agent-config
          mountPath: /vault/config

  containers:
    - name: mcp-server
      image: ghcr.io/my-org/mcp-server:latest
      env:
        - name: SECRETS_FILE
          value: /vault/secrets/mcp-server.json
      volumeMounts:
        - name: secrets-vol
          mountPath: /vault/secrets
          readOnly: true

# configmap: vault-agent-config (agent.hcl)
auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "mcp-server"
    }
  }
}

template {
  destination = "/vault/secrets/mcp-server.json"
  contents = <<EOT
{
  "apiKey": "{{ with secret "secret/data/mcp-server/api-key" }}{{ .Data.data.value }}{{ end }}"
}
EOT
}

// bootstrap/vault-file.ts — MCP server side
import { readFileSync } from 'node:fs';

interface McpSecrets {
  apiKey: string;
}

export function loadSecretsFromFile(): McpSecrets {
  const path = process.env.SECRETS_FILE ?? '/vault/secrets/mcp-server.json';
  let raw: string;
  try {
    raw = readFileSync(path, 'utf8');
  } catch (err) {
    // Keep the underlying error (ENOENT, EACCES, ...) attached as the cause.
    throw new Error(
      `Could not read secret file at ${path}. ` +
      `Did the vault-agent init container complete successfully?`,
      { cause: err }
    );
  }
  const parsed = JSON.parse(raw) as McpSecrets;
  if (!parsed.apiKey) throw new Error('apiKey missing from secrets file');
  return parsed;
}

The init container fails the pod startup if Vault authentication fails, which means the MCP server never starts with empty credentials. That is a far safer failure mode than starting with a missing secret and erroring at the first tool call. Run Vault as an HA cluster, with a PodDisruptionBudget on the Vault pods if Vault itself runs in Kubernetes, so the init container can reach Vault during rolling deployments.
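The same fail-fast idea applies to the file's contents. loadSecretsFromFile() casts the parsed JSON to McpSecrets, so a file containing null, or an apiKey that is a number, would get past the cast. A stricter parser might look like the sketch below; parseSecrets is a hypothetical replacement for the cast, not part of any Vault tooling.

```typescript
interface McpSecrets {
  apiKey: string;
}

// Validates the rendered secrets file instead of trusting a type cast.
function parseSecrets(raw: string): McpSecrets {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // The raw content is deliberately left out of the message:
    // it may contain the secret itself.
    throw new Error('Secrets file is not valid JSON');
  }
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Secrets file must contain a JSON object');
  }
  const apiKey = (parsed as Record<string, unknown>).apiKey;
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    throw new Error('apiKey missing or empty in secrets file');
  }
  return { apiKey };
}
```

Calling parseSecrets(raw) in place of the JSON.parse cast keeps every malformed-file case on the same path: the server throws at startup with a message that names the problem without echoing the secret.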

Pattern 4: What NOT to do — three antipatterns

Each of these patterns appears in the SkillAudit corpus and represents a different way to permanently embed a secret in a Docker image or CI pipeline artifact.

# ANTIPATTERN 1: Secret in Dockerfile ENV instruction
# Visible in: docker inspect <image> | jq '.[0].Config.Env'
#             docker history <image> --no-trunc
#             any registry that hosts the image
FROM node:22-alpine
# The value below is PERMANENT in image metadata
ENV SECRET_KEY=sk-prod-abc123XYZ789
COPY . .
RUN npm ci
CMD ["node", "server.js"]

# ANTIPATTERN 2: Secret via --build-arg
# Visible in: docker history <image> --no-trunc
#             the intermediate layer that ran RUN with ARG in scope
FROM node:22-alpine
# The ARG value leaks into layer history once a RUN step uses it
ARG VAULT_TOKEN
# ENV copies it into the image config as well (doubly exposed)
ENV VAULT_TOKEN=${VAULT_TOKEN}
COPY . .
# This step runs with VAULT_TOKEN in its environment
RUN npm ci && node scripts/fetch-config.js
CMD ["node", "server.js"]

# At build time:
# docker build --build-arg VAULT_TOKEN=hvs.AAAA... .
# docker history myimage --no-trunc | grep VAULT_TOKEN  → value visible

# ANTIPATTERN 3: .env file written in CI and mounted into container
# Secret is written in plaintext to the runner workspace and can reach
# CI artifact storage or any runner that cached the workspace

# .github/workflows/deploy.yml (BAD)
- name: Write secrets to .env
  run: |
    echo "API_KEY=${{ secrets.API_KEY }}" >> .env   # plaintext secret on the runner's disk
    echo "DB_PASS=${{ secrets.DB_PASS }}" >> .env

- name: Run container
  run: docker run --env-file .env ghcr.io/my-org/mcp-server:latest

# The .env file may be:
# - cached in the runner's workspace between jobs
# - uploaded as a debug artifact
# - printed to the job log in a transformed form that log masking does not catch
# - readable by other jobs on the same self-hosted runner

The common thread across all three antipatterns: the secret is baked into a build-time artifact that outlives the process that created it. Docker image layers are content-addressed and immutable — you cannot retroactively remove a layer without rebuilding the image from scratch and re-pushing it to every registry and cache. Always pass secrets to running containers via runtime mechanisms (environment variables set by the orchestrator at pod scheduling time, mounted secret volumes, or the init-container file pattern above) — never at image build time.
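Antipatterns 1 and 2 are easy to catch mechanically before an image is ever built. The sketch below flags ENV and ARG instructions whose variable name looks like a credential. It is an illustration only, not SkillAudit's implementation: scanDockerfile and the name pattern are assumptions, it checks only the first variable on each instruction, and it ignores line continuations.

```typescript
interface DockerfileFinding {
  line: number; // 1-based line number in the Dockerfile
  instruction: 'ENV' | 'ARG';
  name: string;
}

// Assumed name heuristic; a fuller scanner would also match known value formats.
const CREDENTIAL_NAME = /(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY)/i;

// Matches "ENV NAME..." or "ARG NAME..." at the start of a line.
const ENV_OR_ARG = /^\s*(ENV|ARG)\s+([A-Za-z_][A-Za-z0-9_]*)/i;

function scanDockerfile(source: string): DockerfileFinding[] {
  const findings: DockerfileFinding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    const match = ENV_OR_ARG.exec(text);
    if (!match) return;
    const instruction = match[1].toUpperCase() as 'ENV' | 'ARG';
    const name = match[2];
    if (CREDENTIAL_NAME.test(name)) {
      findings.push({ line: index + 1, instruction, name });
    }
  });
  return findings;
}
```

Run against the second antipattern above, this reports both the ARG VAULT_TOKEN line and the ENV VAULT_TOKEN line. A name-based check like this produces false positives (for example a variable holding a token file path), so it works better as a review prompt in CI than as a hard gate.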

SkillAudit findings

The following findings are raised by SkillAudit's static analysis and Docker image inspection passes when scanning MCP server repositories:

CRITICAL Hardcoded credential in Dockerfile ENV instruction. A secret matching a known format (AWS key, OpenAI key, Stripe key, GitHub PAT, Vault token) found in an ENV instruction in a Dockerfile or docker-compose.yml. The secret is permanently embedded in the image config and visible via docker inspect.

CRITICAL Secret passed as Docker --build-arg (visible in layer history). A Dockerfile ARG instruction whose name matches a credential pattern, used in a RUN step where the arg is in scope. The value passed at docker build --build-arg time is visible in docker history --no-trunc for that layer.

HIGH IMDSv1 endpoint used without session token header. A fetch or HTTP client call to http://169.254.169.254/latest/meta-data/ without a preceding PUT to obtain a session token and without the X-aws-ec2-metadata-token header on the GET. IMDSv1 is trivially exploitable via SSRF: any HTTP client that follows redirects to 169.254.169.254 can retrieve IAM credentials.

HIGH Vault token stored as environment variable instead of using AppRole or SPIFFE auth. A long-lived Vault token (hvs.* or s.* prefix) in process.env read-at-startup code. Long-lived Vault tokens defeat the purpose of a secrets manager — if the token leaks, all secrets it has access to are exposed for the token's full TTL.

MEDIUM No TTL on retrieved secret — cached indefinitely with no rotation check. Secrets fetched from a secrets manager at startup and stored in a module-level variable with no expiry check and no refresh schedule. If the secret is rotated in the secrets manager, the running server continues using the old value until it is restarted.

MEDIUM No startup failure on secret fetch error — server starts with empty credentials. Secret bootstrap code wrapped in a try/catch that logs a warning and falls through, allowing the MCP server process to start without valid credentials. Tool handlers then fail at call time with opaque errors rather than the server refusing to start with a clear diagnostic message.
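The two MEDIUM findings can be addressed with one small wrapper: cache the secret with an expiry, and let fetch errors propagate instead of swallowing them. The sketch below is one way to do that. SecretCache is a hypothetical class, the fetcher stands in for any of the bootstrap functions above, and the TTL is whatever rotation window your secrets manager uses.

```typescript
type SecretFetcher = () => Promise<string>;

// Caches a secret for ttlMs, then re-fetches on the next access.
class SecretCache {
  private value: string | undefined = undefined;
  private fetchedAt = 0;
  private readonly fetchSecret: SecretFetcher;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetchSecret: SecretFetcher, ttlMs: number, now: () => number = Date.now) {
    this.fetchSecret = fetchSecret;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly for tests
  }

  async get(): Promise<string> {
    const expired = this.now() - this.fetchedAt >= this.ttlMs;
    if (this.value !== undefined && !expired) return this.value;

    // No try/catch here on purpose: a failed fetch rejects, so a caller
    // that awaits get() during startup refuses to start.
    const fresh = await this.fetchSecret();
    if (fresh === '') throw new Error('Secret fetch returned an empty value');
    this.value = fresh;
    this.fetchedAt = this.now();
    return fresh;
  }
}
```

Awaiting cache.get() once during startup gives the fail-fast behaviour, and tool handlers that call it on each use pick up a rotated value within one TTL. Concurrent callers can trigger overlapping fetches in this version; sharing the in-flight promise would avoid that.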

Paste a GitHub URL at skillaudit.dev to get a graded report card.