MCP server sandboxing: running untrusted tool code in isolated processes
When an MCP server executes code that came from a user, a community plugin, or a third-party tool registry, that code runs with the same permissions as the server process unless you explicitly isolate it. This guide covers three isolation layers — the Node.js vm module, Worker threads, and Docker sidecars — and explains what each one actually prevents, what it cannot stop, and when to reach for each.
Why MCP servers need process isolation
Most MCP server authors do not think of themselves as running untrusted code. They wrote the tool handlers themselves. But several common patterns introduce third-party code into the execution path:
- Plugin/extension architectures: platforms that let users install community-built MCP tool extensions dynamically load JavaScript modules at runtime. Once a module is require()'d, it shares the process heap, the event loop, and all open file handles.
- eval-adjacent patterns: tools that accept user-supplied templates, expressions, or scripts (a formula evaluator, a code-review tool that runs the submitted snippet, a Jupyter-style executor) run attacker-controlled code as a side effect of normal operation.
- Deserialization of untrusted payloads: servers that accept serialized JavaScript objects, JSON with embedded function references, or YAML with !!js/function tags can execute untrusted code on parse alone.
- Supply chain compromise: a dependency that ships a malicious update executes attacker code inside your process as part of module initialization. This is what SkillAudit's dependency confusion scanner is designed to detect before install.
In each case the attacker's goal is the same: escalate from "I can influence what your tool does" to "I own your process." A sandbox's job is to make that escalation structurally impossible — or at least, to limit the blast radius to a disposable child process rather than the MCP host itself.
SkillAudit finding: 23% of MCP servers in the Anthropic Skills Directory accept at least one argument that is subsequently passed to eval(), Function(), or a template engine with code execution capabilities. Of those, fewer than 3% implement any process-level isolation around that execution path.
Isolation layer 1: the Node.js vm module
The vm module provides a sandboxed V8 context within the same process. Code runs in a separate JavaScript context — it cannot access the module cache, cannot require() built-in modules, and cannot read variables from the outer scope unless you explicitly inject them into the sandbox context. It is Node's built-in answer to "I need to evaluate user code without giving it access to my application state."
The critical word is within the same process. A vm sandbox stops JavaScript-level access — prototype pollution attacks on the outer heap, accidental variable leakage, unauthorized module imports. It does not stop native escapes. An attacker who can call process.binding() through any reference in the sandbox context, or who can exploit a V8 vulnerability, breaks out of the vm jail and is back in the host process.
vm module — JavaScript context boundary only
Same process, same OS user, same memory space. Stops JS-level access; cannot stop native escapes.
The vm module is best used to protect against accidental access (a template engine reading your environment variables) rather than against determined adversaries. For tool code that processes user-supplied expressions with no escalation path, it's appropriate. For code that comes from an untrusted third party, it is not sufficient alone.
WRONG — eval with no sandboxing
// WRONG: user-supplied expression runs in the host process context
// Can read process.env, require() any module, call fs.readFileSync
function evaluateUserFormula(expression, variables) {
  const fn = new Function(...Object.keys(variables), `return ${expression}`);
  return fn(...Object.values(variables));
}
RIGHT — vm module with frozen context and timeout
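import vm from 'node:vm';

function evaluateUserFormula(expression, variables) {
  // RIGHT: accept only primitive variables; a host object or function is a
  // reference the evaluated code can follow back into the host realm
  for (const [name, value] of Object.entries(variables)) {
    if (typeof value === 'function' || (value !== null && typeof value === 'object')) {
      throw new TypeError(`Variable "${name}" must be a primitive value`);
    }
  }
  // RIGHT: build a null-prototype context with only the variables the user needs
  // Do NOT inject: process, require, Buffer, global, __dirname
  // Do NOT inject the host's Math or JSON either: the new context already has
  // its own copies, and host built-ins expose the host Function constructor
  const sandbox = Object.freeze(Object.assign(Object.create(null), variables));
  // microtaskMode is a createContext option (not a runInContext option); with
  // 'afterEvaluate', microtasks run right after the script and count toward the timeout
  const context = vm.createContext(sandbox, { microtaskMode: 'afterEvaluate' });
  // RIGHT: set timeout to prevent infinite loops blocking the event loop
  const result = vm.runInContext(
    `(function() { "use strict"; return (${expression}); })()`,
    context,
    {
      timeout: 50,                 // ms, kills infinite loops
      filename: 'user-formula.js', // shows in stack traces
    }
  );
  // RIGHT: validate that result is a plain value, not a function or object
  // that holds references back to the sandbox
  if (typeof result === 'function' || (result !== null && typeof result === 'object')) {
    throw new TypeError('Formula must return a primitive value');
  }
  return result;
}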
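To see the timeout option in isolation, here is a minimal sketch. The helper name runWithTimeout is hypothetical, and the snippet uses require so it can be pasted into a standalone script. It runs one well-behaved expression and one infinite loop in an empty context and reports how each ended.

```javascript
const vm = require('node:vm');

// Hypothetical helper: evaluate an expression in a fresh, empty context and
// report whether it finished or was stopped by the time limit.
function runWithTimeout(expression, timeoutMs) {
  const context = vm.createContext(Object.create(null));
  try {
    const value = vm.runInContext(expression, context, { timeout: timeoutMs });
    return { ok: true, value };
  } catch (err) {
    // Node reports an exceeded timeout with code ERR_SCRIPT_EXECUTION_TIMEOUT
    return { ok: false, code: err.code, message: err.message };
  }
}

const fast = runWithTimeout('1 + 2', 50);
const stuck = runWithTimeout('while (true) {}', 50);

console.log(fast.value);
console.log(stuck.code);
```

The timeout only interrupts synchronous execution of the script itself; it says nothing about memory use, which is one reason the later layers exist.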
vm is not a security boundary against determined attackers. Known escape techniques include: accessing this.constructor.constructor (the outer Function) through prototype chain traversal, exploiting V8 engine bugs, and smuggling references through shared built-in objects like Error. Freeze your sandbox context, whitelist only primitives, and treat vm as defense-in-depth rather than the primary control.
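The reference-smuggling escape can be illustrated with a short sketch, assuming nothing beyond node:vm. It runs the same probe twice: once with the host's Math object injected into the context, and once with an empty null-prototype context that falls back to the context's own Math. The probe string and variable names are illustrative only.

```javascript
const vm = require('node:vm');

// Walks from Math to its constructor's constructor (a Function constructor)
// and asks that realm whether it can see `process`.
const probe = 'Math.constructor.constructor("return typeof process")()';

// Host Math injected: Math.constructor is the host Object, so the chain ends
// at the host Function constructor, which can see the host's globals.
const leaky = vm.runInNewContext(probe, { Math });

// Nothing injected: the context's own Math leads to the context's own
// Function constructor, whose global scope has no `process`.
const contained = vm.runInNewContext(probe, Object.create(null));

console.log({ leaky, contained });
```

Any injected object or function behaves like the host Math here, which is why the advice is to pass primitives only.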
Isolation layer 2: Worker threads
Worker threads run in a separate V8 isolate within the same OS process. Each Worker has its own heap, its own JavaScript context, and its own module registry. Communication between the main thread and a Worker uses structured clone (which serializes data rather than sharing references) or transferable objects. A Worker cannot directly read or write the main thread's memory — there is no shared heap between isolates.
The key improvement over vm is the isolate boundary: prototype pollution and cross-context reference attacks are much harder because the Worker's Object.prototype is a different object from the main thread's. However, Workers still share the same OS process — the same file descriptors, the same OS user, and the same Linux capabilities. A Worker that calls fs.readFileSync('/etc/shadow') will succeed if the process has that permission. Workers stop JavaScript-level attacks; they do not provide OS-level isolation.
Worker threads — separate V8 isolate, same OS process
Separate heap and module registry. Stops JS cross-context attacks; does not stop OS-level access.
Worker threads are appropriate for executing community plugins where you trust the author's intentions but want defence against bugs or prototype pollution. They are not appropriate for executing code from adversarial sources — a malicious Worker can read environment variables, open network connections, and write to any path the parent process can reach.
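A small sketch of both halves of that boundary, under the assumption that the Worker is started from an inline source string with eval: true. The helper probeWorker and the DEMO_SECRET variable are hypothetical. The Worker pollutes its own Object.prototype, which does not reach the host, but it inherits a copy of the host's environment unless the env option replaces it.

```javascript
const { Worker } = require('node:worker_threads');

// Runs inside the Worker: pollute Object.prototype, then report what is visible.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  Object.prototype.polluted = true;
  parentPort.postMessage({
    pollutedInWorker: ({}).polluted === true,
    secretType: typeof process.env.DEMO_SECRET,
  });
`;

// Hypothetical helper: start the probe and resolve with its single message.
function probeWorker(options = {}) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, ...options });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

async function demo() {
  process.env.DEMO_SECRET = 'example-value'; // stand-in for a real credential
  const inherited = await probeWorker();           // default: copy of the host env
  const scrubbed = await probeWorker({ env: {} }); // explicit empty environment
  return {
    pollutedInWorker: inherited.pollutedInWorker,
    hostPolluted: ({}).polluted === true,
    inheritedSecretType: inherited.secretType,
    scrubbedSecretType: scrubbed.secretType,
  };
}

demo().then((report) => console.log(report));
```

Passing a minimal env object removes one easy source of secrets, but the Worker can still use fs, net, and child_process with the host's permissions.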
RIGHT — Worker thread execution with structured-clone message passing
// worker-runner.js: spawned per tool invocation, exits when done
import { workerData, parentPort } from 'node:worker_threads';
import { runUserTool } from './user-tool-registry.js';

// Worker has its own module registry and cannot access the parent's require cache
async function main() {
  try {
    const result = await runUserTool(workerData.toolName, workerData.args);
    // RIGHT: structured clone prevents passing non-serializable references
    parentPort.postMessage({ ok: true, result });
  } catch (err) {
    parentPort.postMessage({ ok: false, error: err.message, code: err.code });
  }
}

main();
// host-server.js: spawns a Worker for each tool call, enforces timeout
import { Worker } from 'node:worker_threads';
import path from 'node:path';

const WORKER_TIMEOUT_MS = 5000;

async function runToolInWorker(toolName, args) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      path.join(import.meta.dirname, 'worker-runner.js'),
      {
        workerData: { toolName, args },
        // RIGHT: cap the Worker's V8 heap so runaway allocation fails inside
        // the Worker; resourceLimits does not limit CPU time and does not
        // restrict which modules the Worker can load
        resourceLimits: {
          maxOldGenerationSizeMb: 128, // cap heap per worker
          maxYoungGenerationSizeMb: 32,
          codeRangeSizeMb: 16,
        },
      }
    );
    const timeout = setTimeout(() => {
      worker.terminate(); // RIGHT: kill the Worker if it exceeds the time limit
      reject(new Error('Tool execution timeout'));
    }, WORKER_TIMEOUT_MS);
    worker.once('message', (msg) => {
      clearTimeout(timeout);
      if (msg.ok) resolve(msg.result);
      else reject(Object.assign(new Error(msg.error), { code: msg.code }));
    });
    worker.once('error', (err) => {
      clearTimeout(timeout);
      reject(err);
    });
    // RIGHT: never leave the caller waiting if the Worker exits without replying
    worker.once('exit', (code) => {
      clearTimeout(timeout);
      // No-op when a message, error, or timeout has already settled the promise
      reject(new Error(`Worker exited with code ${code} before replying`));
    });
  });
}
A Worker with resourceLimits cannot grow its V8 heap beyond the configured caps, which stops the most common memory exhaustion attacks. The limits apply to the JavaScript engine only; memory held outside the heap, such as ArrayBuffers, is not covered. Combined with a hard timeout via worker.terminate(), this prevents the two most common denial-of-service vectors from community tool code. For the security model of Workers versus vm, see the rate limiting deep-dive, which covers the resource exhaustion threat model in detail.
Isolation layer 3: Docker sidecar processes
A Docker sidecar isolates tool execution at the OS level: a separate container has its own filesystem namespace, its own network namespace (which can be air-gapped), its own PID namespace, and runs as a different OS user. Even if an attacker achieves arbitrary code execution inside the container, they cannot read the host filesystem, cannot connect to the MCP server's internal network, and cannot escalate to the host OS without a separate container escape (a much harder attack requiring a kernel-level vulnerability).
The tradeoff is latency and operational complexity. Spinning up a new container per tool invocation is too slow for interactive MCP use — round-trip latency in the hundreds of milliseconds. In practice, Docker sidecars work best with a warm pool of pre-started containers that the MCP host dispatches requests to, with containers recycled after each use to prevent state accumulation.
Docker sidecar — OS-level namespace isolation
Separate filesystem, network, and process namespaces. Appropriate for executing adversarially-supplied code.
Use Docker sidecars when the tool code is adversarially untrusted: user-submitted code in a code-execution service, dynamic plugins from an open marketplace, or any tool that processes external content that could contain an exploit. The container should run as a non-root user, have a read-only filesystem except for a /tmp scratchpad, have no network access, and have resource limits set via --memory and --cpu-quota.
Sidecar dispatch with warm container pool
// docker-pool.js: manages a pool of pre-warmed execution containers
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// RIGHT: pre-warm a fixed pool of containers; recycle after each use
class DockerSandboxPool {
  #containers = [];
  #available = [];
  #poolSize;
  #image; // private fields must be declared before the constructor assigns them

  constructor(poolSize = 4, image = 'skillaudit/tool-sandbox:latest') {
    this.#poolSize = poolSize;
    this.#image = image;
  }

  async init() {
    for (let i = 0; i < this.#poolSize; i++) {
      const id = await this.#spawnContainer();
      this.#containers.push(id);
      this.#available.push(id);
    }
  }

  async #spawnContainer() {
    // RIGHT: run container with hard resource limits and no network
    const { stdout } = await execFileAsync('docker', [
      'run', '--detach',
      '--rm',                      // auto-remove on exit
      '--network', 'none',         // RIGHT: no outbound network access
      '--read-only',               // RIGHT: read-only root filesystem
      '--tmpfs', '/tmp:size=32m',  // writable scratchpad, limited size
      '--memory', '128m',          // RIGHT: cap memory
      '--memory-swap', '128m',     // RIGHT: no swap, prevents disk writes
      '--cpu-quota', '50000',      // RIGHT: 50ms per 100ms = 50% of one core
      '--pids-limit', '50',        // RIGHT: prevent fork bombs
      '--cap-drop', 'ALL',         // RIGHT: drop all Linux capabilities
      '--security-opt', 'no-new-privileges',
      '--user', '65534:65534',     // RIGHT: run as nobody:nogroup
      this.#image,
      '/bin/sh', '-c', 'while true; do sleep 1; done', // idle process
    ]);
    return stdout.trim();
  }

  async execute(code, timeoutMs = 5000) {
    if (this.#available.length === 0) {
      throw new Error('Sandbox pool exhausted, try again shortly');
    }
    const containerId = this.#available.pop();
    try {
      // RIGHT: pass code via stdin, not as a shell argument (prevents injection)
      // The async execFile has no `input` option (only the sync variants do),
      // so write the code to the child's stdin explicitly
      const pending = execFileAsync(
        'docker',
        ['exec', '-i', containerId, 'node', '--input-type=module'],
        { timeout: timeoutMs }
      );
      pending.child.stdin.end(code);
      const { stdout, stderr } = await pending;
      return { stdout, stderr };
    } finally {
      // RIGHT: recycle the container: remove it and spawn a fresh one
      // This prevents state accumulation between executions
      // Runs in the background; a failed respawn shrinks the pool instead of
      // surfacing as an unhandled rejection
      this.#recycle(containerId).catch(() => {});
    }
  }

  async #recycle(containerId) {
    // Kill the used container (--rm removes it) and replace it with a fresh one
    this.#containers = this.#containers.filter((id) => id !== containerId);
    execFileAsync('docker', ['kill', containerId]).catch(() => {});
    const newId = await this.#spawnContainer();
    this.#containers.push(newId);
    this.#available.push(newId);
  }
}
The container hardening above — no network, read-only filesystem, dropped capabilities, no-new-privileges — makes this a genuine security boundary for most threat models. The remaining attack surface is container escape via kernel exploits (mitigated by keeping the host kernel patched) and timing side channels (not currently relevant to most MCP server threat models).
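The pool shown earlier rejects a call outright when no container is free. One possible alternative, sketched here as an assumption rather than as part of the original design, is to queue callers for a short time until a recycled container is handed back. The class name WaitingPool and the container IDs are hypothetical; the items could be the container IDs the Docker pool manages.

```javascript
// Hypothetical helper: hands out pooled items and queues callers when empty.
class WaitingPool {
  #free;
  #waiters = [];

  constructor(items) {
    this.#free = [...items];
  }

  // Resolves with an item, waiting up to maxWaitMs for one to be released.
  acquire(maxWaitMs = 1000) {
    if (this.#free.length > 0) return Promise.resolve(this.#free.pop());
    return new Promise((resolve, reject) => {
      const waiter = { resolve, timer: null };
      waiter.timer = setTimeout(() => {
        this.#waiters = this.#waiters.filter((w) => w !== waiter);
        reject(new Error('Timed out waiting for a sandbox'));
      }, maxWaitMs);
      this.#waiters.push(waiter);
    });
  }

  // Hands the item to the oldest waiter, or returns it to the free list.
  release(item) {
    const waiter = this.#waiters.shift();
    if (!waiter) {
      this.#free.push(item);
      return;
    }
    clearTimeout(waiter.timer);
    waiter.resolve(item);
  }
}

async function demo() {
  const pool = new WaitingPool(['container-a']);
  const first = await pool.acquire();
  const pending = pool.acquire(500);  // queued: the pool is now empty
  pool.release('container-b');        // e.g. a freshly recycled container
  const second = await pending;
  const third = await pool.acquire(20).catch((err) => err.message);
  return { first, second, third };
}

demo().then((result) => console.log(result));
```

A bounded wait keeps short bursts from failing while still giving callers a clear error when the pool is genuinely overloaded.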
What each isolation layer blocks
The table below shows how each approach handles the eight most common attack vectors against MCP server tool execution sandboxes. This matters because choosing the wrong layer for your threat model gives you false confidence — you deploy a vm sandbox and believe you're protected from supply-chain attacks that it cannot stop.
| Attack vector | vm module | Worker threads | Docker sidecar |
|---|---|---|---|
| Read outer scope variables / process.env | Blocked | Partial (outer scope blocked; process.env is copied into the Worker unless the env option overrides it) | Blocked |
| Prototype pollution of host objects | Partial (escapable) | Blocked | Blocked |
| Infinite loop / CPU exhaustion | Blocked (timeout) | Blocked (terminate) | Blocked (cpu-quota) |
| Memory exhaustion | Open | Blocked (resourceLimits) | Blocked (--memory) |
| Filesystem read (arbitrary paths) | Open (same process) | Open (same process) | Blocked (read-only + no host mount) |
| Outbound network connections | Open | Open | Blocked (--network none) |
| Fork bomb / process exhaustion | Open | Partial (no fork, but Worker spawn) | Blocked (--pids-limit) |
| Supply chain: malicious module init code | Open (same require cache) | Partial (own module registry) | Blocked (immutable container image) |
Choosing the right layer for your use case
There is no single right answer — the correct isolation layer depends on who wrote the code and what you're trying to prevent.
vm module
User-supplied math expressions or template strings. Author is the user — threat is accidental scope access, not exploitation. Response time must be under 5ms.
Worker threads
Community plugins from a trusted registry with human review. Want to isolate bugs and prototype pollution without Docker overhead. Still need to trust the code's intent.
Docker sidecar
User-submitted code, open marketplace plugins, anything that processes untrusted external content. Latency budget of 200ms+ acceptable. OS-level blast radius reduction required.
All three combined
Defense in depth: vm inside a Worker inside a container. Adds latency but limits blast radius even if a container escape occurs. Appropriate for execution-as-a-service products.
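The guidance above can be written down as a small lookup, shown here as a sketch only. The source labels, the function name chooseIsolationLayer, and the warning text are all hypothetical; the 200ms figure is the latency budget quoted above for sidecars.

```javascript
// Hypothetical labels for "who wrote the code", mapped to the layers above.
const LAYER_BY_SOURCE = new Map([
  ['user-expression', 'vm'],      // formulas and template strings
  ['reviewed-plugin', 'worker'],  // community code from a reviewed registry
  ['untrusted-code', 'docker'],   // user-submitted or open-marketplace code
]);

function chooseIsolationLayer({ source, latencyBudgetMs }) {
  const layer = LAYER_BY_SOURCE.get(source);
  if (!layer) throw new RangeError(`Unknown code source: ${source}`);
  // The guidance above assumes a sidecar needs a budget of roughly 200ms.
  const warning =
    layer === 'docker' && latencyBudgetMs < 200
      ? 'latency budget is below the ~200ms a sidecar typically needs'
      : null;
  return { layer, warning };
}

console.log(chooseIsolationLayer({ source: 'user-expression', latencyBudgetMs: 5 }));
console.log(chooseIsolationLayer({ source: 'untrusted-code', latencyBudgetMs: 50 }));
```

The function deliberately never downgrades the layer to fit a latency budget: the threat model picks the layer, and the budget only produces a warning.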
What SkillAudit scans for
When SkillAudit audits an MCP server, the execution-path analysis checks for several sandbox-relevant patterns:
- eval() / Function() with argument interpolation: any call where a tool argument flows into eval(), new Function(), or equivalent without vm sandboxing. This is the most common finding and rates a D or F on the security axis depending on what data is accessible to the unsandboxed code.
- vm context with process/require injected: a vm sandbox that includes process, require, or Buffer in the context provides essentially no isolation. These objects are bridges back to the host process.
- Worker without timeout or resource limits: spawning a Worker thread with no timeout and no resourceLimits allows CPU exhaustion attacks. SkillAudit flags this as a denial-of-service vector even if the code itself is trusted.
- Docker exec with shell string interpolation: if a tool argument is interpolated into a shell command passed to docker exec, the sandbox itself becomes a command injection vector. See the command injection patterns guide for the full analysis.
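The last finding comes down to whether a shell ever parses the tool argument. The sketch below is an illustration, not SkillAudit's detection logic. It uses process.execPath as a stand-in for the docker binary so it runs without Docker installed, and the payload string is a made-up example.

```javascript
const { execFileSync } = require('node:child_process');

// A hostile tool argument that would run an extra command if a shell parsed it.
const payload = 'report.txt; echo INJECTED';

// WRONG (not executed here): exec(`docker exec ${id} sh -c "cat ${payload}"`)
// builds one shell string, so the `; echo INJECTED` part becomes a command.

// RIGHT: execFile-style calls pass each array element as one argv entry and
// start no shell, so the payload arrives as a single literal string.
const echoed = execFileSync(
  process.execPath,
  ['-e', 'process.stdout.write(process.argv[1])', payload],
  { encoding: 'utf8' }
);

console.log(echoed === payload);
```

The same rule applies inside the container: if the command run by docker exec is itself sh -c with an interpolated string, the injection simply moves one level down.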
The input validation patterns guide covers the complementary layer: validating and sanitizing inputs before they reach the sandbox, so you're not relying on the sandbox to handle malformed or oversized inputs.
Operational hardening for Docker sidecars
Beyond the container flags shown above, three operational practices materially reduce the risk from Docker sandboxes in production:
1. Immutable container images — no package manager in the sandbox
# Dockerfile for tool execution sandbox
# RIGHT: multi-stage build removes package manager and build tools from final image
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
FROM node:22-alpine AS runtime
# RIGHT: remove package manager binaries — attacker cannot install new packages
RUN apk del apk-tools && \
    rm -rf /usr/local/lib/node_modules/npm /usr/local/bin/npm /usr/local/bin/npx
WORKDIR /sandbox
# RIGHT: copy only the node_modules and runtime — no source, no build tools
COPY --from=builder /app/node_modules ./node_modules
COPY sandbox-runner.js ./
# RIGHT: drop to non-root before CMD
USER 65534:65534
CMD ["node", "sandbox-runner.js"]
2. seccomp profile — limit available syscalls
// Launch container with a seccomp profile that blocks the most dangerous syscalls
// RIGHT: use Docker's default seccomp profile, then additionally block:
//   ptrace (process inspection), socket (if --network none, belt + suspenders),
//   mount, unshare, clone with CLONE_NEWUSER (privilege escalation via user namespaces)
// In execFileAsync call, add:
'--security-opt', 'seccomp=/etc/docker/sandbox-seccomp.json',
3. Network egress control at the host level — defence in depth
Even with --network none, it's worth adding a host-level iptables rule that drops all traffic from the container network CIDR. Defense in depth means that a --network none bypass (rare but possible with some Docker versions and kernel configurations) doesn't immediately result in a live network connection.
Quick wins for existing MCP servers
If you have an existing MCP server with an eval-adjacent pattern and want to reduce risk without a full sidecar migration:
- Add timeout + strict mode to any existing vm.runInContext calls (30 minutes): the most common vm usage omits the timeout option. Adding timeout: 100 prevents loop-based CPU exhaustion. Adding "use strict"; to the evaluated code prevents some prototype chain escapes. Estimated impact: eliminates denial-of-service vectors at zero architectural cost.
- Move the eval call into a Worker thread (2–4 hours): extracting the unsafe code into a worker_threads.Worker file with resourceLimits adds heap and CPU isolation without requiring Docker. The structured-clone message boundary is also a natural place to validate output shape before it reaches the main thread.
- Audit what you inject into vm contexts (1 hour): grep your codebase for vm.createContext( and audit every object key in the argument. Any key whose value is or holds a reference to process, require, Buffer, global, or a built-in constructor that exposes constructor.constructor is a sandbox escape vector. Remove or freeze it.
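The context audit in the last item can be partly automated. The sketch below is a rough check under stated assumptions: the helper name findRiskyContextKeys is hypothetical, the name list is taken from the item above, and any object or function value is treated as a potential host reference.

```javascript
// Names called out above as direct bridges to the host process.
const RISKY_NAMES = new Set(['process', 'require', 'Buffer', 'global']);

// Hypothetical helper: list the keys of a would-be vm context object that are
// likely escape vectors. It inspects own keys only and does not recurse.
function findRiskyContextKeys(sandbox) {
  const findings = [];
  for (const key of Reflect.ownKeys(sandbox)) {
    const name = String(key);
    const value = sandbox[key];
    if (RISKY_NAMES.has(name)) {
      findings.push({ key: name, reason: 'host capability' });
    } else if (typeof value === 'function' || (value !== null && typeof value === 'object')) {
      // Any host object or function exposes constructor.constructor.
      findings.push({ key: name, reason: 'host reference' });
    }
  }
  return findings;
}

const report = findRiskyContextKeys({ x: 2, rate: 0.5, Math, process });
console.log(report);
```

Running a check like this in a unit test next to each vm.createContext call site keeps new keys from slipping into a context unnoticed.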
Sandboxing is the deepest layer of the MCP server security model — it's what you reach for when all upstream controls (input validation, authentication, rate limiting) have already been bypassed. The three layers above give you a principled choice based on your threat model, with Docker sidecars as the hard floor for adversarial code. Run a SkillAudit scan to see which execution paths in your server are flagged for missing isolation.