MCP server sandboxing: running untrusted tool code in isolated processes

When an MCP server executes code that came from a user, a community plugin, or a third-party tool registry, that code runs with the same permissions as the server process unless you explicitly isolate it. This guide covers three isolation layers — the Node.js vm module, Worker threads, and Docker sidecars — and explains what each one actually prevents, what it cannot stop, and when to reach for each.

Why MCP servers need process isolation

Most MCP server authors do not think of themselves as running untrusted code. They wrote the tool handlers themselves. But several common patterns introduce third-party code into the execution path:

In each case the attacker's goal is the same: escalate from "I can influence what your tool does" to "I own your process." A sandbox's job is to make that escalation structurally impossible — or at least, to limit the blast radius to a disposable child process rather than the MCP host itself.

SkillAudit finding: 23% of MCP servers in the Anthropic Skills Directory accept at least one argument that is subsequently passed to eval(), Function(), or a template engine with code execution capabilities. Of those, fewer than 3% implement any process-level isolation around that execution path.

Isolation layer 1: the Node.js vm module

The vm module provides a sandboxed V8 context within the same process. Code runs in a separate JavaScript context — it cannot access the module cache, cannot require() built-in modules, and cannot read variables from the outer scope unless you explicitly inject them into the sandbox context. It is Node's built-in answer to "I need to evaluate user code without giving it access to my application state."

The critical word is within the same process. A vm sandbox stops JavaScript-level access — prototype pollution attacks on the outer heap, accidental variable leakage, unauthorized module imports. It does not stop native escapes. An attacker who can call process.binding() through any reference in the sandbox context, or who can exploit a V8 vulnerability, breaks out of the vm jail and is back in the host process.

Low isolation

vm module — JavaScript context boundary only

Same process, same OS user, same memory space. Stops JS-level access; cannot stop native escapes.

The vm module is best used to protect against accidental access (a template engine reading your environment variables) rather than against determined adversaries. For tool code that processes user-supplied expressions with no escalation path, it's appropriate. For code that comes from an untrusted third party, it is not sufficient alone.

WRONG — eval with no sandboxing

// WRONG: user-supplied expression runs in the host process context
// Can read process.env, require() any module, call fs.readFileSync
function evaluateUserFormula(expression, variables) {
  const fn = new Function(...Object.keys(variables), `return ${expression}`);
  return fn(...Object.values(variables));
}

RIGHT — vm module with frozen context and timeout

import vm from 'node:vm';

const PRIMITIVE_TYPES = new Set(['number', 'string', 'boolean']);

function evaluateUserFormula(expression, variables) {
  // RIGHT: build a context with only the variables the user needs
  // Do NOT inject: process, require, Buffer, global, __dirname
  // Do NOT inject host objects such as Math or JSON either: their
  // constructor chain leads back to the host Function constructor.
  // The new context already has its own Math and JSON built-ins.
  const sandbox = Object.create(null); // no inherited host `constructor`
  for (const [name, value] of Object.entries(variables)) {
    if (!PRIMITIVE_TYPES.has(typeof value)) {
      throw new TypeError(`Variable "${name}" must be a primitive value`);
    }
    sandbox[name] = value;
  }

  // microtaskMode is a createContext option, not a runInContext option
  const context = vm.createContext(Object.freeze(sandbox), {
    microtaskMode: 'afterEvaluate', // run microtasks inside the timeout window
  });

  // RIGHT: set timeout to prevent infinite loops blocking the event loop
  const result = vm.runInContext(
    `(function() { "use strict"; return (${expression}); })()`,
    context,
    {
      timeout: 50,                 // ms, kills synchronous infinite loops
      filename: 'user-formula.js', // shows in stack traces
    }
  );

  // RIGHT: validate that result is a plain value, not a function or object
  // that holds references back to the sandbox
  if (typeof result === 'function' || (result !== null && typeof result === 'object')) {
    throw new TypeError('Formula must return a primitive value');
  }

  return result;
}

vm is not a security boundary against determined attackers. Known escape techniques include: accessing this.constructor.constructor (the outer Function) through prototype chain traversal, exploiting V8 engine bugs, and smuggling references through shared built-in objects like Error. Freeze your sandbox context, whitelist only primitives, and treat vm as defense-in-depth rather than the primary control.
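To make the constructor-chain escape concrete, here is a minimal sketch that runs the same probe in two contexts: one backed by an ordinary object and one backed by a null-prototype object. The variable names are illustrative, and the snippet uses require so it can run as a plain script. It demonstrates one known technique only; it does not show that a null-prototype context is safe against the others.

```javascript
const vm = require('node:vm');

// The probe walks constructor.constructor to reach a Function constructor,
// then asks that Function's realm whether `process` is visible.
const probe = 'this.constructor.constructor("return typeof process")()';

// Context backed by an ordinary host object: `constructor` is inherited from
// the host's Object.prototype, so the probe reaches the host Function.
const leakyContext = vm.createContext({});
const leakyResult = vm.runInContext(probe, leakyContext);

// Context backed by a null-prototype object: there is no inherited host
// `constructor`, so the probe resolves to the context's own Function.
const sealedContext = vm.createContext(Object.create(null));
const sealedResult = vm.runInContext(probe, sealedContext);

console.log({ leakyResult, sealedResult });
```

In the first context the probe should see the host's process object; in the second it should not. Injecting any host object (a helper, a config object, even a built-in namespace) reopens the same path through that object's own constructor chain.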

Isolation layer 2: Worker threads

Worker threads run in a separate V8 isolate within the same OS process. Each Worker has its own heap, its own JavaScript context, and its own module registry. Communication between the main thread and a Worker uses structured clone (which serializes data rather than sharing references) or transferable objects. A Worker cannot directly read or write the main thread's memory: isolates do not share a heap, and the only shared memory is a SharedArrayBuffer that you explicitly pass across.

The key improvement over vm is the isolate boundary: prototype pollution and cross-context reference attacks are much harder because the Worker's Object.prototype is a different object from the main thread's. However, Workers still share the same OS process — the same file descriptors, the same OS user, and the same Linux capabilities. A Worker that calls fs.readFileSync('/etc/shadow') will succeed if the process has that permission. Workers stop JavaScript-level attacks; they do not provide OS-level isolation.

Medium isolation

Worker threads — separate V8 isolate, same OS process

Separate heap and module registry. Stops JS cross-context attacks; does not stop OS-level access.

Worker threads are appropriate for executing community plugins where you trust the author's intentions but want defence against bugs or prototype pollution. They are not appropriate for executing code from adversarial sources — a malicious Worker can read environment variables, open network connections, and write to any path the parent process can reach.
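The point about environment variables can be checked directly. The sketch below assumes a made-up DEMO_SECRET variable and spawns two inline Workers: one with default options and one with the Worker constructor's env option set to an empty object. The helper name is hypothetical.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical secret, set only so the demo has something to leak.
process.env.DEMO_SECRET = 'hunter2';

// Worker source: report what this Worker sees in its own process.env.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.postMessage(process.env.DEMO_SECRET ?? null);
`;

function secretSeenByWorker(workerOptions = {}) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, ...workerOptions });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const results = Promise.all([
  secretSeenByWorker(),            // default: Worker gets a copy of process.env
  secretSeenByWorker({ env: {} }), // explicit empty environment
]);

results.then(([byDefault, withEmptyEnv]) => {
  console.log({ byDefault, withEmptyEnv });
});
```

Passing env: {} narrows what process.env exposes inside the Worker, which helps against accidental leakage. It is not a boundary against malicious code, since the Worker still runs in the same OS process and can reach the filesystem and network as described above.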

RIGHT — Worker thread execution with structured-clone message passing

// worker-runner.js — spawned per tool invocation, exits when done
import { workerData, parentPort } from 'node:worker_threads';
import { runUserTool } from './user-tool-registry.js';

// Worker has its own module registry — cannot access parent's require cache
async function main() {
  try {
    const result = await runUserTool(workerData.toolName, workerData.args);

    // RIGHT: structured clone prevents passing non-serializable references
    parentPort.postMessage({ ok: true, result });
  } catch (err) {
    parentPort.postMessage({ ok: false, error: err.message, code: err.code });
  }
}

main();

// host-server.js — spawns a Worker for each tool call, enforces timeout
import { Worker } from 'node:worker_threads';
import path from 'node:path';

const WORKER_TIMEOUT_MS = 5000;

async function runToolInWorker(toolName, args) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      path.join(import.meta.dirname, 'worker-runner.js'),
      {
        workerData: { toolName, args },
        // RIGHT: cap the Worker's V8 heap with resourceLimits
        // (memory only; CPU time is bounded by the timeout below)
        resourceLimits: {
          maxOldGenerationSizeMb: 128,   // cap heap per worker
          maxYoungGenerationSizeMb: 32,
          codeRangeSizeMb: 16,
        },
      }
    );

    const timeout = setTimeout(() => {
      worker.terminate();  // RIGHT: kill the Worker if it exceeds the time limit
      reject(new Error('Tool execution timeout'));
    }, WORKER_TIMEOUT_MS);

    worker.once('message', (msg) => {
      clearTimeout(timeout);
      if (msg.ok) resolve(msg.result);
      else reject(Object.assign(new Error(msg.error), { code: msg.code }));
    });

    worker.once('error', (err) => {
      clearTimeout(timeout);
      reject(err);
    });

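    // RIGHT: settle the promise if the Worker exits without sending a result
    // (reject is a no-op when the promise has already been resolved)
    worker.once('exit', (code) => {
      clearTimeout(timeout);
      reject(new Error(`Worker exited with code ${code} before returning a result`));
    });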
  });
}

A Worker with resourceLimits cannot grow its JavaScript heap beyond the configured cap; Node terminates it with ERR_WORKER_OUT_OF_MEMORY instead of letting it take down the host. The limits apply to the V8 heap only, so ArrayBuffer allocations and other external memory are not covered. Combined with a hard timeout via worker.terminate(), this blunts the two most common denial-of-service vectors from community tool code. For the security model of Workers versus vm, see the rate limiting deep-dive which covers the resource exhaustion threat model in detail.
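As a rough illustration of the heap cap in action, the sketch below starts an inline Worker that grows an array without bound and reports how the Worker ends. The helper name and the 16 MB figure are arbitrary choices for the demo, not recommendations.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source that grows the JavaScript heap without bound.
const hungrySource = 'const hoard = []; while (true) hoard.push([hoard]);';

function runWithHeapCap(source, maxOldGenerationSizeMb) {
  return new Promise((resolve) => {
    const worker = new Worker(source, {
      eval: true,
      resourceLimits: { maxOldGenerationSizeMb, maxYoungGenerationSizeMb: 4 },
    });
    // When the cap is reached, Node terminates the Worker and emits 'error'.
    worker.once('error', (err) => resolve({ code: err.code }));
    // Fallback in case the Worker exits without an error event.
    worker.once('exit', (exitCode) => resolve({ exitCode }));
  });
}

const outcome = runWithHeapCap(hungrySource, 16);
outcome.then((result) => console.log(result));
```

The host process keeps running after the Worker is terminated, which is the property that matters for an MCP server handling other requests at the same time.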

Isolation layer 3: Docker sidecar processes

A Docker sidecar isolates tool execution at the OS level: a separate container has its own filesystem namespace, its own network namespace (which can be air-gapped), its own PID namespace, and runs as a different OS user. Even if an attacker achieves arbitrary code execution inside the container, they cannot read the host filesystem, cannot connect to the MCP server's internal network, and cannot escalate to the host OS without a separate container escape (a much harder attack requiring a kernel-level vulnerability).

The tradeoff is latency and operational complexity. Spinning up a new container per tool invocation is too slow for interactive MCP use — round-trip latency in the hundreds of milliseconds. In practice, Docker sidecars work best with a warm pool of pre-started containers that the MCP host dispatches requests to, with containers recycled after each use to prevent state accumulation.

High isolation

Docker sidecar — OS-level namespace isolation

Separate filesystem, network, and process namespaces. Appropriate for executing adversarially-supplied code.

Use Docker sidecars when the tool code is adversarially untrusted: user-submitted code in a code-execution service, dynamic plugins from an open marketplace, or any tool that processes external content that could contain an exploit. The container should run as a non-root user, have a read-only filesystem except for a /tmp scratchpad, have no network access, and have resource limits set via --memory and --cpu-quota.

Sidecar dispatch with warm container pool

// docker-pool.js — manages a pool of pre-warmed execution containers
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// RIGHT: pre-warm a fixed pool of containers; recycle after each use
class DockerSandboxPool {
  #containers = [];
  #available = [];
  #poolSize;
  #image;  // private fields must be declared before the constructor assigns them

  constructor(poolSize = 4, image = 'skillaudit/tool-sandbox:latest') {
    this.#poolSize = poolSize;
    this.#image = image;
  }

  async init() {
    for (let i = 0; i < this.#poolSize; i++) {
      const id = await this.#spawnContainer();
      this.#containers.push(id);
      this.#available.push(id);
    }
  }

  async #spawnContainer() {
    // RIGHT: run container with hard resource limits and no network
    const { stdout } = await execFileAsync('docker', [
      'run', '--detach',
      '--rm',                              // auto-remove on exit
      '--network', 'none',                 // RIGHT: no outbound network access
      '--read-only',                       // RIGHT: read-only root filesystem
      '--tmpfs', '/tmp:size=32m',          // writable scratchpad, limited size
      '--memory', '128m',                  // RIGHT: cap memory
      '--memory-swap', '128m',             // RIGHT: no swap — prevents disk writes
      '--cpu-quota', '50000',              // RIGHT: 50ms per 100ms = 50% of one core
      '--pids-limit', '50',               // RIGHT: prevent fork bombs
      '--cap-drop', 'ALL',                 // RIGHT: drop all Linux capabilities
      '--security-opt', 'no-new-privileges',
      '--user', '65534:65534',             // RIGHT: run as nobody:nogroup
      this.#image,
      '/bin/sh', '-c', 'while true; do sleep 1; done',  // idle process
    ]);
    return stdout.trim();
  }

  async execute(code, timeoutMs = 5000) {
    if (this.#available.length === 0) {
      throw new Error('Sandbox pool exhausted — try again shortly');
    }

    const containerId = this.#available.pop();

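    try {
      // RIGHT: pass code via stdin, not as a shell argument (prevents injection)
      // The async execFile has no `input` option, so write to the child's stdin.
      const pending = execFileAsync(
        'docker',
        ['exec', '-i', containerId, 'node', '--input-type=module'],
        { timeout: timeoutMs }
      );
      pending.child.stdin.on('error', () => {}); // ignore EPIPE if docker exits early
      pending.child.stdin.end(code);
      const { stdout, stderr } = await pending;
      return { stdout, stderr };
    } finally {
      // RIGHT: recycle the container: remove it and spawn a fresh one
      // This prevents state accumulation between executions
      this.#recycle(containerId).catch((err) => {
        console.error('Failed to replace sandbox container:', err);
      });
    }
  }

  async #recycle(containerId) {
    // Kill the used container and replace it with a fresh one
    this.#containers = this.#containers.filter((id) => id !== containerId);
    execFileAsync('docker', ['kill', containerId]).catch(() => {});
    const newId = await this.#spawnContainer();
    this.#containers.push(newId);
    this.#available.push(newId);
  }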
}

The container hardening above — no network, read-only filesystem, dropped capabilities, no-new-privileges — makes this a genuine security boundary for most threat models. The remaining attack surface is container escape via kernel exploits (mitigated by keeping the host kernel patched) and timing side channels (not currently relevant to most MCP server threat models).

What each isolation layer blocks

The table below shows how each approach handles the eight most common attack vectors against MCP server tool execution sandboxes. This matters because choosing the wrong layer for your threat model gives you false confidence — you deploy a vm sandbox and believe you're protected from supply-chain attacks that it cannot stop.

| Attack vector | vm module | Worker threads | Docker sidecar |
| --- | --- | --- | --- |
| Read outer scope variables / process.env | Blocked (unless the context is escaped) | Partial (outer scope blocked; process.env is copied into the Worker by default) | Blocked |
| Prototype pollution of host objects | Partial (escapable) | Blocked | Blocked |
| Infinite loop / CPU exhaustion | Blocked (timeout) | Blocked (terminate) | Blocked (cpu-quota) |
| Memory exhaustion | Open | Partial (resourceLimits caps the JS heap only) | Blocked (--memory) |
| Filesystem read (arbitrary paths) | Open (same process) | Open (same process) | Blocked (read-only + no host mount) |
| Outbound network connections | Open | Open | Blocked (--network none) |
| Fork bomb / process exhaustion | Open | Open (child_process is available to Workers) | Blocked (--pids-limit) |
| Supply chain: malicious module init code | Open (same require cache) | Partial (own module registry) | Blocked (immutable container image) |

Choosing the right layer for your use case

There is no single right answer — the correct isolation layer depends on who wrote the code and what you're trying to prevent.

vm module

User-supplied math expressions or template strings. Author is the user — threat is accidental scope access, not exploitation. Response time must be under 5ms.

Worker threads

Community plugins from a trusted registry with human review. Want to isolate bugs and prototype pollution without Docker overhead. Still need to trust the code's intent.

Docker sidecar

User-submitted code, open marketplace plugins, anything that processes untrusted external content. Latency budget of 200ms+ acceptable. OS-level blast radius reduction required.

All three combined

Defense in depth: vm inside a Worker inside a container. Adds latency, but an attacker has to defeat each layer in turn before reaching the host, and a failure of the inner layers is still contained by the container. Appropriate for execution-as-a-service products.
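The guidance above can be written down as a small lookup, which is handy as a starting point for a code review checklist. This is a hypothetical helper: the codeSource labels, the option names, and the returned strings are all illustrative and not part of any MCP SDK.

```javascript
// Hypothetical mapping from "who wrote the code" to the weakest acceptable layer.
const LAYER_BY_SOURCE = new Map([
  ['user-expression', 'vm'],            // math expressions, template strings
  ['reviewed-plugin', 'worker-thread'], // trusted registry with human review
  ['untrusted-code', 'docker-sidecar'], // user-submitted or marketplace code
]);

function chooseIsolationLayer({ codeSource, needsOsIsolation = false }) {
  // An explicit OS-level requirement always wins.
  if (needsOsIsolation) return 'docker-sidecar';
  // Fail closed: an unrecognized source gets the strongest layer.
  return LAYER_BY_SOURCE.get(codeSource) ?? 'docker-sidecar';
}

console.log(chooseIsolationLayer({ codeSource: 'reviewed-plugin' }));
```

Defaulting unknown sources to the sidecar is a deliberate choice in this sketch: a typo in a label should cost latency, not isolation.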

What SkillAudit scans for

When SkillAudit audits an MCP server, the execution-path analysis checks for several sandbox-relevant patterns:

The input validation patterns guide covers the complementary layer: validating and sanitizing inputs before they reach the sandbox, so you're not relying on the sandbox to handle malformed or oversized inputs.

Operational hardening for Docker sidecars

Beyond the container flags shown above, three operational practices materially reduce the risk from Docker sandboxes in production:

1. Immutable container images — no package manager in the sandbox

# Dockerfile for tool execution sandbox
# RIGHT: multi-stage build removes package manager and build tools from final image
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

FROM node:22-alpine AS runtime
# RIGHT: remove package manager binaries so an attacker cannot install new packages
# (the official node images install npm and npx under /usr/local, not /usr)
RUN apk del apk-tools && \
    rm -rf /usr/local/lib/node_modules/npm /usr/local/bin/npm /usr/local/bin/npx

WORKDIR /sandbox
# RIGHT: copy only the node_modules and runtime — no source, no build tools
COPY --from=builder /app/node_modules ./node_modules
COPY sandbox-runner.js ./

# RIGHT: drop to non-root before CMD
USER 65534:65534

CMD ["node", "sandbox-runner.js"]

2. seccomp profile — limit available syscalls

// Launch container with a seccomp profile that blocks the most dangerous syscalls
// RIGHT: use Docker's default seccomp profile, then additionally block:
// ptrace (process inspection), socket (if --network none, belt + suspenders),
// mount, unshare, clone with CLONE_NEWUSER (privilege escalation via user namespaces)

// In execFileAsync call, add:
'--security-opt', 'seccomp=/etc/docker/sandbox-seccomp.json',

3. Network egress control at the host level — defence in depth

Even with --network none, it's worth adding a host-level iptables rule that drops all traffic from the Docker bridge network CIDR. A container started with --network none has only a loopback interface, so the rule mainly protects against the flag being lost, for example when someone refactors the spawn arguments or starts a sandbox container by hand. With the rule in place, that mistake fails closed instead of producing a live network connection.

Quick wins for existing MCP servers

If you have an existing MCP server with an eval-adjacent pattern and want to reduce risk without a full sidecar migration:

  1. Add timeout + strict mode to any existing vm.runInContext calls (30 minutes): the most common vm usage omits the timeout option. Adding timeout: 100 stops synchronous infinite loops from pinning the event loop. Adding "use strict"; to the evaluated code prevents some prototype chain escapes. Estimated impact: removes the simplest CPU-exhaustion vector at zero architectural cost.
  2. Move the eval call into a Worker thread (2–4 hours): extracting the unsafe code into a worker_threads.Worker file with resourceLimits adds a heap cap and keeps runaway code off the main event loop, without requiring Docker. The structured-clone message boundary is also a natural place to validate output shape before it reaches the main thread.
  3. Audit what you inject into vm contexts (1 hour): grep your codebase for vm.createContext( and audit every object key in the argument. Any key whose value is or holds a reference to process, require, Buffer, global, or a built-in constructor that exposes constructor.constructor is a sandbox escape vector. Remove it where you can; freezing an object stops mutation but does not hide its constructor chain.
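For the third item, a first-pass audit can be automated. The sketch below is a hypothetical helper that flags every key in a would-be context object whose value is an object or a function, on the assumption that only primitives are safe to inject. It does not follow references or inspect the object's own prototype, so treat its output as a list of things to review by hand.

```javascript
// Hypothetical audit helper: returns the keys whose values are not primitives.
function findRiskyContextKeys(sandbox) {
  return Object.keys(sandbox).filter((key) => {
    const value = sandbox[key];
    // Functions and non-null objects carry a constructor chain into the host realm.
    return typeof value === 'function' || (typeof value === 'object' && value !== null);
  });
}

// Example context as it might appear in an existing vm.createContext( call.
const candidateContext = {
  price: 10,
  label: 'widget',
  Math,
  helpers: { round: Math.round },
  process,
};

const risky = findRiskyContextKeys(candidateContext);
console.log(risky);
```

Running a check like this in a unit test next to each vm.createContext( call site keeps a later refactor from quietly adding a host object back into the context.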

Sandboxing is the deepest layer of the MCP server security model — it's what you reach for when all upstream controls (input validation, authentication, rate limiting) have already been bypassed. The three layers above give you a principled choice based on your threat model, with Docker sidecars as the hard floor for adversarial code. Run a SkillAudit scan to see which execution paths in your server are flagged for missing isolation.