
MCP server WebGPU security — GPU memory persistence, Spectre-class timing oracles, adapter fingerprinting

WebGPU provides compute shaders that run on the GPU, the same GPU that renders sensitive UI and runs ML inference for other browser tabs, and that may be handling cryptographic workloads. In MCP server UIs, WebGPU compute shaders loaded from tool output can attempt to read stale GPU memory from previous contexts, measure GPU execution time through timestamp queries reported in nanoseconds (a Spectre-class oracle), and enumerate adapter limits that help identify the device across sessions.

GPU memory persistence between contexts

The WebGPU specification requires newly created buffers to behave as if they were zero-initialized, but that guarantee is enforced by the browser's WebGPU implementation, not by the GPU hardware. Underneath, a GPU driver may reclaim and reuse memory without clearing it. If the implementation's zeroing is incomplete, a compute shader that allocates a large buffer and reads it before writing may observe non-zero values from a previous GPU context: either from the same origin (another tool session) or, on some browser/OS combinations, from another tab.

// WGSL compute shader that reads allocated buffer before initialization:
// (demonstrates the memory persistence observation technique)
//
// @group(0) @binding(0) var<storage, read_write> output: array<u32>;
//
// @compute @workgroup_size(64)
// fn main(@builtin(global_invocation_id) id: vec3u) {
//   // Read the buffer content before writing anything
//   let existing = output[id.x];   // may be non-zero on some implementations
//   output[id.x] = existing;       // report what was already there
// }

// JavaScript that reads the result:
const buffer = device.createBuffer({
  size: 4 * 1024 * 1024,  // 4 MiB of GPU memory
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC
  // mappedAtCreation is not set. The spec still requires zeroed contents,
  // so any non-zero word read back points to an implementation gap.
});

// After compute shader runs, read back the buffer:
const readBuffer = device.createBuffer({
  size: buffer.size,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ
});
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(buffer, 0, readBuffer, 0, buffer.size);
device.queue.submit([encoder.finish()]);

await readBuffer.mapAsync(GPUMapMode.READ);
const data = new Uint32Array(readBuffer.getMappedRange());
// data may contain remnant values from prior GPU operations

Browser mitigations: Chrome and Firefox zero WebGPU buffers on allocation. This is a requirement the specification places on implementations, not a guarantee from the GPU hardware, and on some mobile GPU drivers and embedded systems the zeroing may be incomplete. MCP server security audits should flag any pattern where a compute shader's buffer is read back without having been written first, especially when the shader code is derived from tool output.
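
One way to act on the read-before-write concern is to treat any non-zero word in a freshly allocated buffer as a finding. The following is a minimal sketch, assuming the readback has already produced a Uint32Array like `data` in the snippet above. The name summarizeRemnants and the shape of its return value are illustrative only; they are not part of WebGPU or SkillAudit.

```javascript
// Hypothetical helper: scan the readback of a freshly allocated buffer.
// On a spec-conformant implementation, nonZeroWords is expected to be 0.
function summarizeRemnants(words) {
  let nonZeroWords = 0;
  let firstOffset = -1; // index of the first non-zero 32-bit word, or -1
  for (let i = 0; i < words.length; i++) {
    if (words[i] !== 0) {
      if (firstOffset === -1) firstOffset = i;
      nonZeroWords++;
    }
  }
  return { nonZeroWords, firstOffset, totalWords: words.length };
}
```

A test harness could call summarizeRemnants(data) after the readback and fail the run whenever nonZeroWords is greater than zero.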

WebGPU as a Spectre-class timing oracle

WebGPU timestamp queries (query sets of type 'timestamp') report GPU time in nanoseconds. They are exposed to JavaScript only when the timestamp-query feature is supported by the adapter and requested on the device, and implementations may coarsen the reported values. Even without timestamp queries, the delay between queue.submit() and the resolution of GPUBuffer.mapAsync() is measurable via performance.now() and varies with what the GPU was processing.

// WebGPU timestamp query — nanosecond resolution GPU clock
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice({
  requiredFeatures: ['timestamp-query']  // Must be explicitly requested
});

const querySet = device.createQuerySet({
  type: 'timestamp',
  count: 2  // before and after compute shader
});

const timestampBuffer = device.createBuffer({
  size: 16,  // 2 × 64-bit timestamps
  usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC
});

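const encoder = device.createCommandEncoder();
// computePipeline and bindGroup are assumed to be created elsewhere.
const pass = encoder.beginComputePass({
  timestampWrites: {
    querySet,
    beginningOfPassWriteIndex: 0,  // GPU clock before shader
    endOfPassWriteIndex: 1         // GPU clock after shader
  }
});
pass.setPipeline(computePipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(1, 1, 1);
pass.end();
encoder.resolveQuerySet(querySet, 0, 2, timestampBuffer, 0);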

// The difference is the shader execution time in nanoseconds.
// For a branch-on-secret-bit shader, the time difference reveals
// which branch was taken — and the secret bit value.

// DEFENSE: do not request the timestamp-query feature
const deviceWithoutTimestamps = await adapter.requestDevice({
  requiredFeatures: []  // timestamp-query is deliberately omitted
});
// Without timestamp-query, GPU timing via performance.now() is still
// measurable but at reduced resolution (jitter applied by the browser)
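
The defense above relies on every call site remembering to omit the feature. A central filter is one way to make that harder to forget. This is a minimal sketch under stated assumptions: BLOCKED_FEATURES and sanitizeDeviceDescriptor are hypothetical names, and the function only rewrites a plain descriptor object before it reaches adapter.requestDevice().

```javascript
// Hypothetical policy: features that tool-rendering code must never obtain.
const BLOCKED_FEATURES = new Set(['timestamp-query']);

// Returns a copy of a GPUDeviceDescriptor-like object with blocked
// features removed. The input object is left untouched.
function sanitizeDeviceDescriptor(descriptor = {}) {
  const requested = descriptor.requiredFeatures ?? [];
  return {
    ...descriptor,
    requiredFeatures: [...requested].filter((f) => !BLOCKED_FEATURES.has(f)),
  };
}

// Usage (browser only):
// const device = await adapter.requestDevice(sanitizeDeviceDescriptor(desc));
```

Filtering the descriptor rather than rejecting it keeps legitimate feature requests working while dropping the one that exposes the GPU clock.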

Adapter fingerprinting via limits and features

GPUAdapter.limits exposes ~40 numerical GPU capability constants, including maximum texture size, maximum compute workgroup size, maximum storage buffer binding size, and maximum bind groups. The combination of limit values varies with GPU model and driver version, so together with the feature set it can form a stable hardware fingerprint that persists across browser restarts, private browsing sessions, and VPN changes. Browsers may bucket or coarsen the reported limits, which reduces the signal but does not necessarily remove it.

// WebGPU adapter limits fingerprint
const adapter = await navigator.gpu.requestAdapter();

const fingerprint = {
  // ~40 limits — unique to each GPU model + driver combination
  maxTextureDimension2D: adapter.limits.maxTextureDimension2D,
  maxBufferSize: adapter.limits.maxBufferSize,
  maxComputeWorkgroupSizeX: adapter.limits.maxComputeWorkgroupSizeX,
  maxComputeWorkgroupsPerDimension: adapter.limits.maxComputeWorkgroupsPerDimension,
  maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  // ... 35 more limit values

  // GPU architecture info (not always available but when present):
  vendor: adapter.info?.vendor,         // "nvidia", "amd", "intel", "apple"
  architecture: adapter.info?.architecture,  // "ampere", "rdna3", "xe"
  device: adapter.info?.device,         // specific model string

  // Supported features (boolean capability set):
  features: [...adapter.features]
};

// The above creates a stable device identifier that correlates across:
// - Browser restarts and cache clears
// - Private/incognito mode
// - VPN or Tor usage (IP address is irrelevant)
// - Time (GPU hardware doesn't change)

// DEFENSE: keep WebGPU away from MCP tool content
// Permissions-Policy: webgpu=()  ← intended to block navigator.gpu access;
//     confirm that target browsers recognize the webgpu token before relying on it
// Or: render tool output in cross-origin sandboxed iframes (iframe sandbox
//     attribute) and verify that navigator.gpu is unavailable there
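
To see why a set of limits behaves like an identifier, it helps to reduce it to a single value. The following is a minimal sketch, assuming the limit values have already been copied into a plain object (GPUSupportedLimits exposes them through getters, so Object.keys on adapter.limits itself would not enumerate them). The names canonicalize, fnv1a32, and limitsFingerprint are illustrative, and FNV-1a is used only because it is short, not because any particular fingerprinting script is known to use it.

```javascript
// Serialize a plain object of limits with keys sorted, so the same values
// always produce the same string regardless of property order.
function canonicalize(limits) {
  return Object.keys(limits)
    .sort()
    .map((key) => `${key}=${limits[key]}`)
    .join(';');
}

// 32-bit FNV-1a over UTF-16 code units (identical to the byte-wise
// algorithm for ASCII input). Returns 8 lowercase hex digits.
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Hypothetical helper: one short identifier per distinct set of limits.
function limitsFingerprint(limits) {
  return fnv1a32(canonicalize(limits));
}
```

Because nothing in the input depends on cookies, storage, or IP address, the resulting value survives the resets listed above for as long as the reported limits stay the same.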

Shader compilation timing as a covert channel

WebGPU compiles WGSL shader code toward GPU-native code when device.createShaderModule() is called, although implementations may defer part of that work to pipeline creation. If an MCP tool supplies shader code (e.g., for a visualization feature), the compilation time varies with shader complexity. More importantly, shader compilation is a shared resource: two WebGPU contexts in the same process can observe each other's compilation load by measuring their own shader compilation times, which creates a cross-context covert channel.

// Shader compilation timing leaks information across contexts

// Context A (legitimate app tool renderer):
const t0 = performance.now();
const shaderModule = device.createShaderModule({ code: shaderSource });
await shaderModule.getCompilationInfo(); // Wait for compilation
const compilationTime = performance.now() - t0;
// compilationTime varies based on GPU load — including shaders from OTHER contexts

// Context B (attacker's MCP tool output in same origin):
// By varying its own shader complexity and measuring compilation time,
// attacker can detect when Context A compiles expensive shaders —
// inferring what rendering operations (and thus what data) Context A processes.

// DEFENSE: compile all shaders at startup before any tool output is rendered
const precompiledShaders = new Map();
for (const [name, code] of BUILT_IN_SHADERS) {
  precompiledShaders.set(name, device.createShaderModule({ code }));
}
// Warm up GPU compilation cache before accepting tool connections
await Promise.all([...precompiledShaders.values()].map(m => m.getCompilationInfo()));

// Never compile shaders from tool output — use only pre-compiled modules
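
The "pre-compiled modules only" rule can be enforced in code instead of by convention. The sketch below assumes tool output is allowed to name a shader but never to supply WGSL. ShaderRegistry and its methods are hypothetical names, and the stored modules stand in for the GPUShaderModule objects created at startup.

```javascript
// Hypothetical registry: maps trusted shader names to modules that were
// compiled at startup. Tool output may name a shader but never supply WGSL.
class ShaderRegistry {
  #modules = new Map();
  #sealed = false;

  // Register a module during startup, before any tool output is rendered.
  register(name, module) {
    if (this.#sealed) throw new Error('registry is sealed');
    this.#modules.set(name, module);
  }

  // Call once startup compilation has finished.
  seal() {
    this.#sealed = true;
  }

  // Resolve a tool-supplied request. Only exact, pre-registered names pass.
  resolve(request) {
    if (typeof request !== 'string' || !this.#modules.has(request)) {
      throw new Error('unknown shader requested');
    }
    return this.#modules.get(request);
  }
}
```

Sealing the registry after startup means that a later code path cannot quietly add a module built from tool-supplied source.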

WebGPU security — risk comparison

| Attack vector | Mechanism | Required capability | Defense |
| --- | --- | --- | --- |
| GPU memory persistence | Compute shader reads buffer before writing; may observe prior context data | WebGPU compute shader access | Explicitly initialize every buffer before use rather than relying on implicit zeroing; never read before write |
| Timestamp query oracle | GPU clock exposed via the timestamp-query feature enables Spectre-class timing | timestamp-query feature enabled on device | Never request timestamp-query feature; use CPU timing only (performance.now()) for non-sensitive measurements |
| Adapter fingerprinting | 40+ limit values + features help identify GPU model; persistent across sessions | Read-only access to adapter.limits and adapter.features | Permissions-Policy: webgpu=() for origins that render tool output; sandbox tool iframes |
| Shader compilation covert channel | Compilation time of attacker shader reveals GPU load from other same-process contexts | Ability to create shader modules from tool output | Pre-compile all shaders at startup; never compile shaders from tool output |

SkillAudit findings for WebGPU usage

CRITICAL: WebGPU compute shader source from MCP tool output is compiled and executed. Attacker-controlled WGSL code runs on the GPU in the same device context as legitimate processing, which enables the GPU memory persistence and timestamp oracle attacks. Score: −24.
HIGH: timestamp-query feature requested on the WebGPU device. The GPU clock is exposed to JavaScript, enabling a Spectre-class timing oracle built on GPU execution time side channels. Score: −20.
HIGH: adapter.limits and adapter.features readable in the tool output rendering context. 40+ limit values form a stable hardware fingerprint that persists across sessions and anonymization techniques. Score: −18.
MEDIUM: GPU buffer allocated without explicit initialization (for example, no mappedAtCreation: true write). The specification requires zeroed contents, but where implementation zeroing is incomplete, a read before write (especially in compute shaders from tool output) may observe remnant data from prior GPU contexts. Score: −12.
MEDIUM: No Permissions-Policy: webgpu=() for the tool output rendering context. Tool-rendered code in the same origin has unrestricted access to WebGPU, including adapter fingerprinting and GPU memory. Score: −10.

Audit your MCP server WebGPU security

SkillAudit detects WebGPU compute shader sourced from tool output, timestamp-query feature usage, adapter limit fingerprinting patterns, uninitialized buffer reads, and missing Permissions-Policy headers for GPU access. Free audit in 60 seconds.
