MCP server WebGPU security — GPU memory persistence, Spectre-class timing oracles, adapter fingerprinting
WebGPU provides compute shaders that run on the GPU — the same GPU that processes cryptographic operations, renders sensitive UI, and runs ML inference in other browser tabs. In MCP server UIs, WebGPU compute shaders loaded from tool output can read stale GPU memory from previous contexts, measure GPU execution timing with nanosecond resolution (Spectre-class oracle), and enumerate adapter limits that uniquely identify the device across sessions.
GPU memory persistence between contexts
The WebGPU specification requires newly created buffers to behave as if their contents were zero-initialized. That guarantee is enforced at the WebGPU abstraction layer (for example, by the browser clearing a buffer before its first use), not by the GPU hardware, and the underlying driver may reclaim and reuse GPU memory without zeroing it. If an implementation misses a clear, a compute shader that allocates a large buffer and reads it before writing may observe non-zero values from a previous GPU context: either from the same origin (another tool session) or, on some browser/OS combinations, from another tab.
// WGSL compute shader that reads allocated buffer before initialization:
// (demonstrates the memory persistence observation technique)
//
// @group(0) @binding(0) var<storage, read_write> output: array<u32>;
//
// @compute @workgroup_size(64)
// fn main(@builtin(global_invocation_id) id: vec3u) {
// // Read the buffer content before writing anything
// let existing = output[id.x]; // may be non-zero on some implementations
// output[id.x] = existing; // report what was already there
// }
// JavaScript that allocates the buffer and reads the result back:
const buffer = device.createBuffer({
  size: 4 * 1024 * 1024, // 4 MiB of GPU memory
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC
  // Never written by the host: the spec says it must read as zeros,
  // so any non-zero word indicates a missed clear in the implementation
});
// After the compute shader runs, copy into a mappable buffer and read it back:
const readBuffer = device.createBuffer({
  size: buffer.size,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ
});
const encoder = device.createCommandEncoder();
// (the compute pass that dispatches the shader above is recorded here)
encoder.copyBufferToBuffer(buffer, 0, readBuffer, 0, buffer.size);
device.queue.submit([encoder.finish()]);
await readBuffer.mapAsync(GPUMapMode.READ);
const data = new Uint32Array(readBuffer.getMappedRange());
// data may contain remnant values from prior GPU operations
Browser mitigations: Chrome and Firefox implement WebGPU buffer zeroing on allocation. However, this is a specification requirement on implementations — not a GPU hardware guarantee. On some mobile GPU drivers and embedded systems, zeroing may be incomplete. MCP server security audits should flag any pattern where compute shader results are read without writing first, especially when the shader code is derived from tool output.
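One way to avoid depending on driver-level zeroing is to clear a storage buffer explicitly before any shader touches it, and to treat non-zero words in a readback of a never-written buffer as an audit signal. The following is a minimal sketch under stated assumptions: `createClearedBuffer` and `countNonZeroWords` are hypothetical helper names, the caller passes a `GPUDevice`, and the usage flags include `COPY_DST`, which `clearBuffer()` requires.

```javascript
// Hypothetical helper: allocate a buffer and explicitly zero it on the GPU
// timeline instead of relying only on the implementation's zero-initialization.
// Assumes `usage` includes GPUBufferUsage.COPY_DST and `size` is a multiple of 4.
function createClearedBuffer(device, size, usage) {
  const buffer = device.createBuffer({ size, usage });
  const encoder = device.createCommandEncoder();
  encoder.clearBuffer(buffer, 0, size); // fills bytes [0, size) with zeros
  device.queue.submit([encoder.finish()]);
  return buffer;
}

// Hypothetical audit helper: count non-zero 32-bit words in a readback,
// e.g. the Uint32Array obtained from getMappedRange().
function countNonZeroWords(words) {
  let count = 0;
  for (const word of words) {
    if (word !== 0) count++;
  }
  return count;
}
```

A non-zero count for a buffer that no shader or host code has written suggests the implementation skipped a clear and is worth reporting, though it does not by itself show which context the data came from.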
WebGPU as a Spectre-class timing oracle
GPU timestamps are reported in nanoseconds. WebGPU timestamp queries (a GPUQuerySet of type "timestamp") expose this clock to JavaScript when the timestamp-query feature is supported by the adapter and requested on the device, although implementations may coarsen the reported values. Even without timestamp queries, the time delta between queue.submit() and the completion of GPUBuffer.mapAsync() is measurable via performance.now() and varies based on what the GPU was processing.
// WebGPU timestamp query — nanosecond resolution GPU clock
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice({
requiredFeatures: ['timestamp-query'] // Must be explicitly requested
});
const querySet = device.createQuerySet({
type: 'timestamp',
count: 2 // before and after compute shader
});
const timestampBuffer = device.createBuffer({
size: 16, // 2 × 64-bit timestamps
usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC
});
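const encoder = device.createCommandEncoder();
// Timestamps are written at pass boundaries via timestampWrites
const pass = encoder.beginComputePass({
  timestampWrites: {
    querySet,
    beginningOfPassWriteIndex: 0, // GPU clock before shader
    endOfPassWriteIndex: 1 // GPU clock after shader
  }
});
pass.setPipeline(computePipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(1, 1, 1);
pass.end();
// Copy both 64-bit timestamps into timestampBuffer for readback
encoder.resolveQuerySet(querySet, 0, 2, timestampBuffer, 0);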
// The difference is the shader execution time in nanoseconds.
// For a branch-on-secret-bit shader, the time difference reveals
// which branch was taken — and the secret bit value.
// DEFENSE: do not request the timestamp-query feature
const deviceWithoutTimestamps = await adapter.requestDevice({
  requiredFeatures: [] // timestamp-query is only enabled when explicitly requested
});
// Without timestamp-query, GPU timing via performance.now() is still
// measurable but at reduced resolution (jitter applied by the browser)
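The defense above can also be enforced in code by filtering any feature list before it reaches `requestDevice()`, and by checking devices that were created elsewhere. This is a minimal sketch: `BLOCKED_FEATURES`, `filterRequiredFeatures`, and `hasBlockedFeature` are hypothetical names, and only `timestamp-query` in the block list comes from the text above.

```javascript
// Features this sketch refuses to enable for devices that run tool-supplied code.
// Only "timestamp-query" comes from the article; extending the set is an assumption.
const BLOCKED_FEATURES = new Set(["timestamp-query"]);

// Remove duplicates and blocked features from a requested feature list.
function filterRequiredFeatures(requested) {
  return [...new Set(requested)].filter((name) => !BLOCKED_FEATURES.has(name));
}

// Check an existing device. GPUDevice.features is set-like, so has() is available.
function hasBlockedFeature(device) {
  for (const name of BLOCKED_FEATURES) {
    if (device.features.has(name)) return true;
  }
  return false;
}
```

In use, the filtered list would be passed as `requiredFeatures` to `adapter.requestDevice()`, and `hasBlockedFeature(device)` could gate any code path that hands a device to tool-rendered content.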
Adapter fingerprinting via limits and features
GPUAdapter.limits exposes ~40 numerical GPU capability constants including maximum texture size, maximum compute workgroup size, maximum storage buffer binding size, and maximum bind groups. The specific combination of limit values varies by GPU model and driver version, so it can serve as a stable hardware fingerprint that persists across browser restarts, private browsing sessions, and VPN changes.
// WebGPU adapter limits fingerprint
const adapter = await navigator.gpu.requestAdapter();
const fingerprint = {
// ~40 limits — unique to each GPU model + driver combination
maxTextureDimension2D: adapter.limits.maxTextureDimension2D,
maxBufferSize: adapter.limits.maxBufferSize,
maxComputeWorkgroupSizeX: adapter.limits.maxComputeWorkgroupSizeX,
maxComputeWorkgroupsPerDimension: adapter.limits.maxComputeWorkgroupsPerDimension,
maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
// ... 35 more limit values
// GPU architecture info (not always available but when present):
vendor: adapter.info?.vendor, // "nvidia", "amd", "intel", "apple"
architecture: adapter.info?.architecture, // "ampere", "rdna3", "xe"
device: adapter.info?.device, // specific model string
// Supported features (boolean capability set):
features: [...adapter.features]
};
// The above creates a stable device identifier that correlates across:
// - Browser restarts and cache clears
// - Private/incognito mode
// - VPN or Tor usage (IP address is irrelevant)
// - Time (GPU hardware doesn't change)
// DEFENSE: use Permissions-Policy to block WebGPU access for MCP tool content
// Permissions-Policy: webgpu=() ← blocks navigator.gpu access entirely
// Or: render tool output in cross-origin sandboxed iframes
// where navigator.gpu is not available (iframe sandbox attribute)
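To make the stability claim concrete, the sketch below reduces a limits object and a feature set to a short identifier that does not depend on enumeration order. It is an illustration only: `fnv1a` and `fingerprintKey` are hypothetical names, FNV-1a is chosen just for brevity, and a real tracker could use any hash.

```javascript
// 32-bit FNV-1a hash of a string, returned as 8 hex digits.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical helper: derive an order-independent key from limits and features.
function fingerprintKey(limits, features) {
  const parts = [];
  // for...in also visits inherited enumerable properties. On a real adapter the
  // limit values are getters defined on the prototype, so Object.keys() would
  // return nothing useful here.
  for (const name in limits) {
    const value = limits[name];
    if (typeof value === "number") parts.push(`${name}=${value}`);
  }
  parts.sort();
  const featureList = [...features].sort();
  return fnv1a(parts.join(";") + "|" + featureList.join(","));
}
```

Because the inputs come from hardware and driver properties rather than stored state, the same key is recomputed after cookies, caches, or IP addresses change, which is why storage-based privacy controls do not affect it.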
Shader compilation timing as a covert channel
WebGPU compiles WGSL shader code to GPU-native code during device.createShaderModule(). If an MCP tool supplies shader code (e.g., for a visualization feature), the compilation time varies based on shader complexity. More importantly, shader compilation is shared across the GPU device — two WebGPU contexts in the same process can observe each other's compilation load through measurement of their own shader compilation times, creating a cross-context covert channel.
// Shader compilation timing leaks information across contexts
// Context A (legitimate app tool renderer):
const t0 = performance.now();
const shaderModule = device.createShaderModule({ code: shaderSource });
await shaderModule.getCompilationInfo(); // Wait for compilation
const compilationTime = performance.now() - t0;
// compilationTime varies based on GPU load — including shaders from OTHER contexts
// Context B (attacker's MCP tool output in same origin):
// By varying its own shader complexity and measuring compilation time,
// attacker can detect when Context A compiles expensive shaders —
// inferring what rendering operations (and thus what data) Context A processes.
// DEFENSE: compile all shaders at startup before any tool output is rendered
const precompiledShaders = new Map();
for (const [name, code] of BUILT_IN_SHADERS) {
precompiledShaders.set(name, device.createShaderModule({ code }));
}
// Warm up GPU compilation cache before accepting tool connections
await Promise.all([...precompiledShaders.values()].map(m => m.getCompilationInfo()));
// Never compile shaders from tool output — use only pre-compiled modules
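The "pre-compiled modules only" rule is easier to keep if tool-facing code never sees `createShaderModule()` at all and can only look modules up by name. This is a minimal sketch of such an allowlist; `createShaderRegistry` and its `get`/`names` methods are hypothetical, and `builtInShaders` is assumed to be a plain object that maps shader names to WGSL source shipped with the application.

```javascript
// Hypothetical allowlist registry: modules are compiled once from built-in
// sources, and every later lookup is by name only.
function createShaderRegistry(device, builtInShaders) {
  const modules = new Map();
  for (const [name, code] of Object.entries(builtInShaders)) {
    modules.set(name, device.createShaderModule({ label: name, code }));
  }
  return {
    // Return a precompiled module, or throw if the name is not built in.
    get(name) {
      const shaderModule = modules.get(name);
      if (shaderModule === undefined) {
        throw new Error(`Unknown shader "${name}": only built-in shaders are allowed`);
      }
      return shaderModule;
    },
    // List the names that tool output is allowed to reference.
    names() {
      return [...modules.keys()];
    },
  };
}
```

Tool output would then carry a shader name such as `"heatmap"` rather than WGSL text, and any attempt to pass source code fails the lookup instead of reaching the compiler.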
WebGPU security — risk comparison
| Attack vector | Mechanism | Required capability | Defense |
|---|---|---|---|
| GPU memory persistence | Compute shader reads buffer before writing; may observe prior context data | WebGPU compute shader access | Explicitly clear or fully write every buffer before its first read (for example with GPUCommandEncoder.clearBuffer()); never read before write |
| Timestamp query oracle | Nanosecond GPU clock via timestamp-query feature enables Spectre-class timing | timestamp-query feature enabled on device | Never request timestamp-query feature; use CPU timing only (performance.now()) for non-sensitive measurements |
| Adapter fingerprinting | 40+ limit values + features uniquely identify GPU model; persistent across sessions | Read-only access to adapter.limits and adapter.features | Permissions-Policy: webgpu=() for origins that render tool output; sandbox tool iframes |
| Shader compilation covert channel | Compilation time of attacker shader reveals GPU load from other same-process contexts | Ability to create shader modules from tool output | Pre-compile all shaders at startup; never compile shaders from tool output |
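The "required capability" column suggests a few static signals a reviewer could search for in tool-supplied script. The sketch below is a rough illustration and not SkillAudit's actual implementation: the rule ids, the regular expressions, and the `scanForWebGpuRisks` name are all assumptions, and simple pattern matching like this will produce both false positives and false negatives.

```javascript
// Illustrative rules only: each maps a hypothetical finding id to a pattern.
const WEBGPU_RISK_PATTERNS = [
  { id: "timestamp-query", pattern: /timestamp-query/ },
  { id: "shader-from-source", pattern: /createShaderModule\s*\(/ },
  { id: "adapter-fingerprint", pattern: /\.(limits|features|info)\b/ },
  { id: "gpu-access", pattern: /navigator\.gpu\b/ },
];

// Return the ids of all rules whose pattern appears in the given source text.
function scanForWebGpuRisks(source) {
  return WEBGPU_RISK_PATTERNS
    .filter((rule) => rule.pattern.test(source))
    .map((rule) => rule.id);
}
```

A non-empty result is a prompt for manual review of the flagged tool output, not proof of an attack.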
SkillAudit findings for WebGPU usage
Audit your MCP server WebGPU security
SkillAudit detects WebGPU compute shader sourced from tool output, timestamp-query feature usage, adapter limit fingerprinting patterns, uninitialized buffer reads, and missing Permissions-Policy headers for GPU access. Free audit in 60 seconds.