Security Guide
MCP server WebGPU API security — GPU timing side channels, adapter fingerprinting, compute shader processing for data exfiltration
WebGPU gives browser JavaScript direct access to GPU compute shaders, GPU memory buffers, and GPU timing queries. For MCP server tool output that can inject JavaScript into a same-origin context, WebGPU expands the attack surface in three directions. First, GPU adapter information (GPUAdapterInfo) exposes the GPU vendor and architecture, and in some browser configurations the device model and driver description, without any user permission, which sharpens device fingerprinting. Second, GPU timer queries (the timestamp-query feature) report GPU-side timestamps in nanosecond units, giving attackers a timing source outside the JavaScript clock for side-channel attacks, although browsers may coarsen the values. Third, compute shaders can process megabytes of stolen data in milliseconds, which can make preparation of an exfiltration payload far faster than CPU-based processing.
What WebGPU exposes and where MCP servers encounter it
WebGPU is the successor to WebGL, providing a lower-level GPU API suitable for 3D rendering, scientific computation, and machine learning inference in the browser. Browser-based MCP clients that support in-browser model inference — local LLM execution, embedding generation, or image analysis — use WebGPU to offload computation to the GPU. The threat surface arises when MCP tool output can inject JavaScript that calls navigator.gpu.requestAdapter().
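The WebGPU API requires no user permission. navigator.gpu.requestAdapter() returns a GPUAdapter object whose info attribute (adapter.info) is a GPUAdapterInfo containing vendor, architecture, device, and description strings. Earlier Chrome releases exposed the same data through adapter.requestAdapterInfo(), which has since been removed from the specification. The specification allows a browser to return an empty string for any of these fields, and device and description are often blank by default, but whatever is populated is available to any same-origin JavaScript with no user-visible prompt.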
// GPU adapter fingerprinting: no permission required
const adapter = await navigator.gpu?.requestAdapter();
if (adapter) {
  // adapter.info is the current attribute; requestAdapterInfo() is the older, removed method
  const info = adapter.info ?? (await adapter.requestAdapterInfo?.());
  const fingerprint = {
    vendor: info?.vendor, // e.g. "nvidia" | "amd" | "intel" | "apple"
    architecture: info?.architecture, // e.g. "ampere" | "rdna-3" | ...
    device: info?.device, // exact model where exposed; may be an empty string
    description: info?.description, // driver description where exposed; may be empty
    // Additional precision: supported features narrow device model further
    features: [...adapter.features].sort(),
    // GPU limits hint at memory size and shader capabilities
    maxBufferSize: adapter.limits.maxBufferSize,
    maxComputeWorkgroupStorageSize: adapter.limits.maxComputeWorkgroupStorageSize
  };
  navigator.sendBeacon('/track', JSON.stringify(fingerprint));
  // Where the model string is exposed (e.g. "NVIDIA GeForce RTX 4090"),
  // it narrows the user to roughly 1% of Chrome users
  // Combined with User-Agent, timezone, and screen resolution: near-unique device ID
}
GPU adapter info can be more direct than canvas fingerprinting. Canvas fingerprinting, the standard browser fingerprinting technique, exploits subtle differences in GPU rendering to produce a semi-unique hash. WebGPU's GPUAdapterInfo returns the vendor and architecture as plain strings, along with the device model and driver description when the browser populates those fields. No rendering is required. Because these values describe hardware rather than stored state, the fingerprint is stable across private browsing sessions, storage clears, and VPN usage.
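To make the "near-unique device ID" comment concrete, here is a minimal sketch of how fingerprinting entropy is commonly estimated: each attribute contributes -log2(p) bits, where p is the fraction of users sharing that value, and the bits add up if the attributes are assumed independent (a simplification, since real attributes correlate). The 1% GPU figure is the one quoted in the code comment above; every other fraction, and the function names, are hypothetical placeholders rather than measurements.

```javascript
// Surprisal of one attribute value, in bits: how much it narrows the population.
function surprisalBits(fractionOfUsers) {
  if (!(fractionOfUsers > 0 && fractionOfUsers <= 1)) {
    throw new RangeError('fraction must be in (0, 1]');
  }
  return -Math.log2(fractionOfUsers);
}

// Total bits for several attributes, ASSUMING they are statistically independent.
function combinedBits(fractions) {
  return fractions.reduce((sum, f) => sum + surprisalBits(f), 0);
}

// Illustrative inputs only: 0.01 is the GPU figure quoted above;
// the remaining fractions are hypothetical, not measured.
const exampleFractions = {
  gpuModel: 0.01,
  timezone: 0.05,
  screenResolution: 0.1,
  userAgent: 0.02
};

const totalBits = combinedBits(Object.values(exampleFractions));

// Expected number of users sharing this exact combination in a given population.
const expectedMatches = (population) => population / 2 ** totalBits;

console.log(surprisalBits(0.01).toFixed(2)); // bits contributed by the GPU value alone
console.log(totalBits.toFixed(2));           // bits for the combined fingerprint
console.log(expectedMatches(1e6).toFixed(2)); // matching users per million
```

With these placeholder fractions the combination is shared by about one user in a million, which is the sense in which a handful of individually common attributes can add up to a near-unique identifier.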
GPU timer queries: nanosecond timing side channels
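WebGPU's timestamp-query feature (available when adapter.features.has('timestamp-query') is true and the device is requested with that feature) provides GPU-side timestamps expressed in nanoseconds. The browser quantizes performance.now() to prevent timing attacks, but GPU timestamp queries are taken on the GPU itself, outside the JavaScript clock. Their effective resolution depends on the browser: Chrome, for example, quantizes timestamp query results to 100 microseconds by default, and finer values are available only when its developer features flag is enabled.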
GPU timing side channels work by measuring how long a specific compute operation takes — which depends on whether the GPU's caches, shader cores, or memory bandwidth are contended by another context. In a same-GPU-device scenario (where two browser contexts share the same physical GPU), contention in one context produces measurable timing variation in the other. This enables a limited form of cross-context inference that bypasses the browser's JavaScript timer mitigations.
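The inference step described above can be sketched without any GPU access: given repeated timings of the same fixed workload, compare a robust statistic of the new samples against a quiet baseline. This is only a minimal sketch. The function names, the median-based comparison, the 1.5x threshold, and the sample values are assumptions chosen for illustration, not taken from any specific attack implementation.

```javascript
// Median is used instead of the mean so a single noisy sample does not dominate.
function median(values) {
  if (values.length === 0) throw new RangeError('need at least one sample');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Flags contention when the probe median exceeds the baseline median
// by more than ratioThreshold (hypothetical default: 1.5x).
function looksContended(baselineNs, probeNs, ratioThreshold = 1.5) {
  return median(probeNs) > median(baselineNs) * ratioThreshold;
}

// Hypothetical timings in nanoseconds for the same fixed compute workload.
const baseline = [100000, 102000, 99000, 101000, 250000]; // includes one outlier
const quiet = [101000, 100000, 103000];
const busy = [180000, 175000, 190000];

console.log(looksContended(baseline, quiet)); // false
console.log(looksContended(baseline, busy));  // true
```

The same comparison is useful defensively: if quantized or jittered timings make the quiet and busy distributions indistinguishable under a test like this, the channel carries little information.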
// GPU timer query side channel: measures compute shader execution time
// to detect GPU contention from other contexts.
// The device must be created with requiredFeatures: ['timestamp-query'];
// probeComputePipeline is a prebuilt GPUComputePipeline with a fixed workload.
async function gpuTimingProbe(device, probeComputePipeline) {
  const querySet = device.createQuerySet({ type: 'timestamp', count: 2 });
  const resolveBuffer = device.createBuffer({
    size: 16, // two 64-bit timestamps
    usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC
  });
  const resultBuffer = device.createBuffer({
    size: 16,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ
  });
  // Run a fixed-cost compute shader and measure GPU execution time.
  // encoder.writeTimestamp() was removed from the API; timestamps are now
  // requested through the pass descriptor's timestampWrites.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass({
    timestampWrites: {
      querySet,
      beginningOfPassWriteIndex: 0,
      endOfPassWriteIndex: 1
    }
  });
  // Trivial compute pass: execution time varies based on GPU contention
  pass.setPipeline(probeComputePipeline);
  pass.dispatchWorkgroups(64, 64);
  pass.end();
  encoder.resolveQuerySet(querySet, 0, 2, resolveBuffer, 0);
  encoder.copyBufferToBuffer(resolveBuffer, 0, resultBuffer, 0, 16);
  device.queue.submit([encoder.finish()]);
  await resultBuffer.mapAsync(GPUMapMode.READ);
  // Timestamps are unsigned 64-bit nanosecond values
  const timestamps = new BigUint64Array(resultBuffer.getMappedRange());
  const elapsedNs = Number(timestamps[1] - timestamps[0]);
  resultBuffer.unmap();
  // Elevated times indicate GPU contention from another context
  return elapsedNs; // execution time in nanoseconds
}
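One mitigation browsers apply to these readings is quantization: each timestamp is rounded to a coarse grid before script can see it. Chrome documents a 100-microsecond granularity for timestamp queries by default. The sketch below is a simplified model of that effect, not the browser's actual implementation; quantizeNs, observedElapsedNs, and the sample timestamps are hypothetical.

```javascript
// Simplified model: round each timestamp DOWN to a fixed grid (BigInt nanoseconds).
const GRANULARITY_NS = 100_000n; // 100 microseconds

function quantizeNs(timestampNs, granularityNs = GRANULARITY_NS) {
  return timestampNs - (timestampNs % granularityNs);
}

// What a script would compute from two quantized timestamps.
function observedElapsedNs(startNs, endNs) {
  return Number(quantizeNs(endNs) - quantizeNs(startNs));
}

// Two workloads that truly take 40 us and 70 us become indistinguishable...
const fast = observedElapsedNs(1_000_000n, 1_040_000n);
const slow = observedElapsedNs(1_000_000n, 1_070_000n);
console.log(fast, slow); // 0 0

// ...while a 20 us workload that straddles a grid boundary reads as 100 us.
const straddling = observedElapsedNs(1_090_000n, 1_110_000n);
console.log(straddling); // 100000
```

Quantization does not remove the signal entirely: an attacker can still average many samples to estimate sub-granularity differences, so it raises the cost of the side channel rather than closing it.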
Compute shader data processing for high-throughput exfiltration
For highly parallel workloads, GPU compute shaders can process data at far higher throughput than CPU-based JavaScript. For exfiltration scenarios where large amounts of data need to be processed before transmission (compression, encryption, encoding), a compute shader can prepare megabytes of data in milliseconds, reducing the time window during which exfiltration processing is visible in CPU profiling.
In practice, an attacker who wants to exfiltrate a large IndexedDB database, encrypted so that content inspection cannot recognize it, can use a WebGPU compute shader to encrypt the data on the GPU in 50–200ms and then transmit the ciphertext via a standard fetch or WebTransport connection. The CPU-side JavaScript is small: it only reads the data, uploads it to a GPU buffer, and sends the result, while the compute shader does the heavy lifting with little CPU-side computation to show up in a profile.
| Attack | API surface | What it enables |
|---|---|---|
| Device fingerprinting | GPUAdapterInfo: vendor, architecture, device, description | GPU vendor and architecture, plus model and driver strings where the browser exposes them; stable cross-session identifier |
| GPU timing side channel | timestamp-query feature + GPU command encoder | Contention measurement: infer computation in other GPU contexts |
| High-throughput data processing | GPU compute shaders | Compress/encrypt large exfiltration payloads in milliseconds |
| Feature detection fingerprinting | adapter.features and adapter.limits | GPU generation, memory tier, shader capabilities: narrow device toward a model variant |
Permissions-Policy and defenses
As of mid-2026, WebGPU does not have a standardized Permissions-Policy feature name in the Permissions Policy specification. The primary architectural defense is cross-origin iframe isolation for MCP tool output rendering: if injected JavaScript runs in a cross-origin sandboxed iframe at a distinct registrable domain, it has access to that iframe's GPU context (if any) but cannot read the application's storage or access the parent frame's data.
For MCP client deployments that use WebGPU for legitimate in-browser model inference, the risk cannot be eliminated by policy controls alone. The most effective defense is rigorous tool output sanitization (DOMPurify with strict configuration) combined with cross-origin iframe rendering that limits what any injected code can access even if it successfully calls WebGPU APIs.
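As one possible shape for the iframe isolation described above, the sketch below builds the markup for a frame that loads tool output from a separate origin with a restrictive sandbox attribute. The renderer origin https://tool-output.example, the /render path, the id query parameter, and both function names are hypothetical; a real deployment would still need to sanitize the output and decide how content reaches the frame.

```javascript
// Escape a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build iframe markup that renders one tool output on a separate origin.
// rendererOrigin is a hypothetical host on a different registrable domain.
function buildToolOutputFrame(outputId, rendererOrigin = 'https://tool-output.example') {
  const url = new URL('/render', rendererOrigin);
  url.searchParams.set('id', outputId); // percent-encodes the id

  // sandbox="allow-scripts" WITHOUT allow-same-origin gives the frame an
  // opaque origin: scripts may run, but they cannot reach the parent's
  // storage or DOM.
  return (
    '<iframe sandbox="allow-scripts" referrerpolicy="no-referrer" ' +
    `src="${escapeAttr(url.href)}"></iframe>`
  );
}

console.log(buildToolOutputFrame('42'));
```

If the rendered output never needs to run scripts, an empty sandbox attribute (sandbox="") is the stricter choice, since it blocks script execution in the frame altogether.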