Security Guide

MCP server Web Neural Network API security — NPU/GPU backend fingerprinting, ML inference timing side channels, hardware identification

The Web Neural Network API (navigator.ml) gives web pages access to the device's neural processing hardware: an NPU, a GPU, or a CPU fallback. navigator.ml.createContext() returns a promise for an MLContext on the requested backend, and NPU availability is a high-signal hardware identifier: only Apple Silicon (M1 and later), Qualcomm Snapdragon X Elite and X Plus, and Intel Core Ultra devices currently expose an NPU backend to WebNN. ML inference timing side channels can reveal LLM inference activity in adjacent browser contexts that share the same hardware. No Permissions-Policy directive controls WebNN access.

WebNN hardware backend enumeration: NPU as device fingerprint

The WebNN API lets JavaScript request a specific hardware backend through the deviceType option: 'npu', 'gpu', or 'cpu'. createContext() resolves with an MLContext if the requested backend is available and rejects with NotSupportedError if it is not. Probing all three backends reveals whether the device has an NPU, whether a GPU backend is available, and whether the browser is limited to its CPU fallback.

This enumeration is valuable as a fingerprinting primitive because NPU availability is highly hardware-specific. In 2026, the devices that provide a browser-accessible NPU backend are: Apple silicon (M1 through M4 in MacBook, Mac mini, and iPad Pro, plus iPhone via WKWebView), Qualcomm Snapdragon X Elite and X Plus Copilot+ PCs, and Intel Core Ultra (Meteor Lake and later) CPUs. All other devices return NotSupportedError for deviceType: 'npu'. The result is a coarse split into three NPU platform families plus everything else, which narrows device identity to a specific platform family.

// WebNN hardware backend fingerprint — no permission required
async function webnnFingerprint() {
  const results = {};
  const backends = ['npu', 'gpu', 'cpu'];

  for (const deviceType of backends) {
    try {
      const context = await navigator.ml.createContext({ deviceType });
      results[deviceType] = {
        available: true,
        // Normally just 'MLContext'; recorded in case an implementation exposes more detail
        type: context.constructor.name
      };
    } catch (e) {
      results[deviceType] = {
        available: false,
        error: e.name  // 'NotSupportedError' | 'SecurityError'
      };
    }
  }

  // Interpretation (heuristic, based on the platform list above):
  // npu: true → Apple Silicon, Snapdragon X Elite/Plus, or Intel Core Ultra
  // npu: false + gpu: true → device with a GPU backend but no WebNN-visible NPU
  // npu: false + gpu: false + cpu: true → server, VM, or old device
  // npu: true + navigator.platform === 'MacIntel' → Apple Silicon Mac
  //   (navigator.platform reports 'MacIntel' on Apple Silicon and Intel Macs alike)

  navigator.sendBeacon('/track', JSON.stringify({
    webnn: results,
    platform: navigator.platform,
    timestamp: Date.now()
  }));
}
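
The interpretation comments in the probe above can be written as a small lookup, which also makes clear how coarse the resulting buckets are. This is a minimal sketch: `classifyBackendProfile` and its bucket labels are hypothetical names introduced here, and the mapping only restates this guide's heuristic, not any browser-defined classification.

```javascript
// Hypothetical helper: maps a probe result object (same shape as the one
// built by webnnFingerprint above) to the coarse buckets this guide describes.
function classifyBackendProfile(results) {
  const has = (deviceType) =>
    Boolean(results[deviceType] && results[deviceType].available);

  // Per this guide: Apple Silicon, Snapdragon X, or Intel Core Ultra
  if (has('npu')) return 'npu-platform';
  if (has('gpu')) return 'gpu-only';
  if (has('cpu')) return 'cpu-only';
  return 'webnn-unavailable';
}
```

For example, a result of `{ npu: { available: false }, gpu: { available: true }, cpu: { available: true } }` falls into the `'gpu-only'` bucket. The signal on its own separates only a handful of device classes; its value to a tracker comes from combining it with other attributes.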

NPU availability is one of the most stable device-type signals available without a permission prompt. Fewer than 15% of browser-capable devices in 2026 have an NPU accessible to WebNN. Within that 15%, the NPU generation (M1 vs M4 vs Snapdragon X vs Intel Core Ultra) can be discriminated further by measuring ML inference throughput. Because the fingerprint reflects physical hardware, it is stable, survives clearing site data, and persists across private browsing sessions; a user cannot spoof it unless the browser itself masks backend availability.

ML inference timing side channels

When injected JavaScript runs an ML graph on the WebNN GPU or NPU backend, its execution time depends on whether another browser context is using the same accelerator at the same time. An MCP client that runs on-device LLM inference (a common pattern for privacy-preserving local AI assistants in 2026) keeps the NPU or GPU busy while it generates tokens. WebNN inference injected into a tool output context on the same hardware therefore experiences contention, and the resulting timing variation reveals whether LLM inference is actively running.

This is a novel attack vector specific to AI-native MCP deployments: the very hardware infrastructure that makes local LLM inference possible (dedicated NPU) creates a timing oracle that tool output injections can exploit to infer the application's internal computational state.
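
To reproduce this signal when testing a deployment, the comparison reduces to sampling the latency of a fixed workload and checking it against an idle baseline. The sketch below is an illustration under stated assumptions: `runOnce` is a hypothetical async callback that executes one fixed inference workload, no WebNN calls are made in the sketch itself, and the 1.5 ratio threshold is an arbitrary placeholder rather than a measured value.

```javascript
// Sketch only: `runOnce` is a hypothetical async callback that runs one fixed
// inference workload and resolves when it finishes.
async function sampleLatencies(runOnce, runs = 20) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await runOnce();
    samples.push(performance.now() - start);
  }
  return samples;
}

// The median is less sensitive than the mean to scheduler or GC outliers.
function median(values) {
  if (values.length === 0) throw new RangeError('median of empty sample set');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Flags contention when the observed median exceeds the idle baseline median
// by more than `ratioThreshold` (placeholder default, not a calibrated value).
function looksContended(baselineMs, observedMs, ratioThreshold = 1.5) {
  const ratio = median(observedMs) / median(baselineMs);
  return { ratio, contended: ratio > ratioThreshold };
}
```

Browsers reduce the resolution of performance.now() in contexts that are not cross-origin isolated, so a real measurement would likely need a workload long enough, and enough runs, for the difference to rise above timer granularity.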

Attack | WebNN surface | What it reveals
Hardware backend fingerprint | createContext({deviceType}) probe | NPU/GPU/CPU availability and device platform family (Apple, Qualcomm, Intel, other)
NPU generation timing | ML graph execution throughput | Discriminates M1 from M4, Snapdragon X from Intel Core Ultra by TOPS measurement
LLM inference detection | NPU contention timing side channel | Detects whether the MCP client is actively running LLM inference on the same hardware
Cross-context ML workload inference | GPU backend contention measurement | Detects other tabs running image generation, embedding computation, or video ML

Permissions-Policy gap and defenses

As of mid-2026, the WebNN specification does not define a Permissions-Policy feature name. The API is available to any same-origin JavaScript that calls navigator.ml.createContext(). The architectural defense for MCP deployments is cross-origin sandboxed iframe rendering: tool output JavaScript running in a cross-origin context cannot access the application's data to exfiltrate, even if WebNN hardware enumeration remains possible within the sandboxed context. For MCP clients that run on-device LLM inference via WebNN, additional monitoring of ML context creation in the same origin is advisable.
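
The sandboxed rendering described above might look like the following minimal sketch. It assumes tool output arrives as an HTML string; `createSandboxedFrame`, `doc`, and `toolOutputHtml` are hypothetical names used only for illustration, and a real client would likely add sizing and message-passing logic around it.

```javascript
// Sketch only: `doc` is the host page's Document and `toolOutputHtml` is the
// untrusted markup returned by a tool. Both parameter names are hypothetical.
function createSandboxedFrame(doc, toolOutputHtml) {
  const frame = doc.createElement('iframe');

  // 'allow-scripts' without 'allow-same-origin' gives the frame an opaque
  // origin, so its scripts cannot read the host's cookies, storage, or DOM.
  frame.setAttribute('sandbox', 'allow-scripts');
  frame.setAttribute('referrerpolicy', 'no-referrer');

  // srcdoc keeps the untrusted markup inline instead of loading it from a URL.
  frame.srcdoc = toolOutputHtml;
  return frame;
}
```

In a browser this would be used as `document.body.append(createSandboxedFrame(document, html))`. The key design choice is leaving out `allow-same-origin`: combining it with `allow-scripts` for inline content would let the framed script run in the host origin and remove its own sandbox.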

SkillAudit findings for WebNN API exposure

High: Tool output rendered same-origin; injected JS can probe NPU/GPU/CPU backend availability to identify the device platform family. NPU availability narrows the device to Apple Silicon, Snapdragon X, or Intel Core Ultra hardware, a stable fingerprint that users cannot easily spoof. Grade impact: −16.
High: MCP client uses WebNN for local LLM inference on the same hardware as the tool output context; inference timing reveals token generation state. NPU contention from active LLM inference is detectable through timing measurements on the same hardware, exposing the application's internal computational state to injected code. Grade impact: −14.
Medium: No cross-origin iframe isolation; WebNN backend enumeration runs in the application origin. Same-origin injections can combine the hardware fingerprint with session-specific data (cookies, localStorage) for highly precise cross-session tracking. Grade impact: −12.
Medium: connect-src CSP absent; WebNN timing measurement results can be exfiltrated to arbitrary external endpoints. Without connect-src 'self', hardware fingerprint data and timing measurements can be sent to attacker-controlled endpoints via sendBeacon or fetch. Grade impact: −10.
Low: No Permissions-Policy directive available for WebNN; the API cannot be blocked via HTTP headers regardless of MCP client deployment configuration. Unlike camera or microphone access, WebNN hardware enumeration has no policy gate, so only architectural isolation controls access. Grade impact: −6.
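
For the connect-src finding, a restrictive policy can be attached to every response that serves the MCP client UI. The following is a minimal sketch, not a complete policy: `buildContentSecurityPolicy` and `applyExfiltrationHeaders` are hypothetical helper names, and `res` is assumed to be any response object with a `setHeader(name, value)` method, such as Node's `http.ServerResponse`.

```javascript
// Sketch only: both function names are hypothetical helpers for illustration.
function buildContentSecurityPolicy() {
  return [
    "default-src 'self'",
    // connect-src governs fetch, XMLHttpRequest, WebSocket, EventSource,
    // and navigator.sendBeacon destinations.
    "connect-src 'self'",
  ].join('; ');
}

// `res` is assumed to expose setHeader(name, value).
function applyExfiltrationHeaders(res) {
  res.setHeader('Content-Security-Policy', buildContentSecurityPolicy());
}
```

This narrows the network destinations available to injected script but does not remove every exfiltration path (top-level navigation, for instance, is not covered by connect-src), so it complements origin isolation and does not replace it.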

Audit your MCP server for WebNN hardware fingerprinting risks

SkillAudit checks for tool output isolation, ML API exposure, and hardware fingerprinting attack surfaces — paste a GitHub URL and get a graded security report in 60 seconds.
