Security Guide

MCP server Eye Tracking Gaze API security — gaze-based password extraction, attention pattern profiling, sensitive content surveillance, ambient light covert channel

The Eye Tracking / Gaze API is an experimental browser capability (currently under development via the W3C Eye Gaze Specification and shipping in early form on some AR/VR platforms). It exposes continuous gaze coordinates, meaning where on the screen or within a 3D scene the user is looking, to web contexts that have been granted the eye-tracking permission. For MCP clients running on devices with eye-tracking hardware (modern headsets, some high-end laptops, accessibility devices), an MCP tool that requests or inherits this permission can construct a complete record of everything the user looked at, correlate gaze sequences with keyboard input to infer passwords character by character, identify which parts of sensitive documents were read, and estimate real-time cognitive states such as confusion, surprise, and distraction. All of this can happen with no visible camera indicator and without the user knowing that gaze data is being captured.

What the Eye Tracking Gaze API exposes

The W3C Eye Gaze Specification (in development) and platform-specific extensions (WebXR gaze input, Tobii Stream Engine via browser bridge, Windows Eye Control SDK exposed to PWAs on supported hardware) provide access to gaze origin and direction vectors, fixation events (where the eye pauses for comprehension), saccade events (rapid eye movements between fixation points), blink rate and pupil diameter (available on some hardware), and screen-space gaze coordinates at sampling rates ranging from 30 Hz to 1200 Hz depending on hardware.

The security model varies by platform. WebXR gaze input inherits the XR session permission granted when the user enters an immersive session. Dedicated eye-tracking APIs require explicit user permission. Some browser bridges to platform-level eye-tracking SDKs operate outside the browser permission model entirely and can be accessed by locally installed MCP servers without any browser-level gate. MCP tools that reach system-level APIs through Node.js native modules may face no permission requirement at all.

MCP servers running as local Node.js processes can access platform eye-tracking SDKs directly via native modules — bypassing the browser permission model entirely. An MCP tool that calls into the Tobii Stream Engine, the Windows Eye Control APIs, or the macOS Accessibility eye-tracking layer can capture continuous gaze data without requesting any browser permission, with no indicator visible to the user.

Gaze-based password extraction from on-screen keyboards

On devices where the primary input method is gaze-based (headsets, accessibility devices, some AR systems), users interact with on-screen keyboards by looking at keys. The timing and sequence of fixation events on keyboard elements directly encodes what is being typed. An MCP tool capturing gaze data during password entry can reconstruct the typed sequence with high accuracy by correlating fixation coordinates with key positions, even when the application masks the password field with the standard type="password" attribute.

// ATTACK: Gaze coordinate to keyboard key mapping for password extraction

// Assumes a standard QWERTY on-screen keyboard with known layout
const KEYBOARD_LAYOUT = {
  // Row 1 — y: 200-260px
  'q': { x: [60, 120], y: [200, 260] },
  'w': { x: [130, 190], y: [200, 260] },
  'e': { x: [200, 260], y: [200, 260] },
  // ... all keys mapped
};

function gazeToKey(gazeX, gazeY) {
  for (const [key, bounds] of Object.entries(KEYBOARD_LAYOUT)) {
    if (gazeX >= bounds.x[0] && gazeX <= bounds.x[1] &&
        gazeY >= bounds.y[0] && gazeY <= bounds.y[1]) {
      return key;
    }
  }
  return null;
}

// Gaze stream listener (via hypothetical browser API or native module bridge)
let extractedChars = [];
let lastFixationEnd = 0;

eyeTracker.addEventListener('fixation', event => {
  const { gazeX, gazeY, duration, timestamp } = event;

  // Fixation threshold: >150ms is typical for deliberate character selection
  if (duration > 150) {
    const key = gazeToKey(gazeX, gazeY);
    if (key && (timestamp - lastFixationEnd < 3000)) {
      // Consecutive fixations within 3 seconds — likely same typing sequence
      extractedChars.push({ key, timestamp, duration });
      lastFixationEnd = timestamp + duration;
    }
  }
});

// After password entry: exfiltrate extracted character sequence
// Reconstructed password = extractedChars.map(c => c.key).join('')

The defense requires that applications using gaze input for password entry implement gaze-input protection: randomizing the key layout on each entry session (so knowledge of coordinates does not decode characters), using a non-spatial entry method for sensitive fields (e.g., a PIN pad with randomized digit positions), or explicitly pausing gaze data streaming during password field focus via a permission API that apps can invoke to suspend third-party gaze access.
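The layout-randomization defense can be sketched as follows. This is a minimal illustration, not a hardened implementation: the function names `secureRandomInt` and `randomizedLayout`, and the slot shape `{ x, y }`, are assumptions introduced here. It shuffles key labels across a fixed set of slots with a Fisher-Yates shuffle, drawing randomness from the Web Crypto API, so that a fixation coordinate no longer identifies a character across entry sessions.

```javascript
// Hypothetical helper: unbiased integer in [0, n) drawn from the Web Crypto API.
// Rejection sampling avoids the modulo bias of a plain `value % n`.
function secureRandomInt(n) {
  const limit = Math.floor(0x100000000 / n) * n; // largest multiple of n that fits in 2^32
  const buf = new Uint32Array(1);
  do {
    globalThis.crypto.getRandomValues(buf);
  } while (buf[0] >= limit);
  return buf[0] % n;
}

// Assigns each label to a slot in shuffled order (Fisher-Yates).
// `slots` is an assumed array of key positions such as { x, y }.
// `randomInt` is injectable so the shuffle can be tested deterministically.
function randomizedLayout(labels, slots, randomInt = secureRandomInt) {
  if (labels.length !== slots.length) {
    throw new Error('labels and slots must have the same length');
  }
  const shuffled = labels.slice(); // do not mutate the caller's array
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = randomInt(i + 1);
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return slots.map((slot, i) => ({ ...slot, label: shuffled[i] }));
}
```

The layout should be regenerated for every entry session. A layout that is randomized once and then reused can be learned by an observer who sees a single known input.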

Attention pattern profiling and sensitive content detection

The fixation sequence across a document reveals exactly what content the user read and in what order — a far more informative signal than scroll position or viewport visibility. An MCP tool capturing gaze data can identify which paragraphs in a legal document, medical report, salary disclosure, or security briefing the user actually read versus skimmed, which figures they scrutinized versus glanced at, and whether they re-read specific sections (indicating confusion or concern).

This creates a real-time reading surveillance channel. Combined with the document's text layout (which the MCP tool can access via DOM coordinate lookup), fixation coordinates map directly to the text the user is reading. Where the documents contain health, political, religious, or biometric information, the resulting gaze record may itself qualify as special category data under GDPR Article 9, because it reveals the user's medical concerns, the political positions they read about, or the religious content they consumed.

// Gaze-to-text correlation — maps fixation points to DOM text nodes

function getTextAtGazePoint(gazeX, gazeY) {
  // document.caretPositionFromPoint() returns the caret position at the given viewport (client) coordinates
  const caretPos = document.caretPositionFromPoint(gazeX, gazeY);
  if (!caretPos) return null;

  const node = caretPos.offsetNode;
  if (node.nodeType !== Node.TEXT_NODE) return null;

  // Extract a window of surrounding text for context
  const parent = node.parentElement;
  return {
    element: parent.tagName,
    text: node.textContent.substring(
      Math.max(0, caretPos.offset - 50),
      Math.min(node.textContent.length, caretPos.offset + 50)
    ),
    elementClass: parent.className,
    timestamp: Date.now()
  };
}

// Continuous gaze stream → reading history reconstruction:
const readingLog = [];
eyeTracker.addEventListener('fixation', event => {
  const textContext = getTextAtGazePoint(event.gazeX, event.gazeY);
  if (textContext) {
    readingLog.push(textContext);
    // After the session: readingLog is a complete record of every text fragment
    // the user fixated on, in chronological order — their complete reading history
  }
});
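One possible mitigation for this correlation is to redact gaze samples before any tool sees them. The sketch below assumes a trusted broker sits between the eye tracker and MCP tools; the sample shape `{ x, y, timestamp }`, the rectangle shape `{ left, top, right, bottom }` (as returned by `getBoundingClientRect()`), and the function names are illustrative assumptions, not part of any existing gaze API.

```javascript
// True when the sample lies inside the rectangle, widened by `margin` pixels
// on every side to absorb tracker inaccuracy near the region's edges.
function isInsideRect(sample, rect, margin) {
  return sample.x >= rect.left - margin && sample.x <= rect.right + margin &&
         sample.y >= rect.top - margin && sample.y <= rect.bottom + margin;
}

// Returns only the samples that fall outside every sensitive region.
// `sensitiveRects` would come from elements the application marks as sensitive.
function redactSensitiveGaze(samples, sensitiveRects, margin = 40) {
  return samples.filter(
    sample => !sensitiveRects.some(rect => isInsideRect(sample, rect, margin))
  );
}
```

Dropping samples leaves gaps in the stream, and the timing of those gaps still shows when the user looked at a sensitive region. Suspending the stream entirely while such a region is on screen is the stricter option.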

Pupil diameter and blink rate as emotional state oracles

High-fidelity eye-tracking hardware measures pupil diameter and blink rate alongside gaze coordinates. Research on eye tracking associates these physiological signals with emotional and cognitive states: pupil dilation with surprise, cognitive load, and heightened arousal; pupil constriction with reduced engagement; elevated blink rate with fatigue or discomfort; suppressed blink rate with intense concentration. An MCP tool with access to the full eye-tracking data stream can therefore attempt to infer the user's emotional response to content in real time.

In the context of an AI assistant interacting with the user, this creates a covert feedback channel: the MCP tool can observe whether the user's physiological response to model output indicates surprise (did the answer shock them?), confusion (are they re-reading?), or acceptance (did they proceed without re-reading?). This enables a class of manipulation attacks where the model's behavior is conditioned on the user's inferred emotional state without the user's awareness that their physiological signals are being used.
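A client-side guard against this feedback channel could strip gaze-derived fields from a tool result before it is placed in the model's context. The sketch below is an assumption-laden illustration: the field names in `GAZE_FIELDS` are hypothetical and would need to match whatever a given tool actually emits, and key-name filtering cannot catch gaze data hidden under an innocuous name.

```javascript
// Hypothetical denylist of gaze-derived field names.
const GAZE_FIELDS = new Set([
  'gaze', 'gazeX', 'gazeY', 'fixations', 'saccades',
  'pupilDiameter', 'blinkRate', 'inferredEmotion',
]);

// Recursively copies a JSON-like value, omitting any denylisted key.
function stripGazeFields(value) {
  if (Array.isArray(value)) {
    return value.map(stripGazeFields);
  }
  if (value !== null && typeof value === 'object') {
    const clean = {};
    for (const [key, inner] of Object.entries(value)) {
      if (!GAZE_FIELDS.has(key)) {
        clean[key] = stripGazeFields(inner);
      }
    }
    return clean;
  }
  return value; // primitives pass through unchanged
}
```

Because a denylist is easy to evade, an allowlist of expected result fields per tool is the more robust design where the result schema is known.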

Pupil diameter measurements may qualify as biometric data under GDPR Article 9 and as sensitive personal information under the CCPA, both of which carry disclosure and consent obligations. Any MCP tool that accesses eye-tracking data on hardware capable of pupil measurement should treat the full data stream as biometric data, regardless of whether pupil values are explicitly extracted.
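Where a tool has a legitimate need for gaze position only, the physiological fields can be discarded at the point of capture. The sketch below assumes a raw sample shaped like `{ x, y, timestamp, pupilDiameter, ... }`; the function name `minimizeSample` and the `gridPx` parameter are introduced here for illustration. It keeps coordinates and timestamp only, and snaps coordinates to a coarse grid so the retained data is no finer than the use case requires.

```javascript
// Keeps only position and time from a raw eye-tracker sample.
// Pupil diameter, blink state, and any other fields are never copied.
// Coordinates are rounded to the nearest multiple of `gridPx` pixels.
function minimizeSample(raw, gridPx = 20) {
  return {
    x: Math.round(raw.x / gridPx) * gridPx,
    y: Math.round(raw.y / gridPx) * gridPx,
    timestamp: raw.timestamp,
  };
}
```

Minimization reduces what is stored or forwarded. It does not change the classification of the underlying stream at the point where the hardware delivers it.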

SkillAudit detection patterns for eye-tracking API access

SkillAudit scans MCP server code for patterns that indicate eye-tracking data access:

// Patterns SkillAudit flags in MCP server code:

// 1. Native module loading patterns that may bridge to platform eye-tracking SDKs
const tobii = require('@tobii/stream-engine');        // FLAGGED: direct SDK access
const eyeControl = require('windows-eye-control');    // FLAGGED: OS-level access
const gazePoint = require('gazepoint-native');        // FLAGGED: hardware bridge

// 2. WebXR gaze input session features
navigator.xr.requestSession('immersive-vr', {
  requiredFeatures: ['eye-tracking']                  // FLAGGED: explicit gaze request
});

// 3. Experimental browser gaze APIs
navigator.gaze?.requestPermission();                  // FLAGGED: experimental browser API
window.GazeEvent;                                     // FLAGGED: gaze event class access

// 4. DOM coordinate access during input events (weaker signal, combined with gaze APIs)
document.caretPositionFromPoint(x, y);               // FLAGGED in combination with eye-tracking
document.elementFromPoint(x, y);                     // FLAGGED in combination with gaze events

// CORRECT: if your MCP server has a legitimate gaze use case (accessibility tooling,
// gaze-based navigation), document the specific API access, data minimization approach,
// and retention policy explicitly in your MCP server manifest and tool descriptions
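
A line-based check for the patterns above could look like the following sketch. It is not SkillAudit's implementation: the rule ids, the `weak` flag, and `scanSource` are names assumed here, and the module names are copied from the examples in this guide. Plain regular expressions are easy to evade (for example with a dynamically built `require` argument), so a production scanner would more likely work on a parsed syntax tree.

```javascript
// Each rule is tested against one source line at a time.
// `weak` rules are reported only when a non-weak rule also matched.
const RULES = [
  { id: 'native-sdk',
    pattern: /require\(\s*['"](@tobii\/stream-engine|windows-eye-control|gazepoint-native)['"]\s*\)/ },
  { id: 'webxr-eye-tracking',
    pattern: /(requiredFeatures|optionalFeatures)\s*:\s*\[[^\]]*['"]eye-tracking['"]/ },
  { id: 'experimental-gaze-api',
    pattern: /navigator\.gaze\b|\bGazeEvent\b/ },
  { id: 'dom-point-lookup', weak: true,
    pattern: /\b(caretPositionFromPoint|elementFromPoint)\s*\(/ },
];

// Returns a list of { rule, line, weak } hits for the given source text.
function scanSource(source) {
  const hits = [];
  source.split('\n').forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        hits.push({ rule: rule.id, line: index + 1, weak: Boolean(rule.weak) });
      }
    }
  });
  // DOM coordinate lookups alone are common and benign, so drop them
  // unless the same file also touches a gaze API.
  const hasStrongHit = hits.some(hit => !hit.weak);
  return hits.filter(hit => !hit.weak || hasStrongHit);
}
```
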

| Risk | Data accessed | Defense |
| --- | --- | --- |
| Password extraction via gaze sequence | Fixation coordinates during on-screen keyboard input | Randomized key layout; non-spatial PIN entry; pause gaze API during password fields |
| Sensitive content reading surveillance | Fixation coordinates mapped to DOM text content | Suspend gaze API access during document viewing; explicit per-document consent |
| Emotional state inference | Pupil diameter and blink rate time series | Treat full gaze stream as biometric data; minimize to gaze coordinates only if possible |
| OS-level SDK access bypassing browser permission | Native module bridging to Tobii/Windows Eye Control | Audit native module dependencies; deny eye-tracking SDK imports in MCP tool sandboxes |
| Covert model conditioning on physiological state | Real-time emotional/arousal signal from pupil and blink data | Prohibit gaze data from flowing into model context; explicit disclosure if used for adaptation |

SkillAudit findings for Eye Tracking Gaze API access

Critical: MCP tool imports native eye-tracking SDK module. The server-side MCP tool code imports a native module binding to a platform eye-tracking SDK (Tobii, Windows Eye Control, GazePoint, or equivalent). This bypasses the browser permission model entirely. The tool can capture continuous gaze data without any user-visible permission prompt or indicator. Grade impact: −30.
Critical: Gaze coordinates accessed during password field focus. The MCP tool or client code captures gaze events while a type="password" input or equivalent is focused. Fixation sequences over on-screen keyboard elements encode the password being typed. This is a direct credential extraction vector. Grade impact: −28.
Critical: Fixation events correlated with DOM text content via caretPositionFromPoint. The MCP tool maps gaze fixation coordinates to DOM text content, constructing a reading history of all text the user looked at. When applied to documents containing health, financial, or personal data, the resulting record may fall under GDPR Article 9. Grade impact: −26.
High: Pupil diameter or blink rate data collected without biometric data disclosure. The full eye-tracking data stream, including pupil diameter, is captured and processed. Pupil data may qualify as biometric data under GDPR Article 9 and as sensitive personal information under the CCPA. No explicit consent or disclosure is present in the tool's manifest or privacy documentation. Grade impact: −20.
High: Gaze data included in MCP tool response or model context. Captured gaze coordinates, fixation sequences, or derived inferences are included in the MCP tool's response to the model, allowing the LLM to condition its behavior on the user's gaze patterns without the user's awareness. Grade impact: −18.
Medium: WebXR session requests eye-tracking feature without functional necessity. The MCP client requests the eye-tracking feature in a WebXR session even though no MCP tool functionality requires gaze data. The permission is acquired opportunistically. Grade impact: −12.

Audit your MCP server for these issues

SkillAudit checks for eye-tracking API access patterns automatically — paste a GitHub URL and get a graded report in 60 seconds.
