Security Guide
MCP server Eye Tracking Gaze API security — gaze-based password extraction, attention pattern profiling, sensitive content surveillance, ambient light covert channel
The Eye Tracking / Gaze API is an experimental browser capability (currently under development via the W3C Eye Gaze Specification and shipping in early form on some AR/VR platforms). It exposes continuous gaze coordinates, meaning where on the screen or within a 3D scene the user is looking, to web contexts that have been granted the eye-tracking permission. For MCP clients running on devices with eye-tracking hardware (modern headsets, some high-end laptops, accessibility devices), an MCP tool that requests or inherits this permission can record everything the user looked at, correlate gaze sequences with on-screen keyboard input to infer passwords character by character, identify which parts of sensitive documents were read, and infer real-time cognitive states such as confusion, surprise, and distraction. None of this requires a visible camera indicator, so the user may have no awareness that gaze data is being captured.
What the Eye Tracking Gaze API exposes
The W3C Eye Gaze Specification (in development) and platform-specific extensions (WebXR gaze input, Tobii Stream Engine via browser bridge, Windows Eye Control SDK exposed to PWAs on supported hardware) provide access to gaze origin and direction vectors, fixation events (where the eye pauses for comprehension), saccade events (rapid eye movements between fixation points), blink rate and pupil diameter (available on some hardware), and screen-space gaze coordinates at sampling rates ranging from 30 Hz to 1200 Hz depending on hardware.
The security model varies by platform. WebXR gaze input inherits the XR session permission granted when the user enters an immersive session. Dedicated eye-tracking APIs require explicit user permission. Some browser bridges to platform-level eye-tracking SDKs operate outside the browser permission model entirely.
The last case matters most for MCP: servers running as local Node.js processes can call platform eye-tracking SDKs directly through native modules, so no browser-level gate applies. An MCP tool that calls into the Tobii Stream Engine, the Windows Eye Control APIs, or the macOS Accessibility eye-tracking layer can capture continuous gaze data without requesting any browser permission and with no indicator visible to the user.
Gaze-based password extraction from on-screen keyboards
On devices where the primary input method is gaze-based (headsets, accessibility devices, some AR systems), users interact with on-screen keyboards by looking at keys. The timing and sequence of fixation events on keyboard elements directly encodes what is being typed. An MCP tool capturing gaze data during a password entry interaction can reconstruct the typed sequence with high accuracy by correlating fixation coordinates with keyboard key positions, even when the application renders the password field with the standard masking type="password".
// ATTACK: Gaze coordinate to keyboard key mapping for password extraction
// Assumes a standard QWERTY on-screen keyboard with known layout
const KEYBOARD_LAYOUT = {
  // Row 1 — y: 200-260px
  'q': { x: [60, 120], y: [200, 260] },
  'w': { x: [130, 190], y: [200, 260] },
  'e': { x: [200, 260], y: [200, 260] },
  // ... all keys mapped
};
function gazeToKey(gazeX, gazeY) {
  for (const [key, bounds] of Object.entries(KEYBOARD_LAYOUT)) {
    if (gazeX >= bounds.x[0] && gazeX <= bounds.x[1] &&
        gazeY >= bounds.y[0] && gazeY <= bounds.y[1]) {
      return key;
    }
  }
  return null;
}
// Gaze stream listener (via hypothetical browser API or native module bridge)
const extractedChars = [];
let lastFixationEnd = null; // null until the first key fixation is recorded
eyeTracker.addEventListener('fixation', event => {
  const { gazeX, gazeY, duration, timestamp } = event;
  // Fixation threshold: >150ms is typical for deliberate character selection
  if (duration <= 150) return;
  const key = gazeToKey(gazeX, gazeY);
  if (!key) return;
  // A gap of 3 seconds or more since the previous key fixation ended is treated
  // as the start of a new typing sequence, so the earlier sequence is discarded
  if (lastFixationEnd !== null && timestamp - lastFixationEnd >= 3000) {
    extractedChars.length = 0;
  }
  extractedChars.push({ key, timestamp, duration });
  lastFixationEnd = timestamp + duration;
});
// After password entry: exfiltrate extracted character sequence
// Reconstructed password = extractedChars.map(c => c.key).join('')
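The defense requires that applications using gaze input for password entry implement gaze-input protection. Options include randomizing the key layout on each entry session (so that knowledge of a fixed layout's coordinates does not decode characters), using an entry method whose positions carry no stable meaning for sensitive fields (e.g., a PIN pad with randomized digit positions), or pausing gaze data streaming while a password field has focus, via a permission API that apps can invoke to suspend third-party gaze access. Layout randomization only helps when the observing tool cannot also read the rendered layout; a tool with DOM access can map fixations back to keys, so suspending the stream is the stronger control.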
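The per-session layout randomization mentioned above can be sketched roughly as follows. This is a minimal illustration, not a vetted implementation: `randomizeLayout` and `secureRandomInt` are hypothetical helper names, and the layout object is assumed to have the same shape as `KEYBOARD_LAYOUT` in the attack example (key name mapped to a bounding box).

```javascript
// Use the Web Crypto API where it is a global; fall back to Node's built-in module.
const webCrypto = globalThis.crypto ?? require('node:crypto').webcrypto;

// Hypothetical helper: unbiased random integer in [0, maxExclusive).
// Rejection sampling avoids the modulo bias of a plain `value % maxExclusive`.
function secureRandomInt(maxExclusive) {
  const limit = Math.floor(0x100000000 / maxExclusive) * maxExclusive;
  const buf = new Uint32Array(1);
  do {
    webCrypto.getRandomValues(buf);
  } while (buf[0] >= limit);
  return buf[0] % maxExclusive;
}

// Hypothetical helper: returns a new layout in which the same key names are
// assigned to a shuffled permutation of the original slots (bounding boxes).
// The input layout is not modified.
function randomizeLayout(layout, randomInt = secureRandomInt) {
  const keys = Object.keys(layout);
  const slots = keys.map(key => layout[key]);
  // Fisher-Yates shuffle of the slot list
  for (let i = slots.length - 1; i > 0; i--) {
    const j = randomInt(i + 1);
    [slots[i], slots[j]] = [slots[j], slots[i]];
  }
  return Object.fromEntries(keys.map((key, i) => [key, slots[i]]));
}
```

The application would call `randomizeLayout` once per password entry session and render keys from the result. The random source is injectable mainly so the shuffle can be tested deterministically; production use should keep the default.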
Attention pattern profiling and sensitive content detection
The fixation sequence across a document reveals exactly what content the user read and in what order — a far more informative signal than scroll position or viewport visibility. An MCP tool capturing gaze data can identify which paragraphs in a legal document, medical report, salary disclosure, or security briefing the user actually read versus skimmed, which figures they scrutinized versus glanced at, and whether they re-read specific sections (indicating confusion or concern).
This creates a real-time reading surveillance channel. Combined with the document's text layout (which the MCP tool can access via DOM coordinate lookup), fixation coordinates map directly to the text the user is reading. When the documents concern health, political, religious, or biometric matters, this likely amounts to processing special category data under GDPR Article 9: the gaze record itself reveals the user's medical interests, the political positions they read about, or the religious content they consumed.
// Gaze-to-text correlation: maps fixation points to DOM text nodes
function getTextAtGazePoint(gazeX, gazeY) {
  // document.caretPositionFromPoint() takes viewport (client) coordinates and
  // returns the caret position under that point; browser support varies
  const caretPos = document.caretPositionFromPoint(gazeX, gazeY);
  if (!caretPos) return null;
  const node = caretPos.offsetNode;
  if (node.nodeType !== Node.TEXT_NODE) return null;
  const parent = node.parentElement;
  if (!parent) return null;
  // Extract a window of up to 50 characters on each side for context
  return {
    element: parent.tagName,
    text: node.textContent.substring(
      Math.max(0, caretPos.offset - 50),
      Math.min(node.textContent.length, caretPos.offset + 50)
    ),
    elementClass: parent.className,
    timestamp: Date.now()
  };
}
// Continuous gaze stream → reading history reconstruction
const readingLog = [];
eyeTracker.addEventListener('fixation', event => {
  const textContext = getTextAtGazePoint(event.gazeX, event.gazeY);
  if (textContext) {
    readingLog.push(textContext);
    // After the session: readingLog holds every text fragment the user
    // fixated on, in chronological order (their complete reading history)
  }
});
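One way to picture the corresponding defense, suspending third-party gaze access while a sensitive field or document is in view, is a small gate that the trusted host places between the gaze source and any MCP tool listener. The sketch below is an assumption-laden illustration: `GazeGate` and its methods are hypothetical names, not part of any browser or SDK API.

```javascript
// Hypothetical gate between a trusted gaze source and untrusted consumers.
// The host publishes events through the gate; consumers only see events
// published while no suspension reason is active.
class GazeGate {
  constructor() {
    this.listeners = new Set();
    this.suspensions = new Set();
  }

  // Register a consumer; returns a function that unsubscribes it.
  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  // Each reason is tracked separately, so overlapping protections
  // (password focus, sensitive document open) do not cancel each other.
  suspend(reason) {
    this.suspensions.add(reason);
  }

  resume(reason) {
    this.suspensions.delete(reason);
  }

  get isSuspended() {
    return this.suspensions.size > 0;
  }

  // Called by the trusted host for every gaze event.
  // Returns true if the event was delivered, false if it was dropped.
  publish(event) {
    if (this.isSuspended) return false;
    for (const listener of this.listeners) listener(event);
    return true;
  }
}
```

In a browser host, the gate could be driven by focus events, for example calling `gate.suspend('password-focus')` on `focusin` for `input[type="password"]` and `gate.resume('password-focus')` on `focusout`. Dropped events are discarded rather than queued, so nothing captured during the protected window is replayed later.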
Pupil diameter and blink rate as emotional state oracles
High-fidelity eye-tracking hardware measures pupil diameter and blink rate alongside gaze coordinates. Research on pupillometry and blink behavior associates these physiological signals with emotional and cognitive states: pupil dilation with surprise, cognitive load, and heightened arousal; pupil constriction with reduced engagement; elevated blink rate with fatigue or discomfort; suppressed blink rate with intense concentration. The mapping is probabilistic, and pupil size also responds to light, but an MCP tool with access to the full eye-tracking data stream can still make real-time inferences about the user's emotional response to content.
In the context of an AI assistant interacting with the user, this creates a covert feedback channel: the MCP tool can observe whether the user's physiological response to model output indicates surprise (did the answer shock them?), confusion (are they re-reading?), or acceptance (did they proceed without re-reading?). This enables a class of manipulation attacks where the model's behavior is conditioned on the user's inferred emotional state without the user's awareness that their physiological signals are being used.
Pupil diameter measurements are likely to be treated as biometric data under GDPR Article 9 and as sensitive personal information under the CCPA, both of which carry disclosure obligations. Any MCP tool that accesses eye-tracking data on hardware capable of pupil measurement should treat the full data stream as biometric data, regardless of whether pupil values are explicitly extracted.
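Where a tool has a legitimate need for gaze coordinates only, one data-minimization approach is to strip physiological fields before a sample leaves the trusted layer. The following is a minimal sketch; `minimizeGazeSample` is a hypothetical helper, and the field names (`gazeX`, `gazeY`, `timestamp`, `pupilDiameter`, `blinkRate`) are assumptions about the sample shape rather than names from a specific SDK.

```javascript
// Allowlist of fields a consumer may see. Anything not listed here
// (pupil diameter, blink rate, vendor-specific extras) is dropped.
const ALLOWED_GAZE_FIELDS = ['gazeX', 'gazeY', 'timestamp'];

// Hypothetical helper: returns a new object containing only allowlisted
// fields that are actually present on the sample. The input is not modified.
function minimizeGazeSample(sample, allowed = ALLOWED_GAZE_FIELDS) {
  const minimized = {};
  for (const field of allowed) {
    if (Object.prototype.hasOwnProperty.call(sample, field)) {
      minimized[field] = sample[field];
    }
  }
  return minimized;
}
```

An allowlist is used instead of deleting known-sensitive fields so that fields added by a future hardware or SDK revision are excluded by default. Minimization reduces what can be inferred but does not remove the need to handle the remaining coordinate stream as sensitive.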
SkillAudit detection patterns for eye-tracking API access
SkillAudit scans MCP server code for patterns that indicate eye-tracking data access:
// Patterns SkillAudit flags in MCP server code:
// 1. Native module loading patterns that may bridge to platform eye-tracking SDKs
const tobii = require('@tobii/stream-engine'); // FLAGGED: direct SDK access
const eyeControl = require('windows-eye-control'); // FLAGGED: OS-level access
const gazePoint = require('gazepoint-native'); // FLAGGED: hardware bridge
// 2. WebXR gaze input session features
navigator.xr.requestSession('immersive-vr', {
  requiredFeatures: ['eye-tracking'] // FLAGGED: explicit gaze request
});
// 3. Experimental browser gaze APIs
navigator.gaze?.requestPermission(); // FLAGGED: experimental browser API
window.GazeEvent; // FLAGGED: gaze event class access
// 4. DOM coordinate access during input events (weaker signal, combined with gaze APIs)
document.caretPositionFromPoint(x, y); // FLAGGED in combination with eye-tracking
document.elementFromPoint(x, y); // FLAGGED in combination with gaze events
// CORRECT: if your MCP server has a legitimate gaze use case (accessibility tooling,
// gaze-based navigation), document the specific API access, data minimization approach,
// and retention policy explicitly in your MCP server manifest and tool descriptions
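To make the pattern categories above concrete, here is a rough sketch of a regex-based source scanner. It is an illustration only and does not represent how SkillAudit is actually implemented: `GAZE_PATTERNS` and `scanSource` are hypothetical names, and the regular expressions are simplified guesses that would miss dynamic imports, aliasing, and obfuscated code.

```javascript
// Illustrative pattern table. `weak` patterns are only reported when at
// least one strong eye-tracking pattern is also present in the same source.
const GAZE_PATTERNS = [
  {
    id: 'native-sdk-import',
    re: /require\(\s*['"][^'"]*(tobii|eye-control|gazepoint)[^'"]*['"]\s*\)/i,
  },
  {
    id: 'webxr-eye-tracking-feature',
    re: /(requiredFeatures|optionalFeatures)\s*:\s*\[[^\]]*['"]eye-tracking['"]/,
  },
  {
    id: 'experimental-gaze-api',
    re: /navigator\.gaze\b|\bGazeEvent\b/,
  },
  {
    id: 'dom-point-lookup',
    re: /\b(caretPositionFromPoint|elementFromPoint)\s*\(/,
    weak: true,
  },
];

// Hypothetical helper: returns the ids of matching patterns for one source file.
function scanSource(source) {
  const hits = GAZE_PATTERNS.filter(pattern => pattern.re.test(source));
  const hasStrongHit = hits.some(pattern => !pattern.weak);
  return hits
    .filter(pattern => !pattern.weak || hasStrongHit)
    .map(pattern => pattern.id);
}
```

For example, `scanSource("document.elementFromPoint(x, y);")` returns an empty list because DOM point lookups are common in ordinary UI code, while the same call next to a Tobii-style `require` is reported together with the import.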
| Risk | Data accessed | Defense |
|---|---|---|
| Password extraction via gaze sequence | Fixation coordinates during on-screen keyboard input | Randomized key layout; non-spatial PIN entry; pause gaze API during password fields |
| Sensitive content reading surveillance | Fixation coordinates mapped to DOM text content | Suspend gaze API access during document viewing; explicit per-document consent |
| Emotional state inference | Pupil diameter and blink rate time series | Treat full gaze stream as biometric data; minimize to gaze coordinates only if possible |
| OS-level SDK access bypassing browser permission | Native module bridging to Tobii/Windows Eye Control | Audit native module dependencies; deny eye-tracking SDK imports in MCP tool sandboxes |
| Covert model conditioning on physiological state | Real-time emotional/arousal signal from pupil and blink data | Prohibit gaze data from flowing into model context; explicit disclosure if used for adaptation |
SkillAudit findings for Eye Tracking Gaze API access
Gaze events are captured while a type="password" input or equivalent is focused. Fixation sequences over on-screen keyboard elements encode the password being typed. This is a direct credential extraction vector. Grade impact: −28.
The server requests the eye-tracking optional feature in a WebXR session even though no MCP tool functionality requires gaze data. The permission is acquired opportunistically. Grade impact: −12.