MCP Server Security · Encoding API · TextEncoder · TextDecoder · Timing Side Channel · BOM Sniffing · Charset Fingerprinting · Unicode Normalization · Fatal Mode

MCP server Encoding API security

The Encoding API (TextEncoder/TextDecoder) is synchronous, permission-free, and present in every modern browser, Node.js, Deno, and Electron. That universality is exactly what makes it dangerous in MCP server contexts. Four attacks (timing oracles on string composition, encoding probes via fatal mode, BOM-based encoding fingerprinting, and Unicode normalization side channels) can be carried out with no user gesture, no network call, and no API restriction.

Encoding API surface

// Encoding API — universally available, synchronous, no permission required
// Available: Chrome, Firefox, Safari, Edge, Node.js 11+, Deno, Electron — ALL versions in common use

const encoder = new TextEncoder();           // always UTF-8; no constructor arguments
const decoder = new TextDecoder();           // default: UTF-8, {fatal: false, ignoreBOM: false}
const fatalDecoder = new TextDecoder('utf-8', { fatal: true });   // throws on invalid byte sequences
const latin1Decoder = new TextDecoder('iso-8859-1');
const utf16leDecoder = new TextDecoder('utf-16le');

// Core operations
const encoded = encoder.encode('hello');     // Uint8Array [104, 101, 108, 108, 111]
const decoded = decoder.decode(encoded);     // 'hello'

// encodeInto() — writes directly into an existing buffer, returns {read, written}
const buf = new Uint8Array(256);
const result = encoder.encodeInto('hello', buf);  // {read: 5, written: 5}

// No Permissions-Policy control — cannot be blocked via HTTP header
// No CSP directive — operates entirely in JS memory
// Works in Web Workers, Service Workers, Shared Workers — no main-thread requirement

Universal availability: Unlike most Web APIs with attack surfaces, there is no browser flag, Permissions-Policy directive, or CSP directive that disables TextEncoder/TextDecoder. Both constructors are globals in every sandboxed worker context, including Service Workers and Shared Workers, so an MCP tool never has to import or request them: any code it runs has the API unconditionally.
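To make the two decoder modes concrete, here is a minimal sketch that uses only the globals described above. The helper name `describeEncodingApi` is hypothetical. It reports what any script can learn about the Encoding API without a permission prompt, then shows how replacement mode and fatal mode treat the same invalid byte.

```javascript
// Hypothetical helper: summarize the Encoding API as seen by any script.
function describeEncodingApi() {
  const decoder = new TextDecoder();
  return {
    hasTextEncoder: typeof TextEncoder === 'function',
    hasTextDecoder: typeof TextDecoder === 'function',
    hasEncodeInto: typeof TextEncoder.prototype.encodeInto === 'function',
    defaultEncoding: decoder.encoding,   // 'utf-8'
    defaultFatal: decoder.fatal,         // false
    defaultIgnoreBOM: decoder.ignoreBOM, // false
  };
}

// 0xFF can never appear in well-formed UTF-8.
const invalid = new Uint8Array([0x68, 0x69, 0xFF]);

// Replacement mode (default): the bad byte becomes U+FFFD and decoding continues.
const lenient = new TextDecoder('utf-8').decode(invalid);

// Fatal mode: the same input raises a TypeError instead.
let fatalError = null;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(invalid);
} catch (err) {
  fatalError = err;
}

console.log(describeEncodingApi());
console.log(lenient.length, fatalError instanceof TypeError);
```

The difference between the two modes is what the fatal-mode probe in Attack 2 relies on: the thrown error is a one-bit answer to "is this buffer valid in encoding X?".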

Attack 1 — encode() timing oracle on string composition

TextEncoder.encode() performance varies with the UTF-8 width of the characters in the input string. ASCII characters encode in a single byte; characters that need 2, 3, or 4 bytes take additional work. By prepending an attacker-controlled prefix to a secret value and measuring encode() duration via performance.now(), an MCP tool can infer the character composition of the secret, specifically whether it contains multi-byte Unicode, which fingerprints language and script. Because browsers deliberately coarsen performance.now(), the measurement has to be accumulated over thousands of rounds.

// Timing oracle: infer character composition of a secret string
// Premise: encoding a string with many 3-byte characters (e.g., Chinese, Japanese, emoji)
// takes measurably longer than an equal-length ASCII string

function inferCharClass(secret) {
  const ROUNDS = 10000;
  const encoder = new TextEncoder();

  // Baseline: concatenate with known ASCII prefix
  const start1 = performance.now();
  for (let i = 0; i < ROUNDS; i++) encoder.encode('AAAA' + secret);
  const asciiPrefixTime = performance.now() - start1;

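  // Test: concatenate with 3-byte prefix
  const start2 = performance.now();
  for (let i = 0; i < ROUNDS; i++) encoder.encode('中文汉字' + secret);  // four 3-byte CJK characters
  const cjkPrefixTime = performance.now() - start2;

  // Both timings include the cost of encoding the secret; the gap between them
  // comes from the prefix, so it calibrates 1-byte vs 3-byte encoding cost.
  // Comparing asciiPrefixTime with runs on known strings of the same length then
  // indicates whether the secret is all-ASCII or contains CJK/emoji.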
  return {
    asciiPrefixTime,
    cjkPrefixTime,
    estimatedSecretByteLength: encoder.encode(secret).byteLength  // direct leak if you can call encode()
  };
}

// More subtle: use encodeInto() to read {written} from a reused buffer instead of
// allocating a new Uint8Array on every call:
function byteCountOracle(value) {
  const encoder = new TextEncoder();
  const buf = new Uint8Array(8192);  // longer inputs are truncated, which caps {written}
  const { written } = encoder.encodeInto(value, buf);
  return written;  // differs from value.length for multi-byte characters → reveals non-ASCII content
}
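A single pair of timings is noisy, so a practical measurement repeats the loop and takes a median. The sketch below is one way to do that, assuming `performance.now()` is available as a global (as it is in browsers and recent Node.js). `medianEncodeTime` and `calibrate` are hypothetical names, and the resulting numbers depend on the engine and timer resolution, so no particular ratio is guaranteed.

```javascript
// Hypothetical helper: median time (ms) to encode `text` `rounds` times, over `samples` runs.
function medianEncodeTime(text, rounds = 2000, samples = 9) {
  const encoder = new TextEncoder();
  const times = [];
  let sink = 0; // consume each result so the loop body cannot be skipped
  for (let s = 0; s < samples; s++) {
    const start = performance.now();
    for (let i = 0; i < rounds; i++) sink += encoder.encode(text).byteLength;
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return { median: times[Math.floor(times.length / 2)], sink };
}

// Calibration strings with the same number of UTF-16 code units but different UTF-8 widths.
function calibrate(length = 256) {
  const ascii = 'a'.repeat(length);    // 1 byte per code unit
  const cjk = '\u4e2d'.repeat(length); // 3 bytes per code unit
  return {
    asciiMs: medianEncodeTime(ascii).median,
    cjkMs: medianEncodeTime(cjk).median,
  };
}

const baseline = calibrate();
console.log(baseline);
```

A timing taken for a secret of the same length can then be compared with the two baselines; which one it sits closer to suggests its composition.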

Attack 2 — fatal-mode binary probe on captured byte buffers

When a TextDecoder is constructed with { fatal: true }, decode() throws a TypeError if the input contains a byte sequence that is invalid for the specified encoding. This makes it a probe: given a byte buffer captured from OPFS, IndexedDB, or a tool response, an MCP server can try each candidate encoding in turn and record which ones decode cleanly. That narrows down the original encoding of the data and leaks information about where the data was authored (locale, tool chain, operating system). The signal comes from the multi-byte encodings; single-byte encodings such as windows-1252 accept essentially any byte sequence and so rarely rule anything out.

// Binary probe: determine encoding of a captured byte buffer without displaying its content
// Use case: MCP tool receives a blob from another tool or from OPFS — what encoding is it?

function probeEncoding(byteBuffer) {
  const probes = [
    { label: 'utf-8',        decoder: new TextDecoder('utf-8',        { fatal: true }) },
    // UTF-16 fails only on odd-length input or unpaired surrogates
    { label: 'utf-16le',     decoder: new TextDecoder('utf-16le',     { fatal: true }) },
    { label: 'utf-16be',     decoder: new TextDecoder('utf-16be',     { fatal: true }) },
    // Single-byte encodings: every byte maps to a character, so these carry no signal
    // ('iso-8859-1' is an alias for windows-1252 in the Encoding Standard)
    { label: 'iso-8859-1',   decoder: new TextDecoder('iso-8859-1',   { fatal: true }) },
    { label: 'windows-1252', decoder: new TextDecoder('windows-1252', { fatal: true }) },
    { label: 'shift-jis',    decoder: new TextDecoder('shift-jis',    { fatal: true }) },
    { label: 'euc-kr',       decoder: new TextDecoder('euc-kr',       { fatal: true }) },
  ];

  const results = {};
  for (const { label, decoder } of probes) {
    try {
      decoder.decode(byteBuffer);     // throws TypeError on invalid byte sequence
      results[label] = true;          // bytes are valid in this encoding
    } catch {
      results[label] = false;         // invalid byte sequence for this encoding
    }
  }

  // results hints at where the data was authored:
  // { 'utf-8': false, 'shift-jis': true, 'euc-kr': false } → suggests a Japanese Windows locale
  // { 'utf-8': true, ... } → ASCII-only or UTF-8 content
  return results;
}

// Extended attack: ratio of UTF-8 bytes to UTF-16 code units fingerprints language
function encodingLengthOracle(text) {
  const utf8Bytes  = new TextEncoder().encode(text).byteLength;
  const charCount  = text.length;   // JS string length = UTF-16 code units

  // utf8Bytes > charCount → non-ASCII characters present (2+ byte sequences)
  // utf8Bytes = charCount → pure ASCII (bytes 0x00–0x7F only)
  // ratio near 3 → mostly CJK or other 3-byte BMP text
  // ratio near 2 → accented Latin, Cyrillic, Greek, or emoji/supplementary
  //                (4 bytes per surrogate pair = 2 bytes per code unit)
  // ratio is NaN for the empty string
  return { charCount, utf8Bytes, ratio: utf8Bytes / charCount };
}
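Which probes are informative depends on the encoding. The following sketch decodes the Shift_JIS bytes for 日本 under three labels in fatal mode. The helper `decodesCleanly` and the sample bytes are illustrative assumptions. It also assumes a runtime whose TextDecoder supports `windows-1252` (browsers, and Node.js builds with full ICU); labels the runtime does not know are reported as unsupported rather than treated as failures.

```javascript
// Hypothetical helper: 'valid', 'invalid', or 'unsupported' for one encoding label.
function decodesCleanly(label, bytes) {
  let decoder;
  try {
    decoder = new TextDecoder(label, { fatal: true });
  } catch (err) {
    return 'unsupported'; // RangeError: this runtime does not know the label
  }
  try {
    decoder.decode(bytes);
    return 'valid';
  } catch (err) {
    return 'invalid';     // TypeError: byte sequence is malformed for this encoding
  }
}

const shiftJisSample = new Uint8Array([0x93, 0xFA, 0x96, 0x7B]); // 日本 in Shift_JIS
const asciiSample = new Uint8Array([0x68, 0x69]);                // "hi"

const report = {};
for (const label of ['utf-8', 'utf-16le', 'windows-1252']) {
  report[label] = {
    shiftJisSample: decodesCleanly(label, shiftJisSample),
    asciiSample: decodesCleanly(label, asciiSample),
  };
}
console.log(report);
```

In a spec-conforming runtime only the UTF-8 probe rejects the Shift_JIS sample (0x93 is not a valid UTF-8 lead byte). The UTF-16 and single-byte decoders accept it, which is why they contribute little to the fingerprint.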

Attack 3 — BOM detection to fingerprint data origin

Many Windows authoring tools prepend a UTF-8 BOM (0xEF 0xBB 0xBF) to files they write; Linux tools typically do not. The presence and byte order of a UTF-16 BOM (0xFF 0xFE vs 0xFE 0xFF) reveal the endianness the authoring tool used and hint at its platform. An MCP tool with access to byte data from OPFS, IndexedDB, or the clipboard can read the BOM to fingerprint the platform, editor, or operating system that produced the data, even without reading the content.

// BOM detection oracle: fingerprint the authoring platform from the first 3–4 bytes

function detectBOM(buffer) {
  const bytes = new Uint8Array(buffer);
  const [b0, b1, b2, b3] = bytes;  // missing bytes are undefined and match nothing

  if (b0 === 0xEF && b1 === 0xBB && b2 === 0xBF) {
    return { encoding: 'utf-8-bom', platform: 'Windows (Notepad / Excel / old VS Code)' };
  }
  // UTF-32 must be tested before UTF-16: FF FE 00 00 also starts with the UTF-16LE BOM
  if (b0 === 0xFF && b1 === 0xFE && b2 === 0x00 && b3 === 0x00) {
    return { encoding: 'utf-32le', platform: 'rare; some Linux tools' };
  }
  if (b0 === 0x00 && b1 === 0x00 && b2 === 0xFE && b3 === 0xFF) {
    return { encoding: 'utf-32be', platform: 'rare; some Linux tools' };
  }
  if (b0 === 0xFF && b1 === 0xFE) {
    return { encoding: 'utf-16le', platform: 'Windows COM / PowerShell default output' };
  }
  if (b0 === 0xFE && b1 === 0xFF) {
    return { encoding: 'utf-16be', platform: 'macOS / Java / older UNIX tools' };
  }
  // No BOM: most Linux/macOS tools, modern VS Code, git-authored files
  return { encoding: 'unknown-no-bom', platform: 'likely Linux/macOS or BOM-stripped tool' };
}

// Cross-platform leak: detect whether a captured file was authored on Windows vs Unix
// without reading the file's content; a BOM occupies at most the first 4 bytes
async function authorPlatformFromOPFS(filename) {
  const root = await navigator.storage.getDirectory();
  const fh = await root.getFileHandle(filename);
  const file = await fh.getFile();
  const header = await file.slice(0, 4).arrayBuffer();  // read 4 bytes only
  return detectBOM(header);
  // → { encoding: 'utf-8-bom', platform: 'Windows (Notepad / Excel / old VS Code)' }
  // hints at whether the file was authored on Windows or Linux without reading any content
}
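The decoder itself can also act as the BOM detector. By default TextDecoder strips a leading BOM, while `ignoreBOM: true` keeps it as U+FEFF, so the BOM can be observed without comparing raw bytes by hand. A minimal sketch, with `hasUtf8Bom` as a hypothetical helper name:

```javascript
// UTF-8 BOM (EF BB BF) followed by "hi"
const withBom = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]);

// Default decoder: the BOM is consumed and does not appear in the output.
const stripped = new TextDecoder('utf-8').decode(withBom);

// ignoreBOM: true leaves the BOM in the output as U+FEFF.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);

// Hypothetical helper: decode only the first three bytes and look for U+FEFF.
function hasUtf8Bom(bytes) {
  const head = new Uint8Array(bytes).subarray(0, 3);
  const text = new TextDecoder('utf-8', { ignoreBOM: true }).decode(head);
  return text.charCodeAt(0) === 0xFEFF;
}

console.log(stripped.length, kept.length);
console.log(hasUtf8Bom(withBom), hasUtf8Bom(new Uint8Array([0x68, 0x69])));
```

Because the helper accepts either an ArrayBuffer or a typed array, it can be applied directly to the 4-byte header slice read from a file.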

Attack 4 — Unicode normalization oracle

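Unicode has multiple canonical representations for visually identical characters. For example, é can be stored as a single precomposed code point (U+00E9, NFC form) or as e followed by a combining accent (U+0065 + U+0301, NFD form). TextEncoder encodes both forms correctly but produces different byte lengths: 2 bytes for the NFC form and 3 for the NFD form. Most keyboard input on Windows and in web browsers arrives as NFC, while decomposed text typically originates from macOS file names (HFS+ stores names in a decomposed form, which is especially visible in Japanese kana with voicing marks) or from tools that normalize to NFD. Comparing the byte length of a secret value with the byte lengths of its NFC and NFD forms, without ever exposing the text itself, leaks its normalization form, which hints at the input method, file system, or operating system that produced the value.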

// Unicode normalization oracle: fingerprint IME / OS from byte length difference

function normalizationOracle(secretText) {
  const encoder = new TextEncoder();

  // Compare byte lengths of NFC and NFD normalized forms
  const nfcBytes  = encoder.encode(secretText.normalize('NFC')).byteLength;
  const nfdBytes  = encoder.encode(secretText.normalize('NFD')).byteLength;
  const rawBytes  = encoder.encode(secretText).byteLength;

  return {
    rawBytes,
    nfcBytes,
    nfdBytes,
    alreadyNFC: rawBytes === nfcBytes,   // byte-length heuristic; also true for ASCII-only input
    alreadyNFD: rawBytes === nfdBytes,   // suggests decomposed input (e.g. macOS file names); also true for ASCII-only input
    hasComposable: nfcBytes !== nfdBytes  // true only if some character changes length under normalization
  };
}

// Extended: measure string length vs byte length to detect surrogate pairs (emoji, supplementary)
function surrogateOracle(text) {
  const byteLen = new TextEncoder().encode(text).byteLength;
  const charLen = text.length;              // UTF-16 code units; surrogate pairs count as 2
  const codePointCount = [...text].length;  // spread iterates code points

  return {
    hasSupplementary: charLen !== codePointCount,
    codePointCount,
    utf16Units: charLen,
    utf8Bytes: byteLen,
    // If codePointCount < utf16Units → emoji / supplementary chars present → fingerprints content
    emojiLikely: charLen > codePointCount
  };
}
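As a worked instance of the é example, the following sketch encodes both forms and compares their byte lengths. The strings are written with escapes so that their normalization form is unambiguous regardless of how this page was saved; `utf8Length` is a hypothetical helper.

```javascript
const encoder = new TextEncoder();
const utf8Length = (s) => encoder.encode(s).byteLength;

const precomposed = '\u00E9';  // é as one code point (NFC)
const decomposed = 'e\u0301';  // e + combining acute accent (NFD)

const lengths = {
  nfc: utf8Length(precomposed),  // bytes C3 A9
  nfd: utf8Length(decomposed),   // bytes 65 CC 81
};

// The two strings are canonically equivalent but not equal as JS strings.
const equalRaw = precomposed === decomposed;
const equalAfterNormalize = precomposed === decomposed.normalize('NFC');

console.log(lengths, equalRaw, equalAfterNormalize);
```

The one-byte difference per accented character is the entire side channel: a length alone is enough to tell the two forms apart when the visible text is already known or guessable.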

What SkillAudit checks

MEDIUM
TextEncoder.encode() called on mixed attacker-controlled + secret data in a timing measurement loop — encode() timing leaks byte-length and character-class composition of the secret component; not directly exploitable but part of a multi-step oracle chain.
MEDIUM
TextDecoder with {fatal: true} called in an encoding probe loop on captured buffers: trying each encoding name in turn leaks authoring locale and platform for byte data from OPFS, IndexedDB, or clipboard without displaying any content.
MEDIUM
First 3–4 bytes of OPFS or IndexedDB file read to detect BOM prefix — BOM detection leaks user operating system (Windows vs Linux/macOS) without reading file content; combined with filename from launch handler or file picker, creates a high-entropy device fingerprint.
LOW
Normalization form comparison (NFC vs NFD byte length) on user-provided text: leaks whether the text is decomposed (NFD, typical of macOS file names) or precomposed (NFC, typical of keyboard input on Windows and in web browsers); low standalone impact but fingerprints user locale and input source.
LOW
encodeInto() {written} value logged or transmitted — the written count is the UTF-8 byte length of the input; for non-ASCII text this is strictly greater than the character count and leaks multi-byte character presence even if the string content is not sent.
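
How checks of this kind might be approximated can be sketched with a few regular expressions over a tool's source text. This is an illustrative assumption, not SkillAudit's actual implementation: the rule names, the patterns, and the `scanSource` function are all hypothetical, and regex matching of this sort produces both false positives and false negatives.

```javascript
// Hypothetical rules loosely mirroring the checks listed above.
const RULES = [
  { id: 'fatal-decoder', severity: 'MEDIUM', pattern: /new\s+TextDecoder\s*\([^)]*fatal\s*:\s*true/ },
  { id: 'bom-sniff', severity: 'MEDIUM', pattern: /0xEF[\s\S]{0,40}0xBB[\s\S]{0,40}0xBF/i },
  { id: 'normalize-compare', severity: 'LOW', pattern: /\.normalize\(\s*['"]NF[CD]['"]\s*\)/ },
  { id: 'encode-into-written', severity: 'LOW', pattern: /\.encodeInto\s*\(/ },
];

// Return the id and severity of every rule whose pattern occurs in the source text.
function scanSource(source) {
  return RULES
    .filter((rule) => rule.pattern.test(source))
    .map(({ id, severity }) => ({ id, severity }));
}

const sample = `
  const d = new TextDecoder('utf-8', { fatal: true });
  const n = text.normalize('NFC');
`;
console.log(scanSource(sample));
```

A real analysis would need data-flow context, for example whether the decoded buffer came from OPFS or whether a `written` count ever leaves the process, which pattern matching alone cannot establish.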

Runtime support

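| Runtime | TextEncoder | TextDecoder | encodeInto() | Permissions-Policy |
| --- | --- | --- | --- | --- |
| Chrome 38+ | Full | Full | Chrome 74+ | None |
| Firefox 19+ | Full | Full | Firefox 69+ | None |
| Safari 10.1+ | Full | Full | Safari 14.1+ | None |
| Node.js 11+ | Global | Global | Full | N/A |
| Deno 1.0+ | Global | Global | Full | N/A |
| Electron | Full (Chromium) | Full | Full | None |
| Web Workers | Full | Full | Full | None |
| Service Workers | Full | Full | Full | None |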