MCP Server Security · Encoding API · TextEncoder · TextDecoder · Timing Side Channel · BOM Sniffing · Charset Fingerprinting · Unicode Normalization · Fatal Mode
MCP server Encoding API security
The Encoding API (TextEncoder/TextDecoder) is synchronous, permission-free, and available in every modern browser as well as Node.js, Deno, and Electron. That universality is what makes it dangerous in MCP server contexts. Four attack techniques (timing oracles on string composition, binary-sequence probing via fatal mode, BOM-based encoding fingerprinting, and Unicode normalization side channels) can be carried out with no user gesture, no network call, and no API restriction.
Encoding API surface
// Encoding API — universally available, synchronous, no permission required
// Available: Chrome, Firefox, Safari, Edge, Node.js 11+, Deno, Electron — ALL versions in common use
const encoder = new TextEncoder(); // always UTF-8; no constructor arguments
const decoder = new TextDecoder(); // default: UTF-8, {fatal: false, ignoreBOM: false}
const fatalDecoder = new TextDecoder('utf-8', { fatal: true }); // throws on invalid byte sequences
const latin1Decoder = new TextDecoder('iso-8859-1');
const utf16leDecoder = new TextDecoder('utf-16le');
// Core operations
const encoded = encoder.encode('hello'); // Uint8Array [104, 101, 108, 108, 111]
const decoded = decoder.decode(encoded); // 'hello'
// encodeInto() — writes directly into an existing buffer, returns {read, written}
const buf = new Uint8Array(256);
const result = encoder.encodeInto('hello', buf); // {read: 5, written: 5}
// No Permissions-Policy control — cannot be blocked via HTTP header
// No CSP directive — operates entirely in JS memory
// Works in Web Workers, Service Workers, Shared Workers — no main-thread requirement
Universal availability: Unlike most Web APIs with attack surfaces, TextEncoder/TextDecoder cannot be disabled by any browser flag, Permissions-Policy directive, or CSP directive. The API is available in every worker context, including Service Workers and Shared Workers. Both constructors are globals, so an MCP tool has them unconditionally without importing anything.
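The {read, written} pair returned by encodeInto() is easiest to understand when the destination buffer is too small. The following is a minimal sketch, not part of the original post: the helper name encodeIntoReport and the sample string are illustrative assumptions. It shows that encodeInto() never writes a partial UTF-8 sequence, so the two counters reveal where multi-byte characters sit.

```javascript
// Hypothetical helper (illustrative name): run encodeInto() against a
// destination buffer of a given capacity and report what was consumed.
const partialEncoder = new TextEncoder();

function encodeIntoReport(text, capacity) {
  const dest = new Uint8Array(capacity);
  // read = UTF-16 code units consumed, written = UTF-8 bytes produced
  const { read, written } = partialEncoder.encodeInto(text, dest);
  return { read, written, bytes: Array.from(dest.subarray(0, written)) };
}

// 'h' is 1 byte and U+00E9 is 2 bytes, so both fit exactly in 3 bytes.
const fits = encodeIntoReport('h\u00E9llo', 3);

// With 2 bytes of space, U+00E9 would be split, so only 'h' is written.
const tight = encodeIntoReport('h\u00E9llo', 2);

console.log(fits, tight);
```

Because `read` counts UTF-16 code units and `written` counts UTF-8 bytes, any difference between them signals non-ASCII input without the caller ever inspecting the bytes.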
Attack 1 — encode() timing oracle on string composition
TextEncoder.encode() performance varies with the Unicode character categories in the input string. ASCII characters encode to a single byte; all other code points encode to 2, 3, or 4 bytes and require more work per character. By concatenating an attacker-controlled prefix with a secret value and timing encode() with performance.now(), an MCP tool can try to infer the character composition of the secret, specifically whether it contains multi-byte Unicode, which fingerprints language and script. The timing signal is small and needs many rounds to rise above timer noise; where the tool can call encode() on the secret directly, the byte length is a direct leak that needs no timing at all.
// Timing oracle: infer character composition of a secret string
// Premise: encoding a string with many 3-byte characters (e.g., Chinese, Japanese)
// or 4-byte characters (e.g., most emoji) takes measurably longer than an
// equal-length ASCII string
function inferCharClass(secret) {
  const ROUNDS = 10000;
  const encoder = new TextEncoder();
  // Baseline: concatenate with known ASCII prefix
  const start1 = performance.now();
  for (let i = 0; i < ROUNDS; i++) encoder.encode('AAAA' + secret);
  const asciiPrefixTime = performance.now() - start1;
  // Test: concatenate with a prefix of four 3-byte CJK characters
  const start2 = performance.now();
  for (let i = 0; i < ROUNDS; i++) encoder.encode('中文汉字' + secret);
  const cjkPrefixTime = performance.now() - start2;
  // Per the premise above, encoding time grows with the byte length of the secret:
  // an all-ASCII secret should add less overhead than one with CJK/emoji
  return {
    asciiPrefixTime,
    cjkPrefixTime,
    estimatedSecretByteLength: encoder.encode(secret).byteLength // direct leak if you can call encode()
  };
}
// More subtle: use encodeInto() to measure {written} without allocating a new Uint8Array per call.
// Note: values longer than the 8192-byte buffer are truncated, so `written` is capped.
function byteCountOracle(value) {
  const encoder = new TextEncoder();
  const buf = new Uint8Array(8192);
  const { written } = encoder.encodeInto(value, buf);
  return written; // differs from value.length when multi-byte characters are present
}
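To make "character composition" concrete, the sketch below counts how many code points in a string fall into each UTF-8 sequence length. It is an illustration under stated assumptions rather than code from the original post: the function name utf8LengthHistogram and the sample string are hypothetical.

```javascript
// Hypothetical helper (illustrative name): bucket each code point of a string
// by the number of UTF-8 bytes it encodes to (1, 2, 3, or 4).
function utf8LengthHistogram(text) {
  const encoder = new TextEncoder();
  const histogram = { 1: 0, 2: 0, 3: 0, 4: 0 };
  // Iterating a string yields whole code points, so surrogate pairs stay together.
  for (const codePoint of text) {
    histogram[encoder.encode(codePoint).byteLength] += 1;
  }
  return histogram;
}

// One code point from each class: ASCII 'a', U+00E9 (Latin-1 Supplement),
// U+4E2D (CJK ideograph), U+1F600 (emoji outside the BMP).
const mixedHistogram = utf8LengthHistogram('a\u00E9\u4E2D\u{1F600}');
console.log(mixedHistogram); // { '1': 1, '2': 1, '3': 1, '4': 1 }
```

A histogram dominated by the 3-byte bucket suggests CJK or other BMP scripts above U+07FF, while entries in the 4-byte bucket indicate supplementary-plane characters such as most emoji.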
Attack 2 — fatal-mode binary probe on captured byte buffers
When a TextDecoder is constructed with { fatal: true }, decode() throws a TypeError if the input contains a byte sequence that is invalid for the specified encoding. This makes it a probe: given a byte buffer captured from OPFS, IndexedDB, or a tool response, an MCP server can try each candidate encoding in turn and record which ones accept the data. The pattern of accepted encodings narrows down the original encoding, which hints at where the data was authored (locale, tool chain, operating system). Single-byte encodings such as windows-1252 accept every byte value, so only the multi-byte probes carry signal.
// Binary probe: determine encoding of a captured byte buffer without displaying its content
// Use case: an MCP tool receives a blob from another tool or from OPFS and wants to know its encoding
function probeEncoding(byteBuffer) {
  const probes = [
    { label: 'utf-8', decoder: new TextDecoder('utf-8', { fatal: true }) },
    { label: 'utf-16le', decoder: new TextDecoder('utf-16le', { fatal: true }) },
    { label: 'utf-16be', decoder: new TextDecoder('utf-16be', { fatal: true }) },
    // The Encoding Standard treats 'iso-8859-1' as a label for windows-1252, which maps
    // all 256 byte values, so these two probes never throw and carry no signal
    { label: 'iso-8859-1', decoder: new TextDecoder('iso-8859-1', { fatal: true }) },
    { label: 'windows-1252', decoder: new TextDecoder('windows-1252', { fatal: true }) },
    { label: 'shift-jis', decoder: new TextDecoder('shift-jis', { fatal: true }) },
    { label: 'euc-kr', decoder: new TextDecoder('euc-kr', { fatal: true }) },
  ];
  const results = {};
  for (const { label, decoder } of probes) {
    try {
      decoder.decode(byteBuffer); // throws TypeError on invalid byte sequence
      results[label] = true; // bytes are valid in this encoding
    } catch {
      results[label] = false; // invalid byte sequence for this encoding
    }
  }
  // results is a heuristic fingerprint of where the data was authored:
  // { 'utf-8': false, 'shift-jis': true, 'euc-kr': false } → consistent with a Japanese Windows locale
  // { 'utf-8': true, ... } → ASCII-only or well-formed UTF-8 content
  return results;
}
// Extended attack: length difference between encodings fingerprints language
function encodingLengthOracle(text) {
  const utf8Bytes = new TextEncoder().encode(text).byteLength;
  const charCount = text.length; // JS string length = UTF-16 code units
  // utf8Bytes > charCount → non-ASCII characters present (2+ byte sequences)
  // utf8Bytes === charCount → pure ASCII (bytes 0x00–0x7F only)
  // ratio ≈ 3 → BMP CJK (3 bytes per code unit)
  // ratio ≈ 2 → 2-byte scripts (accented Latin, Greek, Cyrillic) or
  //             emoji/supplementary characters (4 bytes per 2 code units)
  const ratio = charCount === 0 ? 0 : utf8Bytes / charCount; // avoid NaN on empty input
  return { charCount, utf8Bytes, ratio };
}
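The difference between fatal and non-fatal decoding can be seen on a two-byte input. The sketch below is a minimal illustration, not code from the original post; the helper name isWellFormedUtf8 and the sample byte arrays are assumptions chosen for the example. It uses only the 'utf-8' decoder so it runs on runtimes built without the legacy encodings.

```javascript
// Hypothetical helper (illustrative name): returns true when the bytes are
// well-formed UTF-8, using the TypeError thrown by a fatal decoder as the signal.
function isWellFormedUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// UTF-8 encoding of U+3042 (hiragana "a").
const utf8Kana = new Uint8Array([0xE3, 0x81, 0x82]);

// Shift_JIS encoding of the same character. In UTF-8, 0x82 and 0xA0 are
// continuation bytes with no lead byte, so the sequence is invalid.
const shiftJisKana = new Uint8Array([0x82, 0xA0]);

console.log(isWellFormedUtf8(utf8Kana));     // true
console.log(isWellFormedUtf8(shiftJisKana)); // false

// A non-fatal decoder does not throw; it substitutes U+FFFD for each bad byte.
const lenient = new TextDecoder('utf-8').decode(shiftJisKana);
console.log(lenient === '\uFFFD\uFFFD');     // true
```

The boolean result is the whole leak: the caller learns whether the buffer is UTF-8 without ever turning it into a readable string.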
Attack 3 — BOM detection to fingerprint data origin
Many Windows authoring tools prepend a UTF-8 BOM (0xEF 0xBB 0xBF) to files they write; Linux tools typically do not. The presence and byte order of a UTF-16 BOM (0xFF 0xFE vs 0xFE 0xFF) reveal the file's endianness and hint at the authoring platform. An MCP tool that can access byte data from OPFS, IndexedDB, or the clipboard can read the BOM to guess which platform, editor, or operating system produced the data, without reading the rest of the content.
// BOM detection oracle: fingerprint the authoring platform from the first 2–4 bytes
function detectBOM(buffer) {
  const bytes = new Uint8Array(buffer);
  const b0 = bytes[0], b1 = bytes[1], b2 = bytes[2], b3 = bytes[3];
  if (b0 === 0xEF && b1 === 0xBB && b2 === 0xBF) {
    return { encoding: 'utf-8-bom', platform: 'Windows (Notepad / Excel / old VS Code)' };
  }
  // UTF-32 must be checked before UTF-16: the UTF-32LE BOM (FF FE 00 00)
  // begins with the UTF-16LE BOM (FF FE)
  if (b0 === 0xFF && b1 === 0xFE && b2 === 0x00 && b3 === 0x00) {
    return { encoding: 'utf-32le', platform: 'rare; some Linux tools' };
  }
  if (b0 === 0x00 && b1 === 0x00 && b2 === 0xFE && b3 === 0xFF) {
    return { encoding: 'utf-32be', platform: 'rare; some Linux tools' };
  }
  if (b0 === 0xFF && b1 === 0xFE) {
    return { encoding: 'utf-16le', platform: 'Windows COM / PowerShell default output' };
  }
  if (b0 === 0xFE && b1 === 0xFF) {
    return { encoding: 'utf-16be', platform: 'macOS / Java / older UNIX tools' };
  }
  // No BOM: most Linux/macOS tools, modern VS Code, git-authored files
  return { encoding: 'unknown-no-bom', platform: 'likely Linux/macOS or BOM-stripped tool' };
}
// Cross-platform leak: guess whether a captured file was authored on Windows vs Unix
// without reading the file's content (the BOM occupies only the first 2 to 4 bytes)
async function authorPlatformFromOPFS(filename) {
  const root = await navigator.storage.getDirectory();
  const fh = await root.getFileHandle(filename);
  const file = await fh.getFile();
  const header = await file.slice(0, 4).arrayBuffer(); // read 4 bytes only
  return detectBOM(header);
  // → { encoding: 'utf-8-bom', platform: 'Windows (Notepad / Excel / old VS Code)' }
  // suggests whether the author was on Windows or Linux without reading any content
}
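The Encoding API itself can also surface the BOM. By default TextDecoder strips a leading BOM from its output; constructed with { ignoreBOM: true }, it leaves U+FEFF as the first character of the decoded string. The sketch below illustrates that behavior; the helper name hadUtf8Bom and the sample bytes are assumptions for the example, not part of the original post.

```javascript
// Five bytes: the UTF-8 BOM followed by ASCII 'hi'.
const withBom = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]);

// Default (ignoreBOM: false): the BOM is consumed and not emitted.
const stripped = new TextDecoder('utf-8').decode(withBom);

// ignoreBOM: true: the BOM is treated as ordinary data and kept as U+FEFF.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);

// Hypothetical helper (illustrative name): detect a UTF-8 BOM through the
// decoder rather than by comparing raw bytes.
function hadUtf8Bom(bytes) {
  const text = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);
  return text.charCodeAt(0) === 0xFEFF;
}

console.log(stripped.length, kept.length); // 2 3
console.log(hadUtf8Bom(withBom));          // true
```

Comparing the lengths of the two decoded strings gives the same answer as inspecting the raw header, so code that never touches individual bytes can still act as a BOM oracle.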
Attack 4 — Unicode normalization oracle
Unicode has multiple canonical representations for visually identical characters. For example, é can be stored as a single precomposed code point (U+00E9, NFC form) or as e followed by a combining acute accent (U+0065 + U+0301, NFD form). TextEncoder encodes both forms correctly but produces different byte lengths: 2 bytes for the NFC form and 3 bytes for the NFD form. Decomposed text commonly originates from macOS file names (HFS+ stores a variant of NFD), while text typed on Windows or into web forms is typically NFC. Measuring the byte length of a secret value, without displaying it, leaks its normalization form, which hints at the input path and operating system that produced the value.
// Unicode normalization oracle: fingerprint input path / OS from byte length difference
function normalizationOracle(secretText) {
  const encoder = new TextEncoder();
  const nfc = secretText.normalize('NFC');
  const nfd = secretText.normalize('NFD');
  // Compare byte lengths of the raw input and its NFC and NFD normalized forms
  const nfcBytes = encoder.encode(nfc).byteLength;
  const nfdBytes = encoder.encode(nfd).byteLength;
  const rawBytes = encoder.encode(secretText).byteLength;
  return {
    rawBytes,
    nfcBytes,
    nfdBytes,
    // Compare strings rather than byte lengths: equal lengths do not guarantee equal content.
    // ASCII-only input is both NFC and NFD, so these flags only discriminate when hasComposable is true.
    alreadyNFC: secretText === nfc, // typical of Windows / web form input
    alreadyNFD: secretText === nfd, // typical of decomposed sources such as macOS file names
    hasComposable: nfc !== nfd // true only when the text contains characters with canonical decompositions
  };
}
// Extended: measure string length vs code point count to detect surrogate pairs (emoji, supplementary)
function surrogateOracle(text) {
  const utf8Bytes = new TextEncoder().encode(text).byteLength;
  const utf16Units = text.length; // UTF-16 code units; surrogate pairs count as 2
  const codePointCount = [...text].length; // spread iterates code points
  // codePointCount < utf16Units → emoji / supplementary characters present → fingerprints content
  const hasSupplementary = codePointCount < utf16Units;
  return {
    hasSupplementary,
    codePointCount,
    utf16Units,
    utf8Bytes,
    emojiLikely: hasSupplementary
  };
}
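The byte-level difference between the two normalization forms is small enough to write out in full. The following worked example is a sketch for illustration only; the variable names are assumptions, and the values follow directly from the UTF-8 encodings of U+00E9, U+0065, U+0301, and U+1F600.

```javascript
const formEncoder = new TextEncoder();

// The same visible character in its two canonical forms.
const composed = '\u00E9';    // precomposed e-acute (NFC)
const decomposed = 'e\u0301'; // 'e' + combining acute accent (NFD)

const composedBytes = Array.from(formEncoder.encode(composed));     // [0xC3, 0xA9]
const decomposedBytes = Array.from(formEncoder.encode(decomposed)); // [0x65, 0xCC, 0x81]

// normalize() converts between the forms, so either one identifies the other.
const roundTrips =
  composed.normalize('NFD') === decomposed &&
  decomposed.normalize('NFC') === composed;

// A supplementary-plane emoji: 1 code point, 2 UTF-16 code units, 4 UTF-8 bytes.
const grin = '\u{1F600}';
const grinStats = {
  codePoints: [...grin].length,
  utf16Units: grin.length,
  utf8Bytes: formEncoder.encode(grin).byteLength,
};

console.log(composedBytes.length, decomposedBytes.length, roundTrips, grinStats);
```

One extra byte per accented character is enough to separate NFC from NFD input, and the 1 / 2 / 4 pattern for the emoji is what the surrogate check keys on.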
What SkillAudit checks
Runtime support
| Runtime | TextEncoder | TextDecoder | encodeInto() | Permissions-Policy |
|---|---|---|---|---|
| Chrome 38+ | Full | Full | Chrome 74+ | None |
| Firefox 19+ | Full | Full | Firefox 69+ | None |
| Safari 10.1+ | Full | Full | Safari 14.1+ | None |
| Node.js 11+ | Global | Global | Node.js 12.11+ | N/A |
| Deno 1.0+ | Global | Global | Full | N/A |
| Electron | Full (Chromium) | Full | Full | None |
| Web Workers | Full | Full | Full | None |
| Service Workers | Full | Full | Full | None |