Topic: mcp server internationalization security

MCP server internationalization security — locale injection, charset encoding attacks, and RTL Unicode bidi override injection

Internationalization support in MCP servers introduces a category of vulnerabilities that traditional security scanners rarely flag: locale injection that changes how numbers and dates are formatted in tool responses, charset parameter manipulation that causes downstream consumers to mis-decode responses, and Unicode bidirectional control character injection that reorders displayed text in rendered outputs. These vulnerabilities are subtle because the server code produces no errors; it correctly executes the operation with the attacker-supplied locale or encoding.

Locale injection via tool arguments

An MCP server that accepts a locale argument and passes it to Intl.NumberFormat, Intl.DateTimeFormat, or similar APIs produces formatting output that is a function of the attacker-controlled locale string. The security implication is that the LLM receiving the formatted output may interpret the number differently than the system that produced it, because decimal and thousands separators vary by locale:

// Dangerous: locale from tool argument controls number formatting
server.tool('formatPrice', {
  schema: {
    amount: { type: 'number' },
    locale: { type: 'string' } // LLM-supplied or prompt-injection-supplied
  },
  handler: async ({ amount, locale }) => {
    // With locale='de-DE': 1234.56 → "1.234,56" (period as thousands, comma as decimal)
    // With locale='en-US': 1234.56 → "1,234.56" (comma as thousands, period as decimal)
    const formatted = new Intl.NumberFormat(locale).format(amount);
    return { formatted }; // LLM reads "1.234,56" and may interpret as 1.234 or 1234.56 depending on its training
  }
});

// Safe: locale comes from trusted server configuration, not from tool argument
const SERVER_LOCALE = process.env.SERVER_LOCALE || 'en-US';
const numberFormatter = new Intl.NumberFormat(SERVER_LOCALE);

server.tool('formatPrice', {
  schema: { amount: { type: 'number' } },
  handler: async ({ amount }) => {
    return {
      formatted: numberFormatter.format(amount),
      locale: SERVER_LOCALE, // declared alongside value so LLM has context
      raw: amount // always include the raw numeric value for downstream calculation
    };
  }
});

The raw numeric value is the most important part of the response for any downstream calculation. An LLM that reads "1.234,56" may not reliably identify the locale-specific decimal separator in all contexts — especially if it's processing formatted strings from multiple tools with different locales in the same session. Always return the raw value alongside the formatted string.
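Where a server genuinely has to honor a caller-chosen locale, a narrower option than accepting any string is to canonicalize the tag, check it against a fixed allowlist, and report the separators that were actually used. The sketch below assumes a hypothetical `ALLOWED_LOCALES` set and a hypothetical `formatPriceWithLocale` helper (neither comes from any MCP SDK). It uses only the standard `Intl.getCanonicalLocales` and `Intl.NumberFormat.prototype.formatToParts` APIs and assumes a Node.js build with full ICU data, which is the default for official builds.

```javascript
// Hypothetical allowlist: the only locales this server will format for.
const ALLOWED_LOCALES = new Set(['en-US', 'en-GB', 'de-DE', 'fr-FR', 'ja-JP']);
const DEFAULT_LOCALE = 'en-US';

// Canonicalize a caller-supplied BCP 47 tag ("DE-de" -> "de-DE") and fall back to the
// default when the tag is malformed or not on the allowlist.
function resolveLocale(requested) {
  if (typeof requested !== 'string') return DEFAULT_LOCALE;
  try {
    const [canonical] = Intl.getCanonicalLocales(requested);
    return ALLOWED_LOCALES.has(canonical) ? canonical : DEFAULT_LOCALE;
  } catch {
    return DEFAULT_LOCALE; // RangeError: not a structurally valid language tag
  }
}

// Hypothetical helper: the formatted string plus everything a consumer needs to avoid
// guessing, including the raw number and the separators this locale actually used.
function formatPriceWithLocale(amount, requestedLocale) {
  const locale = resolveLocale(requestedLocale);
  const formatter = new Intl.NumberFormat(locale);
  const parts = formatter.formatToParts(amount);
  const partValue = (type) => parts.find((p) => p.type === type)?.value ?? null;
  return {
    formatted: formatter.format(amount),
    locale,
    decimal_separator: partValue('decimal'),
    group_separator: partValue('group'),
    raw: amount,
  };
}

console.log(formatPriceWithLocale(1234.56, 'DE-de'));
console.log(formatPriceWithLocale(1234.56, 'not a locale'));
```

Unknown or malformed tags silently fall back to the server default in this sketch; rejecting them with an error is an equally reasonable design. Either way, the response states which locale was applied.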

Date format ambiguity via locale injection

Date formatting via locale injection introduces a different ambiguity: 01/02/03 means January 2, 2003 in the US convention (month/day/year), February 1, 2003 in the UK convention (day/month/year), and February 3, 2001 in the Japanese convention (year/month/day). An MCP server that formats dates with an LLM-controlled locale and returns them to the LLM's context can cause date interpretation errors in downstream scheduling or other time-based tools, because the LLM may re-interpret the formatted date using its own default locale assumption rather than the locale the server used:

// Dangerous: date formatted with LLM-controlled locale
const formatted = new Intl.DateTimeFormat(args.locale).format(new Date(isoDateString));
return { date: formatted }; // "02/01/2026" — ambiguous DD/MM or MM/DD?

// Safe: use ISO 8601 for all dates in tool responses; it is unambiguous across locales
const date = new Date(isoDateString);
return {
  date_iso: date.toISOString(), // "2026-02-01T00:00:00.000Z" for isoDateString = "2026-02-01"
  // display format is for human reading; iso format is for downstream tool consumption.
  // timeZone is pinned so a date-only input (parsed as UTC midnight) is not shown as the
  // previous day on a host west of UTC.
  date_display: new Intl.DateTimeFormat('en-US', { dateStyle: 'long', timeZone: 'UTC' }).format(date)
};
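The ambiguity is easy to reproduce with `Intl.DateTimeFormat` alone: the same instant formatted with two-digit numeric fields yields a different string per locale. The sketch below prints that comparison, then wraps the ISO-first response shape in a hypothetical `dateForToolResponse` helper that also rejects unparseable input. Pinning `timeZone: 'UTC'` is an assumption that fits date-only inputs, which JavaScript parses as UTC midnight; a server that deals in local times would pass an explicit IANA time zone instead.

```javascript
// 2 January 2003, constructed in UTC so the output does not depend on the host time zone.
const sample = new Date(Date.UTC(2003, 0, 2));
const numeric = { year: '2-digit', month: '2-digit', day: '2-digit', timeZone: 'UTC' };

for (const locale of ['en-US', 'en-GB', 'ja-JP']) {
  // Same instant, different field order per locale.
  console.log(locale, new Intl.DateTimeFormat(locale, numeric).format(sample));
}

// Hypothetical helper: ISO 8601 for machines, a spelled-out date for humans.
function dateForToolResponse(input) {
  const date = new Date(input);
  if (Number.isNaN(date.getTime())) {
    throw new RangeError(`Not a parseable date: ${input}`);
  }
  return {
    date_iso: date.toISOString(),
    // Pin the zone so "2026-02-01" (UTC midnight) is not displayed as January 31
    // on a host west of UTC.
    date_display: new Intl.DateTimeFormat('en-US', { dateStyle: 'long', timeZone: 'UTC' }).format(date),
  };
}

console.log(dateForToolResponse('2026-02-01'));
```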

Charset encoding attacks via Content-Type manipulation

An MCP server that echoes a client-supplied charset or encoding parameter into a Content-Type response header, or that uses it to select a text encoding for response body serialization, can be induced to produce responses that downstream consumers mis-decode. If the server serializes a response in UTF-8 but declares the Content-Type as text/plain; charset=ISO-8859-1, any multi-byte Unicode characters in the response will be mis-decoded by a consumer that respects the Content-Type header. In an MCP context, the "consumer" is typically the MCP client framework itself.

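// Dangerous: charset from request parameter echoed into response
app.get('/tool-result', (req, res) => {
  const charset = req.query.charset || 'utf-8'; // attacker sends ?charset=gbk
  res.setHeader('Content-Type', `text/plain; charset=${charset}`);
  // The body is UTF-8 bytes but is declared as GBK. Express's res.send() rewrites the
  // charset to utf-8 for string bodies, so the mismatch typically reaches the wire via
  // Buffers or raw res.end()/res.write() calls like this one.
  res.end(Buffer.from(String(result), 'utf8'));
});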

// Safe: always declare UTF-8, ignore client-supplied charset parameter
app.get('/tool-result', (req, res) => {
  res.setHeader('Content-Type', 'application/json; charset=utf-8');
  res.json(result);
});
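The mis-decoding can be observed without any HTTP stack. The sketch below encodes a JSON body as UTF-8, then decodes the same bytes the way a consumer that trusted a `charset=ISO-8859-1` header would, using Node's `Buffer` (its `'latin1'` decoding maps each byte to one character). The sample payload is made up for illustration.

```javascript
// A tool result containing non-ASCII characters (illustrative data).
const payload = { user: 'Zoë', summary: 'naïve café' };

// What the server actually sends: UTF-8 bytes.
const bodyBytes = Buffer.from(JSON.stringify(payload), 'utf8');

// A consumer that honors the declared charset decodes with that charset instead.
const decodedAsUtf8 = bodyBytes.toString('utf8');     // what charset=utf-8 produces
const decodedAsLatin1 = bodyBytes.toString('latin1'); // what charset=ISO-8859-1 produces

console.log(decodedAsUtf8);   // {"user":"Zoë","summary":"naïve café"}
console.log(decodedAsLatin1); // each two-byte UTF-8 sequence becomes two Latin-1 characters

// The JSON is still syntactically valid, so nothing throws: the values are simply wrong.
console.log(JSON.parse(decodedAsLatin1).user === payload.user); // false
```

This silent corruption is what the safe handler avoids by always emitting UTF-8 and declaring exactly that.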

Unicode bidirectional override injection

Unicode includes bidirectional control characters (U+202E RIGHT-TO-LEFT OVERRIDE, U+202B RIGHT-TO-LEFT EMBEDDING, the U+2066 to U+2069 isolates, and others) that change the display order of text in any application that implements the Unicode Bidirectional Algorithm. An MCP server that returns user-controlled string content in tool responses may relay these characters to a rendering surface (a terminal, a web UI, a document) where they visually reorder the displayed text. Social engineering attacks use this to make a filename or URL appear different from its actual value.

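The attack in an MCP context: a tool that reads a Git repository may return a filename such as notes_<U+202E>txt.sh, where <U+202E> stands for the invisible RIGHT-TO-LEFT OVERRIDE character. The override displays the characters after it in reverse order, so a bidi-aware view renders the name as notes_hs.txt, an apparently harmless text file, while the real name still ends in .sh. A human reviewer who trusts the rendered name may approve executing a file they did not intend to run. The LLM itself is not the target; the human reviewing the LLM's output is: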

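// Written with \u escapes: pasting the literal characters would make this line itself
// unreadable, the same trick "Trojan Source" attacks use against code review.
// U+061C ALM, U+200E LRM, U+200F RLM, U+202A-U+202E embeddings and overrides,
// U+2066-U+2069 isolates
const BIDI_OVERRIDE_REGEX = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

function sanitizeBidiOverrides(str) {
  return str.replace(BIDI_OVERRIDE_REGEX, '');
}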

// Applied to all string fields in tool responses that will be displayed to humans
function sanitizeToolResponse(obj) {
  if (typeof obj === 'string') return sanitizeBidiOverrides(obj);
  if (Array.isArray(obj)) return obj.map(sanitizeToolResponse);
  if (obj && typeof obj === 'object') {
    return Object.fromEntries(
      Object.entries(obj).map(([k, v]) => [k, sanitizeToolResponse(v)])
    );
  }
  return obj;
}
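As an alternative to silent deletion, a server can replace each control character with a visible `<U+XXXX>` marker so a reviewer sees that something was there. The sketch below builds a spoofed filename of the classic form and applies that approach. `revealBidiControls` and `containsBidiControls` are hypothetical helpers, not part of any library. Note the separate non-global regex for detection: calling `.test()` on a regex with the `g` flag advances its `lastIndex`, so reusing a global regex for repeated yes/no checks can give alternating results.

```javascript
// Same set of bidi formatting characters as the sanitizer, written as escapes.
const BIDI_CONTROLS_GLOBAL = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;
// Non-global copy for yes/no checks: .test() on a /g regex is stateful (lastIndex).
const BIDI_CONTROLS_ONCE = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/;

function containsBidiControls(str) {
  return BIDI_CONTROLS_ONCE.test(str);
}

// Replace each control character with a visible marker such as <U+202E>.
function revealBidiControls(str) {
  return str.replace(BIDI_CONTROLS_GLOBAL, (ch) =>
    `<U+${ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')}>`
  );
}

// RIGHT-TO-LEFT OVERRIDE before "txt.sh": bidi-aware renderers show "notes_hs.txt",
// but the real name ends in ".sh".
const spoofed = 'notes_\u202Etxt.sh';

console.log(spoofed.endsWith('.sh'));       // true
console.log(containsBidiControls(spoofed)); // true
console.log(revealBidiControls(spoofed));   // notes_<U+202E>txt.sh
```

Revealing keeps the evidence in front of the reviewer, while stripping is simpler but can also remove marks (such as RLM and ALM) that legitimate Arabic or Hebrew text relies on. Which fits better depends on whether the field is an identifier like a filename or free-form prose.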

Homograph attacks in internationalized input

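Unicode homographs are characters that look identical or nearly identical to ASCII characters but have different code points. A tool argument of gіthub.com (where the "і" is Cyrillic U+0456, not ASCII "i") looks identical to github.com in many fonts but is a different domain. MCP servers that validate domain names, usernames, or file paths against allowlists must convert input to a single canonical form before comparison. For domains, that form is the ASCII (punycode) name the resolver actually uses, in which the lookalike becomes an "xn--" label. NFKC normalization is not enough on its own: it folds compatibility variants such as fullwidth letters into ASCII, but it does not map cross-script lookalikes such as Cyrillic "і" to Latin "i":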

// Fragile: exact comparison on the raw, unnormalized string
const ALLOWED_DOMAINS = new Set(['github.com', 'api.github.com']);
function isAllowedDomain(hostname) {
  // "GitHub.com" is wrongly rejected, and "gіthub.com" (Cyrillic і) is rejected only
  // because the match is exact: a fuzzy match or a human eyeballing the name accepts it
  return ALLOWED_DOMAINS.has(hostname);
}

// Safe: canonicalize with the WHATWG URL parser (IDNA processing) before the allowlist check.
// No separate punycode import is needed; Node's built-in punycode module is deprecated.

function isAllowedDomain(hostname) {
  try {
    // URL parsing lowercases the host and converts non-ASCII labels to punycode ("xn--..."),
    // so "gіthub.com" becomes an xn-- name that cannot match the ASCII entry.
    // It does not strip a trailing dot, so remove one explicitly.
    const normalized = new URL('https://' + hostname).hostname.replace(/\.$/, '');
    return ALLOWED_DOMAINS.has(normalized);
  } catch {
    return false; // invalid hostname
  }
}
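Two properties of this approach are worth checking directly. NFKC folds compatibility forms such as fullwidth letters but leaves a Cyrillic lookalike untouched, whereas the URL parser turns the lookalike into an `xn--` label that can never equal the ASCII allowlist entry. The sketch also includes `hasMixedLatinCyrillic`, a hypothetical and deliberately narrow check (two scripts only, loosely in the spirit of the mixed-script detection described in Unicode UTS #39) that a server could use to attach a warning for human reviewers. It relies on Unicode property escapes in regular expressions, which current Node.js versions support.

```javascript
const lookalike = 'g\u0456thub.com'; // U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I

// NFKC folds a fullwidth compatibility character into ASCII...
console.log('\uFF47ithub.com'.normalize('NFKC'));          // github.com
// ...but does not map a cross-script lookalike to its Latin twin.
console.log(lookalike.normalize('NFKC') === 'github.com'); // false

// The WHATWG URL parser converts the non-ASCII label to punycode.
const asciiHost = new URL('https://' + lookalike).hostname;
console.log(asciiHost.startsWith('xn--'), asciiHost === 'github.com'); // true false

// Hypothetical review signal: does any label mix Latin and Cyrillic letters?
// Run it on the raw input, because after URL parsing the name is pure ASCII.
function hasMixedLatinCyrillic(hostname) {
  return hostname.split('.').some(
    (label) => /\p{Script=Latin}/u.test(label) && /\p{Script=Cyrillic}/u.test(label)
  );
}

console.log(hasMixedLatinCyrillic(lookalike));    // true
console.log(hasMixedLatinCyrillic('github.com')); // false
```

A single-script Cyrillic domain is not flagged by this check, which is intentional: mixing scripts inside one label is the stronger spoofing signal, while a wholly Cyrillic name is simply a different, legitimate domain.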

What SkillAudit checks for i18n security issues

Internationalization security findings are typically LOW or MEDIUM severity in isolation, but MEDIUM to HIGH when they are part of a chain that includes downstream calculation or authorization logic. Run a free SkillAudit scan to check your MCP server for i18n security patterns.