MCP server Office document security

MCP tool handlers that process DOCX, XLSX, PPTX, or ODS files face an attack surface that extends far beyond the text content. The Office Open XML format (OOXML) is a ZIP archive containing XML files, embedded binary streams, external URL references, and optionally VBA macro code. When processed by LibreOffice headless — a common conversion path — the document's embedded functionality executes: macros run, external OLE links are fetched, field codes are evaluated. An LLM agent that passes an attacker-supplied document URL to a document tool can trigger SSRF, RCE, and data exfiltration from within the MCP server's process.

Attack 1: VBA/Basic macro execution via LibreOffice headless

Macro-enabled Word documents (.docm) and Excel workbooks (.xlsm) store VBA macro code in a binary stream (vbaProject.bin) inside the ZIP archive. When LibreOffice headless opens such a file for conversion, it can execute those macros unless macro execution is explicitly disabled.

// WRONG — LibreOffice without macro disable flag
import { exec } from "child_process";
import { promisify } from "util";
const execAsync = promisify(exec);

server.tool("convert_doc", { filePath: z.string() }, async ({ filePath }) => {
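  // LibreOffice will execute macros in .docm files by default.
  // Also unsafe: filePath is interpolated into a shell command line, so shell
  // metacharacters in the path are interpreted by the shell.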
  const { stdout } = await execAsync(
    `libreoffice --headless --convert-to txt "${filePath}" --outdir /tmp/converted`
  );
  return { content: [{ type: "text", text: stdout }] };
});

LibreOffice macros can execute arbitrary system commands. A macro embedded in a .docm file can call Shell to run system commands, read environment variables (including API keys), and open network connections, all with the privileges and environment of the MCP server process.

Safe pattern: disable macros explicitly and reject macro-enabled formats.

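import fs from "fs";
import os from "os";
import path from "path";
import { pathToFileURL } from "url";
import { spawn } from "child_process";

// Assumption for this example: the only directory documents may be read from
const ALLOWED_DOCS_DIR = "/srv/mcp-docs";

// Reject macro-enabled file extensions before processing
const BLOCKED_EXTENSIONS = [".docm", ".xlsm", ".pptm", ".dotm", ".xltm", ".potm"];

function validateOfficeFile(filePath: string): void {
  const ext = path.extname(filePath).toLowerCase();
  if (BLOCKED_EXTENSIONS.includes(ext)) {
    throw new Error(`Macro-enabled Office format rejected: ${ext}`);
  }
}

async function libreOfficeConvert(filePath: string): Promise<string> {
  validateOfficeFile(filePath);
  const resolved = path.resolve(filePath);
  if (!resolved.startsWith(ALLOWED_DOCS_DIR + path.sep)) {
    throw new Error("Document path outside allowed directory");
  }

  // Fresh profile per conversion. A profile keyed on process.pid would be shared
  // by every conversion this server runs, because the server's pid never changes.
  const profileDir = fs.mkdtempSync(path.join(os.tmpdir(), "lo-profile-"));

  return new Promise((resolve, reject) => {
    // None of these flags disables macros by itself. --infilter only forces an
    // input filter, so it is omitted here. Macro execution is governed by the
    // profile's macro security setting (MacroSecurityLevel); keep the extension
    // check above and treat the LibreOffice process as untrusted.
    const lo = spawn("libreoffice", [
      "--headless",
      "--norestore",  // skip crash-recovery prompts
      "--env:UserInstallation=" + pathToFileURL(profileDir).href,  // isolated profile
      "--convert-to", "txt:Text (encoded):UTF8",
      "--outdir", "/tmp/converted",
      resolved
    ], {
      shell: false,  // arguments are passed as-is, with no shell interpolation
      env: {
        // Everything in process.env (API keys included) is visible to LibreOffice;
        // consider passing a minimal environment instead.
        ...process.env,
        JAVA_HOME: "",  // intended to keep LibreOffice from loading a JVM
      }
    });

    let stdout = "";
    let stderr = "";
    lo.stdout.on("data", d => stdout += d);
    lo.stderr.on("data", d => stderr += d);
    lo.on("error", reject);  // e.g. libreoffice binary not found
    lo.on("close", code => {
      fs.rmSync(profileDir, { recursive: true, force: true });
      if (code !== 0) reject(new Error(`libreoffice exited ${code}: ${stderr}`));
      else resolve(stdout);  // conversion log, which names the output file
    });
  });
}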
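
An extension check alone can be sidestepped by renaming a macro-enabled file, so it may be worth also looking inside the archive. The following is a minimal sketch, not a complete ZIP parser: it reads entry names from the ZIP central directory and flags archives that carry a `vbaProject.bin` stream (OOXML) or a `Basic/` folder (ODF). The helper names `listZipEntryNames` and `hasMacroContainer` are hypothetical, and ZIP64 archives are assumed not to occur.

```typescript
// Little-endian signatures from the ZIP format.
const EOCD_SIG = Buffer.from([0x50, 0x4b, 0x05, 0x06]); // end of central directory
const CENTRAL_HEADER_SIG = 0x02014b50;                  // central directory file header
const CENTRAL_HEADER_SIZE = 46;                         // fixed part before the file name

// Read entry names from the central directory; nothing is decompressed.
function listZipEntryNames(buffer: Buffer): string[] {
  const eocd = buffer.lastIndexOf(EOCD_SIG);
  if (eocd === -1 || eocd + 22 > buffer.length) {
    throw new Error("Not a ZIP archive");
  }
  const entryCount = buffer.readUInt16LE(eocd + 10);
  let pos = buffer.readUInt32LE(eocd + 16); // offset of the first central header
  const names: string[] = [];
  for (let i = 0; i < entryCount; i++) {
    if (pos + CENTRAL_HEADER_SIZE > buffer.length ||
        buffer.readUInt32LE(pos) !== CENTRAL_HEADER_SIG) {
      throw new Error("Corrupt ZIP central directory");
    }
    const nameLen = buffer.readUInt16LE(pos + 28);
    const extraLen = buffer.readUInt16LE(pos + 30);
    const commentLen = buffer.readUInt16LE(pos + 32);
    const nameStart = pos + CENTRAL_HEADER_SIZE;
    names.push(buffer.toString("utf8", nameStart, nameStart + nameLen));
    pos = nameStart + nameLen + extraLen + commentLen;
  }
  return names;
}

// Hypothetical helper: true if the archive appears to contain macro code.
function hasMacroContainer(buffer: Buffer): boolean {
  return listZipEntryNames(buffer).some(
    (name) => /(^|\/)vbaProject\.bin$/i.test(name) || /^Basic\//.test(name)
  );
}
```

A handler could call `hasMacroContainer(fileBuffer)` next to `validateOfficeFile` and reject the document when either check fails. Treating a parse error as a rejection is the conservative choice.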

Attack 2: External OLE link SSRF

OOXML documents can contain external data source references stored as OLE object links. A DOCX word/_rels/document.xml.rels entry with Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject" and a Target pointing to an internal URL causes LibreOffice to perform an outbound HTTP request when the document is opened — regardless of whether the document is a .docm file.

<!-- Malicious word/_rels/document.xml.rels -- embedded in DOCX -->
<Relationship Id="rId1"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject"
  Target="http://169.254.169.254/latest/meta-data/iam/security-credentials/ec2-role"
  TargetMode="External"/>

When LibreOffice opens this document, it fetches the EC2 Instance Metadata Service URL and embeds the response in the document. The response (IAM credentials) ends up in the converted output text, exposing credentials to the MCP tool result.
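
To make the risk concrete, here is a rough sketch of how a relationship target could be classified for logging or alerting. The function name `isSuspiciousTarget` and the address ranges are assumptions chosen for illustration. Only literal IP addresses are classified; hostnames would need DNS resolution, which this sketch does not attempt.

```typescript
import { isIP } from "node:net";

// Loopback, link-local (includes 169.254.169.254), and RFC 1918 private ranges.
function isInternalIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  const a = parts[0] ?? -1;
  const b = parts[1] ?? -1;
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Loopback, unspecified, unique-local (fc00::/7), link-local (fe80::/10).
// IPv4-mapped addresses are conservatively treated as internal.
function isInternalIPv6(ip: string): boolean {
  const lower = ip.toLowerCase();
  return (
    lower === "::1" || lower === "::" ||
    lower.startsWith("::ffff:") ||
    /^f[cd]/.test(lower) || /^fe[89ab]/.test(lower)
  );
}

// Hypothetical helper: true when a target is unparseable, not http(s),
// or points at a literal internal address.
function isSuspiciousTarget(target: string): boolean {
  let url: URL;
  try {
    url = new URL(target);
  } catch {
    return true; // relative paths and malformed URLs are not expected here
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  const host = url.hostname.replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  const kind = isIP(host);
  if (kind === 4) return isInternalIPv4(host);
  if (kind === 6) return isInternalIPv6(host);
  return false; // a hostname: cannot be judged without resolving it
}
```

A `false` result does not make a target safe, since a public hostname can resolve to an internal address.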

Safe pattern: scan for external relationships before opening with LibreOffice, or strip them with a ZIP-level pre-processor.

import AdmZip from "adm-zip";

// Pre-scan OOXML for external OLE links before processing
function scanOfficeExternalLinks(buffer: Buffer): string[] {
  const zip = new AdmZip(buffer);
  const externalTargets: string[] = [];

  for (const entry of zip.getEntries()) {
    if (!entry.entryName.endsWith(".rels")) continue;
    const content = zip.readAsText(entry);
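    // Attribute order varies (Target often precedes TargetMode), so inspect
    // each <Relationship> element as a whole instead of assuming an order
    const relationshipRe = /<Relationship\b[^>]*>/g;
    let match;
    while ((match = relationshipRe.exec(content)) !== null) {
      const element = match[0];
      if (!/\bTargetMode\s*=\s*"External"/.test(element)) continue;
      const target = /\bTarget\s*=\s*"([^"]*)"/.exec(element);
      externalTargets.push(target ? target[1] : "(no Target attribute)");
    }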
  }
  return externalTargets;
}

// In the tool handler:
const externalLinks = scanOfficeExternalLinks(fileBuffer);
if (externalLinks.length > 0) {
  // Either reject or strip — don't open with LibreOffice until clean
  throw new Error(`Document contains ${externalLinks.length} external link(s): ${externalLinks.join(", ")}`);
}
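
For the stripping alternative, one possible sketch operates on the text of a single .rels part and drops every relationship marked `TargetMode="External"`. The function name `stripExternalRelationships` is hypothetical. This also removes ordinary external hyperlinks, and it leaves any `r:id` references in the document body dangling; whether a given consumer tolerates that is an assumption that should be tested before relying on it.

```typescript
interface StripResult {
  xml: string;        // the .rels content with external relationships removed
  removed: string[];  // Target values of the removed relationships
}

// Hypothetical helper: remove external relationships from one .rels part.
function stripExternalRelationships(relsXml: string): StripResult {
  const removed: string[] = [];
  // Matches both <Relationship .../> and <Relationship ...></Relationship>.
  // The \b keeps the pattern from matching the <Relationships> root element.
  const elementRe = /<Relationship\b[^>]*?(?:\/>|>\s*<\/Relationship\s*>)/g;
  const xml = relsXml.replace(elementRe, (element) => {
    if (!/\bTargetMode\s*=\s*["']External["']/.test(element)) {
      return element; // internal relationship: keep untouched
    }
    const target = /\bTarget\s*=\s*["']([^"']*)["']/.exec(element);
    removed.push(target?.[1] ?? "");
    return "";
  });
  return { xml, removed };
}
```

The cleaned XML would then be written back into the archive in place of the original .rels entry, for example with the same ZIP library used for scanning, before the file is handed to LibreOffice.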

Attack 3: Embedded field code data exfiltration

DOCX field codes (the { INCLUDETEXT "http://..." } syntax) are evaluated by LibreOffice during rendering and can reference external URLs, causing outbound requests. Unlike OLE links (which are in the relationships file), field codes are embedded in the document body XML itself and don't appear in .rels files.

<!-- Malicious field code in word/document.xml -->
<w:fldChar w:fldCharType="begin"/>
<w:instrText> INCLUDETEXT "http://internal-api.corp/secrets" \* MERGEFORMAT </w:instrText>
<w:fldChar w:fldCharType="end"/>

Mitigation: Use mammoth.js for DOCX text extraction instead of LibreOffice. mammoth.js is a pure-JavaScript DOCX parser that ignores field codes and macro containers entirely — it reads the textual content from the XML without evaluating any embedded computation.
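
Where LibreOffice cannot be avoided, a pre-scan of word/document.xml for fetching field codes is one option. The sketch below is illustrative only: the function name `findSuspiciousFieldCodes` is hypothetical, and the field list is an assumed, incomplete set of Word fields that can pull in external content. Instruction text is concatenated across runs because a field name can be split over several `w:instrText` elements.

```typescript
// Assumption: illustrative, not exhaustive.
const FETCHING_FIELDS = ["INCLUDETEXT", "INCLUDEPICTURE", "LINK", "DDE", "DDEAUTO"];

// Hypothetical helper: return the instruction text of fields that may fetch content.
function findSuspiciousFieldCodes(documentXml: string): string[] {
  // Group 1: complex-field markers, group 2: instruction text runs,
  // group 3: the instr attribute of a simple field.
  const tokenRe =
    /<w:fldChar\b[^>]*\bw:fldCharType="(begin|separate|end)"|<w:instrText\b[^>]*>([^<]*)<\/w:instrText>|<w:fldSimple\b[^>]*\bw:instr="([^"]*)"/g;
  const instructions: string[] = [];
  let current = "";
  const flush = (): void => {
    if (current.trim() !== "") instructions.push(current);
    current = "";
  };

  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(documentXml)) !== null) {
    if (m[1] !== undefined) {
      flush(); // any field marker ends the instruction collected so far
    } else if (m[2] !== undefined) {
      current += m[2]; // join instruction text split across runs
    } else if (m[3] !== undefined) {
      flush();
      instructions.push(m[3]);
    }
  }
  flush();

  return instructions
    .map((text) => text.replace(/&quot;/g, '"').replace(/&amp;/g, "&").trim())
    .filter((text) => FETCHING_FIELDS.includes((text.split(/\s+/)[0] ?? "").toUpperCase()));
}
```

A non-empty result would be grounds to reject the document, in the same way as a non-empty external link scan.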

Preferred safe parsing libraries

| Format | Safe library | What it ignores | Notes |
| --- | --- | --- | --- |
| DOCX | mammoth | VBA macros, OLE links, field codes, embedded objects | Pure JS; converts DOCX body XML to plain text or HTML |
| XLSX / XLS | xlsx (SheetJS) | VBA macros (the xlsm macro binary is not loaded), external data connections | SheetJS reads stored cell values and does not evaluate formulas; cellFormula: false additionally skips formula text |
| PPTX | Direct XML parse of the slide parts (JSZip + XML parser) | VBA macros, OLE links | pptxgenjs only generates PPTX and cannot read it. Limited text extraction; consider converting to PDF first via LibreOffice with restrictions |
| ODS / ODP | Direct XML parse (JSZip + DOMParser) | Macros, which require an explicit Basic interpreter | ODF is plain XML; parse content.xml directly without LibreOffice |

import mammoth from "mammoth";
import * as XLSX from "xlsx";

// Safe DOCX extraction with mammoth — no macro execution, no network requests
async function extractDocxText(buffer: Buffer): Promise<string> {
  const result = await mammoth.extractRawText({ buffer });
  if (result.messages.length > 0) {
    // mammoth emits warnings for unsupported elements — log but don't fail
    console.warn("mammoth warnings:", result.messages.map(m => m.message));
  }
  return result.value;
}

// Safe XLSX extraction with SheetJS — no formula evaluation, no macro execution
function extractXlsxText(buffer: Buffer): string {
  const workbook = XLSX.read(buffer, {
    type: "buffer",
    cellFormula: false,    // don't parse formulas — prevent formula injection
    cellHTML: false,       // no HTML in cells
    cellNF: false,         // no number formats needed for text extraction
    sheetStubs: false,
  });
  const lines: string[] = [];
  for (const sheetName of workbook.SheetNames) {
    const sheet = workbook.Sheets[sheetName];
    lines.push(XLSX.utils.sheet_to_csv(sheet));
  }
  return lines.join("\n\n");
}
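
Disabling formula parsing does not change cell values that merely look like formulas. If the extracted CSV might later be opened in a spreadsheet application, one common hedge is to prefix such values with a single quote. The sketch below assumes rows were obtained as arrays, for example from `XLSX.utils.sheet_to_json(sheet, { header: 1 })`; the function names are hypothetical. Note that it also prefixes legitimate values such as negative numbers stored as text.

```typescript
// Prefix values that a spreadsheet application could interpret as a formula.
function neutralizeFormulaCell(value: string): string {
  return /^[=+\-@\t\r]/.test(value) ? "'" + value : value;
}

// Quote a field per CSV conventions when it contains a comma, quote, or newline.
function toCsvField(value: string): string {
  const safe = neutralizeFormulaCell(value);
  return /[",\r\n]/.test(safe) ? '"' + safe.replace(/"/g, '""') + '"' : safe;
}

// Hypothetical helper: serialize rows of arbitrary cell values to CSV text.
function rowsToSafeCsv(rows: unknown[][]): string {
  return rows
    .map((row) =>
      row
        .map((cell) => toCsvField(cell === null || cell === undefined ? "" : String(cell)))
        .join(",")
    )
    .join("\n");
}
```

Numeric cells pass through `String(cell)`, so a negative number such as -5 is prefixed as well; callers that need exact numbers could skip the prefix for values of type `number`.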

SkillAudit findings

| Severity | Finding | Grade impact |
| --- | --- | --- |
| Critical | LibreOffice opened without macro disable flag: .docm/.xlsm macros execute with server process privileges. | −25 points |
| Critical | No pre-scan for external OLE links: LibreOffice fetches embedded URLs, including internal metadata services. | −22 points |
| High | Macro-enabled file extensions (.docm, .xlsm) accepted without rejection. | −15 points |
| High | LibreOffice launched with shared user profile: concurrent sessions share macro state and Basic IDE. | −10 points |
| High | LibreOffice process not network-isolated: can make outbound connections to arbitrary URLs via field codes. | −10 points |
| Medium | XLSX parsed with formula evaluation enabled: formula injection via cell values. | −6 points |
