MCP server Office document security
MCP tool handlers that process DOCX, XLSX, PPTX, or ODS files face an attack surface that extends far beyond the text content. Office Open XML (OOXML) files, like OpenDocument files such as ODS, are ZIP archives containing XML parts, embedded binary streams, external URL references, and optionally macro code. When such a document is processed by LibreOffice headless, a common conversion path, its embedded functionality can execute: macros run, external OLE links are fetched, and field codes are evaluated. An LLM agent that passes an attacker-supplied document URL to a document tool can therefore trigger SSRF, RCE, and data exfiltration from within the MCP server's environment.
Attack 1: VBA/Basic macro execution via LibreOffice headless
Macro-enabled OOXML files (.docm for Word, .xlsm for Excel) contain VBA macro code stored in a binary stream (vbaProject.bin) within the ZIP archive. When LibreOffice headless opens such a file for conversion, it can execute those macros unless macro execution is explicitly disabled; whether it does depends on the macro security settings of the profile in use.
// WRONG: LibreOffice invoked without restricting macros
import { exec } from "child_process";
import { promisify } from "util";
const execAsync = promisify(exec);
server.tool("convert_doc", { filePath: z.string() }, async ({ filePath }) => {
  // Nothing here restricts macro execution for .docm input, and filePath is
  // interpolated into a shell command line, which also allows command injection
  const { stdout } = await execAsync(
    `libreoffice --headless --convert-to txt "${filePath}" --outdir /tmp/converted`
  );
  return { content: [{ type: "text", text: stdout }] };
});
LibreOffice macros can execute arbitrary system commands. A VBA macro in a macro-enabled document can call Shell to run system commands, read environment variables including API keys, and make network connections, all with the privileges and environment that the LibreOffice child process inherits from the MCP server.
Safe pattern: disable macros explicitly and reject macro-enabled formats.
import path from "path";
import { spawn } from "child_process";

// Assumed location of documents the tool may read; adjust for your deployment
const ALLOWED_DOCS_DIR = path.resolve("/srv/mcp-docs");
const OUTPUT_DIR = "/tmp/converted";

// Reject macro-enabled file extensions before processing
const BLOCKED_EXTENSIONS = [".docm", ".xlsm", ".pptm", ".dotm", ".xltm", ".potm"];

function validateOfficeFile(filePath: string): void {
  const ext = path.extname(filePath).toLowerCase();
  if (BLOCKED_EXTENSIONS.includes(ext)) {
    throw new Error(`Macro-enabled Office format rejected: ${ext}`);
  }
}

async function libreOfficeConvert(filePath: string): Promise<string> {
  validateOfficeFile(filePath);
  const resolved = path.resolve(filePath);
  if (!resolved.startsWith(ALLOWED_DOCS_DIR + path.sep)) {
    throw new Error("Document path outside allowed directory");
  }
  return new Promise((resolve, reject) => {
    const lo = spawn("libreoffice", [
      "--headless",
      "--norestore", // skip crash-recovery prompts
      // --infilter only forces the import filter; it does not disable macros by itself.
      // Macro execution is governed by the macro security settings of the profile below.
      // writer8 is the ODF text filter; use the filter that matches your input
      // (for example "MS Word 2007 XML" for DOCX).
      "--infilter=writer8",
      "-env:UserInstallation=file:///tmp/lo-sandbox-" + process.pid, // isolated throwaway profile
      "--convert-to", "txt:Text (encoded):UTF8",
      "--outdir", OUTPUT_DIR,
      resolved
    ], {
      shell: false, // arguments are passed directly, so filePath cannot inject shell syntax
      env: {
        // Note: this still forwards the server's environment; pass a minimal env if it holds secrets
        ...process.env,
        JAVA_HOME: "", // hide the JVM so Java-based scripting is unavailable
      }
    });
    let stderr = "";
    lo.stderr.on("data", d => stderr += d);
    lo.on("error", reject); // e.g. the libreoffice binary was not found
    lo.on("close", code => {
      if (code !== 0) return reject(new Error(`libreoffice exited ${code}: ${stderr}`));
      // LibreOffice writes <basename>.txt into the output directory
      const base = path.basename(resolved, path.extname(resolved));
      resolve(path.join(OUTPUT_DIR, base + ".txt"));
    });
  });
}
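An extension check alone can be bypassed by renaming a .docm file to .docx, since the macro stream stays inside the archive. The following is a minimal sketch of a content-based check that looks at ZIP entry names instead. The helper name `findMacroEntries` and the pattern list are assumptions for illustration; the list covers common OOXML and ODF layouts and is not exhaustive.

```typescript
// Hypothetical helper: flags ZIP entry names that indicate embedded macro code.
// The patterns are an assumption based on common OOXML/ODF layouts.
const MACRO_ENTRY_PATTERNS: RegExp[] = [
  /(^|\/)vbaProject\.bin$/i, // OOXML VBA project stream (under word/, xl/, or ppt/)
  /(^|\/)vbaData\.xml$/i,    // Word VBA metadata part
  /^Basic\//i,               // ODF embedded Basic libraries
  /^Scripts\//i,             // ODF embedded scripts in other languages
];

// Returns the entry names that match any macro pattern, in input order.
function findMacroEntries(entryNames: string[]): string[] {
  return entryNames.filter(name =>
    MACRO_ENTRY_PATTERNS.some(pattern => pattern.test(name))
  );
}
```

With adm-zip, the entry names can be collected with `zip.getEntries().map(e => e.entryName)`; a non-empty result from `findMacroEntries` would be grounds to reject the file regardless of its extension.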
Attack 2: External OLE link SSRF
OOXML documents can contain external data source references stored as OLE object links. A DOCX word/_rels/document.xml.rels entry with Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject" and a Target pointing to an internal URL causes LibreOffice to perform an outbound HTTP request when the document is opened — regardless of whether the document is a .docm file.
<!-- Malicious word/_rels/document.xml.rels entry embedded in a DOCX -->
<Relationship Id="rId1"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject"
  Target="http://169.254.169.254/latest/meta-data/iam/security-credentials/ec2-role"
  TargetMode="External"/>
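When LibreOffice opens this document and resolves the link, it fetches the EC2 Instance Metadata Service URL and embeds the response in the document. The response (IAM credentials) ends up in the converted output text, exposing the credentials in the MCP tool result. A plain GET like this succeeds only where IMDSv1 is enabled; IMDSv2 requires a session token obtained with a PUT request.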
Safe pattern: scan for external relationships before opening with LibreOffice, or strip them with a ZIP-level pre-processor.
import AdmZip from "adm-zip";

// Pre-scan OOXML relationship parts for external targets before processing
function scanOfficeExternalLinks(buffer: Buffer): string[] {
  const zip = new AdmZip(buffer);
  const externalTargets: string[] = [];
  for (const entry of zip.getEntries()) {
    if (!entry.entryName.endsWith(".rels")) continue;
    const content = zip.readAsText(entry);
    // Inspect each <Relationship> tag on its own so attribute order does not matter
    // (Target usually appears before TargetMode, as in the example above)
    for (const tag of content.match(/<Relationship\b[^>]*>/g) ?? []) {
      if (!/\bTargetMode\s*=\s*"External"/.test(tag)) continue;
      const target = /\bTarget\s*=\s*"([^"]*)"/.exec(tag);
      if (target) externalTargets.push(target[1]);
    }
  }
  return externalTargets;
}

// In the tool handler:
const externalLinks = scanOfficeExternalLinks(fileBuffer);
if (externalLinks.length > 0) {
  // Either reject or strip; don't open with LibreOffice until clean
  throw new Error(`Document contains ${externalLinks.length} external link(s): ${externalLinks.join(", ")}`);
}
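Rejecting every external relationship also rejects documents that merely contain ordinary hyperlinks, because hyperlink relationships carry TargetMode="External" as well and are typically followed only when a user clicks them. A minimal sketch of a stricter classification is shown here: it reads the attributes of each relationship in any order and reports only external targets whose type is not a hyperlink. The names `readAttr`, `parseRelationships`, and `riskyExternalTargets` are hypothetical, and the regex-based parsing is an assumption that holds for typical .rels files rather than a full XML parser.

```typescript
interface Relationship {
  type: string;
  target: string;
  external: boolean;
}

const HYPERLINK_TYPE =
  "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

// Reads one attribute value from a single tag; returns "" when it is absent.
function readAttr(tag: string, name: string): string {
  const m = new RegExp(`\\b${name}\\s*=\\s*(["'])(.*?)\\1`).exec(tag);
  return m ? m[2] : "";
}

// Extracts every <Relationship> element from the text of one .rels part.
function parseRelationships(relsXml: string): Relationship[] {
  const tags = relsXml.match(/<Relationship\b[^>]*>/g) ?? [];
  return tags.map(tag => ({
    type: readAttr(tag, "Type"),
    target: readAttr(tag, "Target"),
    external: readAttr(tag, "TargetMode") === "External",
  }));
}

// External targets that are not plain hyperlinks (OLE objects, images, templates, ...).
function riskyExternalTargets(relsXml: string): string[] {
  return parseRelationships(relsXml)
    .filter(rel => rel.external && rel.type !== HYPERLINK_TYPE)
    .map(rel => rel.target);
}
```

Whether to tolerate hyperlinks at all is a policy decision; a handler that never needs them can keep rejecting every external target.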
Attack 3: Embedded field code data exfiltration
DOCX field codes (the { INCLUDETEXT "http://..." } syntax) are evaluated by LibreOffice during rendering and can reference external URLs, causing outbound requests. Unlike OLE links (which are in the relationships file), field codes are embedded in the document body XML itself and don't appear in .rels files.
<!-- Malicious field code in word/document.xml -->
<w:fldChar w:fldCharType="begin"/>
<w:instrText> INCLUDETEXT "http://internal-api.corp/secrets" \* MERGEFORMAT </w:instrText>
<w:fldChar w:fldCharType="end"/>
Mitigation: Use mammoth.js for DOCX text extraction instead of LibreOffice. mammoth.js is a pure-JavaScript DOCX parser that ignores field codes and macro containers entirely — it reads the textual content from the XML without evaluating any embedded computation.
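If a document still has to go through LibreOffice, the body XML can be pre-scanned for field instructions that reference outside content. The sketch below is one way to do that under stated assumptions: the function name `findRemoteFieldCodes` and the keyword list are illustrative and not exhaustive, and the regex tokenizer stands in for a real XML parser. It joins the w:instrText runs between field boundaries, because one instruction may be split across several runs, and it also checks the w:instr attribute of simple fields.

```typescript
// Field keywords that pull in outside content; an assumed, non-exhaustive list.
const REMOTE_FIELD_KEYWORDS = new Set([
  "INCLUDETEXT", "INCLUDEPICTURE", "LINK", "DDE", "DDEAUTO",
]);

// Returns the field instructions in word/document.xml whose keyword is in the list above.
function findRemoteFieldCodes(documentXml: string): string[] {
  const instructions: string[] = [];
  let current = "";
  const flush = (): void => {
    if (current.trim().length > 0) instructions.push(current.trim());
    current = "";
  };
  // Alternatives: a field boundary, an instruction run, or a simple field's instr attribute.
  const tokenRe =
    /<w:fldChar\b[^>]*>|<w:instrText\b[^>]*>([\s\S]*?)<\/w:instrText>|<w:fldSimple\b[^>]*\bw:instr="([^"]*)"/g;
  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(documentXml)) !== null) {
    if (m[1] !== undefined) current += m[1];               // instruction run: accumulate
    else if (m[2] !== undefined) instructions.push(m[2].trim()); // simple field
    else flush();                                           // begin/separate/end boundary
  }
  flush();
  return instructions.filter(instr =>
    REMOTE_FIELD_KEYWORDS.has((instr.split(/\s+/)[0] ?? "").toUpperCase())
  );
}
```

A non-empty result can be treated the same way as an external relationship: reject the document, or strip the field before conversion.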
Preferred safe parsing libraries
| Format | Safe library | What it ignores | Notes |
|---|---|---|---|
| DOCX | mammoth | VBA macros, OLE links, field codes, embedded objects | Pure JS; converts DOCX body XML to plain text or HTML |
| XLSX / XLS | xlsx (SheetJS) | VBA macros (xlsm binary stripped), external data connections | Parsing does not evaluate formulas; cellFormula: false additionally drops formula strings so only cached values are read |
| PPTX | Direct XML parse (JSZip) | VBA macros, OLE links | pptxgenjs only generates presentations and cannot read them; slide text sits in a:t runs in ppt/slides/slide*.xml. If rendering is required, convert to PDF via LibreOffice with restrictions |
| ODS / ODP | Direct XML parse (JSZip + DOMParser) | Basic macros (they need a Basic interpreter to run) | ODF is clean XML; parse content.xml directly without LibreOffice |
import mammoth from "mammoth";
import * as XLSX from "xlsx";

// Safe DOCX extraction with mammoth: no macro execution, no network requests
async function extractDocxText(buffer: Buffer): Promise<string> {
  const result = await mammoth.extractRawText({ buffer });
  if (result.messages.length > 0) {
    // mammoth emits warnings for unsupported elements; log them but don't fail
    console.warn("mammoth warnings:", result.messages.map(m => m.message));
  }
  return result.value;
}

// Safe XLSX extraction with SheetJS: no formula evaluation, no macro execution
function extractXlsxText(buffer: Buffer): string {
  const workbook = XLSX.read(buffer, {
    type: "buffer",
    cellFormula: false, // skip formula strings; cached cell values are still read
    cellHTML: false,    // no HTML in cells
    cellNF: false,      // no number formats needed for text extraction
    sheetStubs: false,  // don't create stub cells for empty entries
  });
  const lines: string[] = [];
  for (const sheetName of workbook.SheetNames) {
    const sheet = workbook.Sheets[sheetName];
    lines.push(XLSX.utils.sheet_to_csv(sheet));
  }
  return lines.join("\n\n");
}
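For the ODS / ODP row, direct parsing can be as small as pulling paragraph text out of content.xml once it has been read from the archive. The sketch below assumes the XML is already available as a string and uses regular expressions in place of a DOM parser, so it is only suitable for simple text extraction; the function names are hypothetical. It drops inline markup such as spans, tabs, and line breaks without replacing them with whitespace.

```typescript
// Decodes the five predefined XML entities; &amp; goes last to avoid double decoding.
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Returns the text of each non-empty text:p / text:h element in an ODF content.xml string.
function extractOdfParagraphs(contentXml: string): string[] {
  // Remove empty self-closing paragraphs so they cannot pair with a later closing tag.
  const xml = contentXml.replace(/<text:(?:p|h)\b[^>]*\/>/g, "");
  const paragraphs: string[] = [];
  const paraRe = /<text:(?:p|h)\b[^>]*>([\s\S]*?)<\/text:(?:p|h)>/g;
  let m: RegExpExecArray | null;
  while ((m = paraRe.exec(xml)) !== null) {
    const text = decodeXmlEntities(m[1].replace(/<[^>]+>/g, "")); // strip inline tags
    if (text.length > 0) paragraphs.push(text);
  }
  return paragraphs;
}
```

Because nothing here interprets the document, embedded Basic libraries and scripts elsewhere in the archive are never loaded.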
SkillAudit findings
Run an Office document security audit. SkillAudit checks for LibreOffice macro flags, OLE link pre-scanning, macro-enabled format acceptance, and unsafe library usage in MCP document tool handlers.