MCP server data minimization: GDPR compliance and minimal response design

The GDPR's data minimization principle (Article 5(1)(c)) requires that personal data be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." MCP server tool responses that return full database rows — including fields the LLM agent never uses — violate this principle and create unnecessary exposure surface. Every piece of PII that enters the LLM's context window is data that can be logged, accidentally included in a response, or exfiltrated via a prompt injection attack.

The excessive data exposure problem

Consider a get_user tool that returns a full user record:

// Problematic: returns the entire User row
server.tool("get_user", GetUserSchema, async ({ userId }) => {
  const user = await db.users.findById(userId);
  return { content: [{ type: "text", text: JSON.stringify(user) }] };
});

// What the LLM agent gets — including fields it never asked for:
{
  "id": "user-123",
  "name": "Jane Smith",
  "email": "jane@example.com",
  "phone": "+1-555-0123",
  "date_of_birth": "1985-03-14",
  "address": "123 Main St, Springfield, IL 62701",
  "ssn_last4": "6789",
  "payment_method_token": "tok_visa_xxx",
  "internal_notes": "High-value customer, escalation contact: mgr@example.com",
  "hashed_password": "$2b$12$...",
  "password_reset_token": "abc123def456",
  "api_key": "sk-live-xxxxxxxxxxxx"
}

The LLM agent was asked "what's Jane's email address?" It received a full user record including a password reset token, an API key, and a hashed password. If the agent logs its context, if a prompt injection attack exfiltrates conversation history, or if the agent accidentally quotes the full record in its response, all of this data is exposed.

Implementing field-level projection

Each tool should return only the fields its callers actually need. Define a projection for each tool independently:

// src/tools/get-user.ts
import { z } from "zod";

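// The schema accepts the union of every role's fields; the per-role allowlist
// below decides which of them a given caller actually receives.
const GetUserSchema = z.object({
  userId: z.string().uuid(),
  fields: z.array(z.enum(["name", "email", "phone", "plan", "created_at", "org"])).optional(),
});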

// Allowed fields per caller role
const ALLOWED_FIELDS = {
  agent: ["name", "email", "plan", "created_at"] as const,
  admin: ["name", "email", "phone", "plan", "created_at", "org"] as const,
} as const;

type CallerRole = keyof typeof ALLOWED_FIELDS;

export async function getUserHandler(
  args: z.infer<typeof GetUserSchema>,
  ctx: { callerId: string; role: CallerRole }
) {
  const user = await db.users.findById(args.userId);
  if (!user) return { isError: true, content: [{ type: "text", text: "Not found" }] };

  // Apply role-based field allowlist
  const allowedFields = ALLOWED_FIELDS[ctx.role];
  const requestedFields = args.fields ?? allowedFields;
  const permittedFields = requestedFields.filter(f =>
    (allowedFields as readonly string[]).includes(f)
  );

  // Project to only permitted fields
  const projected = Object.fromEntries(
    permittedFields.map(f => [f, user[f]])
  );

  return { content: [{ type: "text", text: JSON.stringify(projected) }] };
}

The key design choices: when the caller doesn't specify fields, the default is the role's allowlist, never the full row. Callers can narrow the response further by requesting specific fields, and every request is validated against the per-role allowlist, so a disallowed field is dropped rather than returned. Internal fields (hashed_password, api_key, password_reset_token) are never in any allowlist.
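
The projection step can also be exercised on its own, without a database or an MCP server. The following is a minimal sketch under stated assumptions: `ROLE_ALLOWLIST`, `projectForRole`, and `sampleRow` are hypothetical names introduced for illustration, and the row is a plain object rather than a real ORM result.

```typescript
// Hypothetical allowlists, mirroring the roles used in this article.
const ROLE_ALLOWLIST = new Map<string, readonly string[]>([
  ["agent", ["name", "email", "plan", "created_at"]],
  ["admin", ["name", "email", "phone", "plan", "created_at", "org"]],
]);

function projectForRole(
  row: Record<string, unknown>,
  role: string,
  requested?: readonly string[],
): Record<string, unknown> {
  // An unknown role gets an empty allowlist, so the projection fails closed.
  const allowed = ROLE_ALLOWLIST.get(role) ?? [];
  const wanted = requested ?? allowed;
  const projected: Record<string, unknown> = {};
  for (const field of wanted) {
    // Copy a field only if the role permits it and the row actually has it.
    if (allowed.indexOf(field) !== -1 && field in row) {
      projected[field] = row[field];
    }
  }
  return projected;
}

// Illustrative row (made-up values) containing fields that must never leave the server.
const sampleRow = {
  name: "Jane Smith",
  email: "jane@example.com",
  phone: "+1-555-0123",
  hashed_password: "$2b$12$...",
  api_key: "sk-live-xxxxxxxxxxxx",
};

const agentView = projectForRole(sampleRow, "agent", ["email", "api_key"]);
// agentView is { email: "jane@example.com" }: api_key is in no allowlist, so it is dropped.
```

Returning an empty object for an unrecognized role is a deliberate choice in this sketch: a misconfigured caller sees nothing rather than everything.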

PII masking patterns

For fields that are sometimes needed but contain high-sensitivity data, use masking rather than full suppression:

// src/utils/pii-mask.ts
export const piiMask = {
  email: (v: string) => {
    const [local, domain] = v.split("@");
    return `${local[0]}***@${domain}`;
  },
  // Mask every digit except the last four, tolerating separators such as "-" or spaces
  phone: (v: string) => v.replace(/\d(?=(?:\D*\d){4})/g, "*"),
  ssn: (v: string) => `***-**-${v.slice(-4)}`,
  creditCard: (v: string) => `****-****-****-${v.slice(-4)}`,
  apiKey: (v: string) => `${v.slice(0, 8)}...${v.slice(-4)}`,
};

// Usage in a tool response:
const safeUser = {
  name: user.name,
  email: piiMask.email(user.email),        // "j***@example.com"
  phone: piiMask.phone(user.phone),         // "+*-***-0123" for "+1-555-0123"
};
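
Masking is easy to forget when each handler applies it by hand. One possible approach, sketched here with hypothetical names (`MASK_POLICY`, `applyMasks`, `maskEmail`, `maskPhone`), is a per-field policy table that is applied to every outgoing record. The maskers in this sketch are self-contained variants that fall back to full redaction on malformed input and tolerate separators in phone numbers.

```typescript
type Masker = (value: string) => string;

// Keep the first character of the local part; redact entirely when the
// value does not look like an email address.
const maskEmail: Masker = (v) => {
  const at = v.lastIndexOf("@");
  if (at < 1) return "***";
  return `${v[0]}***${v.slice(at)}`;
};

// Mask every digit except the last four, leaving separators in place.
const maskPhone: Masker = (v) => v.replace(/\d(?=(?:\D*\d){4})/g, "*");

// Hypothetical policy table: field name to masker.
const MASK_POLICY = new Map<string, Masker>([
  ["email", maskEmail],
  ["phone", maskPhone],
]);

function applyMasks(record: Record<string, unknown>): Record<string, unknown> {
  const masked: Record<string, unknown> = {};
  for (const key of Object.keys(record)) {
    const value = record[key];
    const mask = MASK_POLICY.get(key);
    // Only string values are masked; everything else passes through unchanged.
    masked[key] = mask && typeof value === "string" ? mask(value) : value;
  }
  return masked;
}

const maskedContact = applyMasks({
  name: "Jane Smith",
  email: "jane@example.com",
  phone: "+1-555-0123",
});
// maskedContact is { name: "Jane Smith", email: "j***@example.com", phone: "+*-***-0123" }
```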

Access logging for GDPR data subject requests

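GDPR Article 30 requires controllers to keep records of processing activities, and Article 15 gives data subjects the right to learn what data is held about them and who has received it. Per-access logging is not spelled out in either article, but it is a practical way to support both. When your MCP tool accesses personal data, log enough to answer a data subject access request (DSAR): who accessed what data, when, and for what purpose.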

// src/middleware/pii-access-log.ts
interface PiiAccessEvent {
  timestamp: string;
  callerId: string;
  tool: string;
  subjectId: string;         // the user whose data was accessed
  fieldsAccessed: string[];
  purpose: string;           // from the tool's declared purpose field
}

export function logPiiAccess(event: PiiAccessEvent) {
  // Write to a separate, immutable audit log (not the same log as operational events)
  auditLogger.info("pii_access", {
    ...event,
    // Never log the actual values — only which fields were accessed
  });
}

The critical constraint: log which fields were accessed, not the field values. A log line saying "callerId X accessed fields [name, email] for subjectId Y" is enough to reconstruct who accessed what. A log line containing the actual email address is another PII exposure point.
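
To make the "names, not values" rule concrete, the sketch below derives the audit record from the keys of the response that was actually returned. It is an illustration under stated assumptions: `recordPiiAccess`, `PiiAccessRecord`, and the in-memory `auditTrail` array are hypothetical stand-ins, and a real deployment would write to an append-only store instead.

```typescript
interface PiiAccessRecord {
  timestamp: string;
  callerId: string;
  tool: string;
  subjectId: string;
  fieldsAccessed: string[];
  purpose: string;
}

// In-memory stand-in for an audit sink, for illustration only.
const auditTrail: PiiAccessRecord[] = [];

function recordPiiAccess(
  meta: { callerId: string; tool: string; subjectId: string; purpose: string },
  response: Record<string, unknown>,
  now: Date = new Date(),
): PiiAccessRecord {
  const record: PiiAccessRecord = {
    timestamp: now.toISOString(),
    callerId: meta.callerId,
    tool: meta.tool,
    subjectId: meta.subjectId,
    // Only the key names of the response are recorded, never the values.
    fieldsAccessed: Object.keys(response).sort(),
    purpose: meta.purpose,
  };
  auditTrail.push(record);
  return record;
}
```

Calling `recordPiiAccess` with the projected object just before the handler returns keeps the log aligned with what the caller actually received, rather than with what the database query fetched.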

Data minimization compliance matrix

| Tool pattern | GDPR risk | Fix |
| --- | --- | --- |
| JSON.stringify(dbRow) (full row returned) | CRITICAL: all fields, including PII, tokens, and internal notes, exposed to LLM context | Field projection with per-role allowlist |
| Credentials/tokens in response fields | HIGH: API keys and tokens in LLM context window | Never include in any allowlist; strip at DB query level |
| Email/phone returned unmasked when only display name needed | MEDIUM: unnecessary PII in context | PII masking or field exclusion |
| No PII access logging | MEDIUM: cannot answer DSARs or audit who accessed what | Structured PII access audit log |
| No purpose declaration on tool | LOW: processing without documented lawful basis | Add purpose field to tool metadata |
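
The "strip at DB query level" fix can be sketched as a query builder that only ever names allowlisted columns, so credential fields are never read into server memory at all. This is a rough illustration under assumptions not made by the article: a SQL database with a `users` table, PostgreSQL-style `$1` placeholders, and the hypothetical names `SELECTABLE_COLUMNS` and `buildUserSelect`.

```typescript
// Hypothetical set of columns that may ever be selected for a tool response.
const SELECTABLE_COLUMNS = new Set<string>([
  "name", "email", "phone", "plan", "created_at", "org",
]);

function buildUserSelect(fields: readonly string[]): { text: string; columns: string[] } {
  // Keep only known columns and drop duplicates, preserving request order.
  const columns = fields.filter(
    (f, i) => SELECTABLE_COLUMNS.has(f) && fields.indexOf(f) === i,
  );
  if (columns.length === 0) {
    throw new Error("no permitted columns requested");
  }
  // Column names come from a fixed set, so interpolating them is safe;
  // the user id stays a bound parameter ($1) supplied at execution time.
  return {
    text: `SELECT ${columns.join(", ")} FROM users WHERE id = $1`,
    columns,
  };
}

const userQuery = buildUserSelect(["name", "api_key", "email"]);
// userQuery.text is "SELECT name, email FROM users WHERE id = $1"
```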

SkillAudit findings for data minimization

HIGH: Tool returns raw database row including credential fields (password hash, API key, reset token); tokens exposed to LLM context window
HIGH: Tool response includes PII fields beyond what the tool's declared purpose requires (email, phone, SSN in a tool that displays a user's name)
MEDIUM: No field projection in any tool handler; all queries return full row regardless of caller needs
MEDIUM: No PII access logging; cannot comply with GDPR Article 30 processing records requirement or answer DSARs
LOW: Tool schema does not declare purpose or data categories; no documented basis for processing under GDPR Article 6
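
A rough self-check in the spirit of the first finding can run in a unit test against each tool's response. The sketch below is not SkillAudit's method; the key pattern and the names `CREDENTIAL_KEY_PATTERN` and `findCredentialKeys` are assumptions chosen for illustration, and a name-based check will miss secrets stored under innocuous keys.

```typescript
// Hypothetical heuristic: key names that suggest a credential.
const CREDENTIAL_KEY_PATTERN = /(password|secret|token|api[_-]?key)/i;

// Recursively collect the paths of keys whose names look like credentials.
function findCredentialKeys(value: unknown, path: string = ""): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      hits.push(...findCredentialKeys(item, `${path}[${i}]`));
    });
    return hits;
  }
  if (value === null || typeof value !== "object") return hits;
  const obj = value as Record<string, unknown>;
  for (const key of Object.keys(obj)) {
    const childPath = path ? `${path}.${key}` : key;
    if (CREDENTIAL_KEY_PATTERN.test(key)) hits.push(childPath);
    hits.push(...findCredentialKeys(obj[key], childPath));
  }
  return hits;
}

const flagged = findCredentialKeys({
  name: "Jane Smith",
  ssn_last4: "6789",
  payment_method_token: "tok_visa_xxx",
  hashed_password: "$2b$12$...",
  api_key: "sk-live-xxxxxxxxxxxx",
});
// flagged is ["payment_method_token", "hashed_password", "api_key"]
```

A test that asserts `findCredentialKeys(response)` is empty for every tool gives an early warning when a new column with a credential-like name starts leaking into responses.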

Run a free SkillAudit to see how your MCP server scores on the Excessive Data Exposure and Credential Exposure axes — two of the six dimensions in every SkillAudit report.