MCP server data retention security — stale deleted records, PII in cached tool output, and retention policy enforcement at the server layer

Data retention obligations don't stop at your database. An MCP server that caches tool responses in memory, writes intermediate results to disk, or logs full request payloads can hold onto personal data long after the record was deleted from the authoritative store. The cache becomes a ghost table: the GDPR right to erasure was executed, the database record is gone, but the LLM agent is still reading the stale copy every time the same query is run.

The stale deleted record problem

The most common pattern: an MCP server caches upstream API responses in a Map or a local Redis instance with a long TTL (or none at all) to reduce latency. A user invokes their right to erasure, and the deletion propagates to the upstream database, but the cache entry remains. Until that entry expires, which may take hours or days, any tool call that would have returned the deleted record still returns it, and that data enters the LLM's context window.

// Dangerous: in-process cache with no TTL and no invalidation path
const cache = new Map();

server.tool('getUserProfile', {
  handler: async ({ userId }) => {
    if (cache.has(userId)) {
      return cache.get(userId); // still returns the profile after the user has been erased upstream
    }
    const profile = await db.getUserById(userId);
    cache.set(userId, profile); // no TTL, no eviction
    return profile;
  }
});

// Safe: cache with configurable TTL + explicit invalidation hook
const cache = new Map(); // { key: { value, expiresAt } }

const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes maximum

function cacheGet(key) {
  const entry = cache.get(key);
  if (!entry || Date.now() > entry.expiresAt) {
    cache.delete(key);
    return null;
  }
  return entry.value;
}

function cacheSet(key, value) {
  cache.set(key, { value, expiresAt: Date.now() + CACHE_TTL_MS });
}

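// Invalidation hook for erasure events. Registered as a tool here for brevity;
// in production, call it from the deletion workflow instead of relying on the model to invoke it.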
server.tool('invalidateUserCache', {
  schema: { userId: { type: 'string' } },
  handler: async ({ userId }) => {
    cache.delete(userId);
    return { invalidated: true };
  }
});
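The safe block defines `cacheGet` and `cacheSet` but never shows them inside a handler, and it exposes invalidation as an MCP tool, which means something still has to call it. A minimal sketch of the alternative wiring, assuming the deletion workflow can publish an in-process event: the `erasureEvents` emitter, the `'user.erased'` event name, `createProfileCache`, and the `fetchProfile` function are all hypothetical stand-ins for whatever your deletion pipeline and data layer actually provide.

```javascript
const { EventEmitter } = require('node:events');

const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes maximum, as in the example above

// Hypothetical channel that the deletion workflow publishes to. In production
// this could be a message-queue consumer rather than an in-process emitter.
const erasureEvents = new EventEmitter();

function createProfileCache({ ttlMs = CACHE_TTL_MS, now = Date.now, events = erasureEvents } = {}) {
  const entries = new Map(); // userId -> { value, expiresAt }

  function get(userId) {
    const entry = entries.get(userId);
    if (!entry) return null;
    if (now() > entry.expiresAt) {
      entries.delete(userId); // lazy eviction on read
      return null;
    }
    return entry.value;
  }

  function set(userId, value) {
    entries.set(userId, { value, expiresAt: now() + ttlMs });
  }

  // An erasure event evicts the entry immediately instead of waiting out the TTL.
  events.on('user.erased', ({ userId }) => entries.delete(userId));

  return { get, set };
}

// Body of a getUserProfile tool handler. fetchProfile is a hypothetical
// stand-in for db.getUserById.
async function getUserProfile(cache, userId, fetchProfile) {
  const cached = cache.get(userId);
  if (cached !== null) return cached;
  const profile = await fetchProfile(userId);
  if (profile != null) cache.set(userId, profile); // don't cache misses
  return profile;
}
```

One caveat the sketch does not handle: a fetch that starts before the erasure and completes after it can re-populate the cache with the deleted record, so the short TTL is still needed as a backstop.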

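The TTL is a ceiling, not a floor: when nothing else evicts an entry, it is the worst case for how long an erased record can still be served. Erasure and deletion obligations (GDPR Article 17, which requires erasure without undue delay, and the CCPA right to deletion), along with data-minimization rules such as HIPAA's minimum necessary standard, mean your cache TTL has to fit inside the gap you can tolerate between a deletion event and the point at which you can guarantee the LLM will no longer see the deleted data. For most production use cases, that means either very short TTLs (minutes, not hours) or an explicit cache invalidation channel that is triggered as part of the deletion workflow.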
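The either/or rule in the paragraph above can be enforced mechanically when the server starts. A minimal sketch; the function name and its parameters are hypothetical, and `maxErasureLagMs` stands for whatever window your compliance team agrees is acceptable.

```javascript
// Hypothetical startup guard: refuse to run with a cache configuration that
// could keep serving an erased record for longer than the tolerated lag.
function assertCacheTtlWithinErasureWindow({ cacheTtlMs, maxErasureLagMs, hasInvalidationHook }) {
  // With an invalidation hook wired into the deletion workflow, erased entries
  // are evicted on the deletion event, and the TTL is only a backstop.
  if (hasInvalidationHook) return;
  // Without one, an entry cached just before the deletion can be served for
  // the full TTL, so the TTL itself is the worst-case exposure window.
  if (cacheTtlMs > maxErasureLagMs) {
    throw new Error(
      `cache TTL of ${cacheTtlMs} ms exceeds the ${maxErasureLagMs} ms erasure window ` +
      'and no invalidation hook is configured'
    );
  }
}

// Example: a 5-minute TTL against a 10-minute tolerated lag passes.
assertCacheTtlWithinErasureWindow({
  cacheTtlMs: 5 * 60 * 1000,
  maxErasureLagMs: 10 * 60 * 1000,
  hasInvalidationHook: false,
});
```

Failing at startup turns a silent compliance gap into a deployment error that someone has to consciously resolve.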

Over-retention of PII in cached tool output

The second pattern: a tool returns a rich response containing personal data (name, email, address, a health indicator), and the server caches the full response object. The retention period for that data is now whatever the cache enforces: the TTL if entries expire, or the lifetime of the server process if they don't. It is not the retention period defined in your data processing agreement. If the cache is in-memory with no expiry and the server restarts every 24 hours, the de facto retention period is 24 hours regardless of what your policy says.
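Even with a TTL, lazy expiry of the kind `cacheGet` implements only deletes an entry when someone reads it again. A PII-bearing entry that is never requested a second time stays in memory until the process restarts, which recreates the de facto retention problem. A minimal sketch of a periodic sweep, assuming the same `{ value, expiresAt }` entry shape; `sweepExpired`, `startSweeper`, and the 60-second interval are illustrative choices, not part of any library.

```javascript
// Delete every entry whose expiry has passed; returns how many were evicted.
function sweepExpired(cache, now = Date.now()) {
  let evicted = 0;
  for (const [key, entry] of cache) {
    if (now > entry.expiresAt) {
      cache.delete(key); // deleting the current key during Map iteration is safe
      evicted++;
    }
  }
  return evicted;
}

// Run the sweep on a timer. unref() stops the timer from keeping the
// process alive on its own; the returned function stops the sweeper.
function startSweeper(cache, intervalMs = 60 * 1000) {
  const timer = setInterval(() => sweepExpired(cache), intervalMs);
  timer.unref();
  return () => clearInterval(timer);
}
```

With the sweeper running, the upper bound on in-memory retention becomes the TTL plus one sweep interval, independent of read traffic.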

The fix requires two things: PII classification at the response level, and a cache policy that gives PII-bearing responses an expiry tied to the retention schedule:

// PII classification at the tool response level
const PII_FIELDS = new Set(['email', 'phone', 'address', 'ssn', 'dob', 'ip_address']);
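const PII_RETENTION_MS = 15 * 60 * 1000; // upper bound for PII-bearing responses; set it from your DPA

function hasPiiFields(obj, fields = PII_FIELDS) {
  if (typeof obj !== 'object' || !obj) return false;
  // Pass the custom field set down so nested objects are checked against it too
  return Object.keys(obj).some(k => fields.has(k) || hasPiiFields(obj[k], fields));
}

function cacheSetWithRetention(key, value) {
  // PII-bearing responses get whichever limit is stricter, so a PII entry
  // never outlives a non-PII one
  const ttl = hasPiiFields(value) ? Math.min(PII_RETENTION_MS, CACHE_TTL_MS) : CACHE_TTL_MS;
  cache.set(key, { value, expiresAt: Date.now() + ttl });
}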

// Scrub PII from log lines — never log full tool output
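function sanitizeForLog(obj) {
  // Recurse into nested objects and arrays so PII below the top level is redacted too
  if (Array.isArray(obj)) return obj.map(sanitizeForLog);
  if (typeof obj !== 'object' || !obj) return obj;
  const out = {};
  for (const [k, v] of Object.entries(obj)) {
    out[k] = PII_FIELDS.has(k) ? '[REDACTED]' : sanitizeForLog(v);
  }
  return out;
}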

PII persistence in log files

Many MCP servers log at DEBUG level during development and never tighten logging before deployment. A tool response containing a user's email address, transaction history, or medical record ends up in a log file that the rotation policy keeps on disk for 30, 90, or 365 days, far longer than an in-memory cache TTL measured in minutes that was carefully designed to meet retention requirements. The log file is where the retention violation actually happens.

The pattern to avoid: logger.debug('tool result', JSON.stringify(result)) or console.log(result) in any handler that touches PII-bearing upstream responses. Log result metadata (row count, latency, tool name, a hash of the user ID), not result content.
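A minimal sketch of metadata-only logging for a tool call. It pseudonymizes the user ID with an HMAC from Node's `crypto` module; the `LOG_HMAC_KEY` environment variable and the `toolCallLogRecord` and `logToolCall` helpers are hypothetical. A keyed hash is preferable to a plain one because user IDs are often guessable, but the output is still pseudonymous personal data under GDPR, so this shrinks the log's exposure rather than taking it out of scope.

```javascript
const crypto = require('node:crypto');

// Hypothetical key source; the fallback exists only for local development.
const LOG_HMAC_KEY = process.env.LOG_HMAC_KEY || 'dev-only-key';

function hashUserId(userId) {
  return crypto.createHmac('sha256', LOG_HMAC_KEY).update(String(userId)).digest('hex').slice(0, 16);
}

function countRows(result) {
  if (Array.isArray(result)) return result.length;
  return result == null ? 0 : 1;
}

// Build a record that describes the call without copying any of its content.
function toolCallLogRecord({ tool, userId, result, startedAt, finishedAt = Date.now() }) {
  return {
    tool,
    userHash: hashUserId(userId),
    rowCount: countRows(result),
    latencyMs: finishedAt - startedAt,
  };
}

function logToolCall(record) {
  // stderr, because a stdio MCP server's stdout carries protocol messages.
  process.stderr.write(JSON.stringify(record) + '\n');
}
```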

What SkillAudit checks for data retention issues

Map, Object, or Redis cache usage without TTL enforcement — presence of cache.set(key, value) without an associated expiry
JSON.stringify(result) or console.log(result) in tool handlers that call APIs likely to return PII (user profile endpoints, payment APIs, HR systems)
No cache invalidation path exposed — caches that can only evict by process restart cannot support right-to-erasure compliance
Log retention configuration that sets a retention period longer than the data processing agreement's PII retention window
Disk-backed intermediate state (temp files, SQLite caches) written in tool handlers with no cleanup on process exit
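For the last item, one remediation is to confine intermediate files to a single per-process scratch directory and remove it when the process exits. A minimal sketch using Node's `fs`, `os`, and `path` modules; `createScratchDir` and the `mcp-scratch-` prefix are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Create one scratch directory per server process for intermediate tool
// state, and register cleanup so it doesn't outlive the process.
function createScratchDir(prefix = 'mcp-scratch-') {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  // force: true makes repeated cleanup calls a no-op.
  const cleanup = () => fs.rmSync(dir, { recursive: true, force: true });
  // 'exit' listeners must be synchronous, which is why rmSync is used.
  process.once('exit', cleanup);
  return { dir, cleanup };
}
```

Node does not emit 'exit' when the process is killed by a signal it has no listener for (such as a default SIGTERM) or by SIGKILL, so a startup pass that removes stale `mcp-scratch-*` directories left by earlier runs is a sensible backstop.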

Retention violations rarely appear in a SkillAudit security scan as HIGH severity because they require context about your DPA and retention schedule. They appear as MEDIUM findings in the Credentials and Permissions axes when the caching pattern is clearly unbounded or when PII is demonstrably flowing into log statements. The audit trail compliance post covers GDPR Article 30 and SOC 2 requirements in more detail. Run a free audit to see the caching and logging findings for your MCP server.