MCP server gRPC streaming security

gRPC streaming MCP servers (server streaming, client streaming, and bidirectional streaming) can't rely on the authenticate-once-per-call model that works for unary RPC. A JWT verified at stream open may expire mid-stream. A deadline set on the initial call does not reach downstream services unless the handler forwards it. And bidirectional streams need authorization checks on individual messages, not only on the stream itself. This reference covers per-message authorization patterns, deadline propagation, and stream cancellation vulnerabilities, with code examples for @grpc/grpc-js.

Why streaming RPC breaks the unary auth model

In unary gRPC, the auth model is simple: validate the authorization metadata on each call, attach the claims to the call context, check claims before processing. Each call is independent — a revoked token is caught on the next call.

In server streaming, the client sends one request and the server sends back a sequence of responses over a single HTTP/2 stream that may stay open for minutes or hours. The metadata (including the bearer token) is only available at stream open time. If the token expires while the stream is open, the server has no automatic mechanism to detect this — it will continue sending responses to an identity that may no longer be authorized.

In bidirectional streaming, both the client and server send sequences of messages over the same stream. The server must authorize not just the stream itself but the content of each incoming message, since message content may encode different operation types or access different resources.

Auth drift window: A server streaming RPC that stays open for 60 minutes under a JWT valid for 60 minutes is authorized for the whole session if the token is validated only at open time. If the user's access is revoked at minute 5, they keep receiving data for another 55 minutes. The fix is to re-check authorization while the stream is open.
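The arithmetic behind the drift window can be made explicit. The helper below is an illustrative sketch rather than part of any library: `worstCaseExposureSeconds` and its parameter names are hypothetical, and it assumes a revocation is only noticed at the next periodic re-check.

```javascript
// Worst-case time (in seconds) that a revoked identity keeps receiving data.
//   streamSeconds:    how long the stream stays open
//   revokedAtSeconds: when access is revoked, measured from stream open
//   recheckSeconds:   interval between authorization re-checks, or null if
//                     the token is validated only at stream open
function worstCaseExposureSeconds(streamSeconds, revokedAtSeconds, recheckSeconds) {
  const remaining = Math.max(0, streamSeconds - revokedAtSeconds);
  if (recheckSeconds === null) return remaining;
  // With periodic checks, the revocation is noticed at the next tick at the latest.
  return Math.min(remaining, recheckSeconds);
}

// The scenario from the text: 60-minute stream, revoked at minute 5.
const openOnly = worstCaseExposureSeconds(60 * 60, 5 * 60, null); // 3300 s = 55 minutes
const every30s = worstCaseExposureSeconds(60 * 60, 5 * 60, 30);   // 30 s
```

Under these assumptions, a 30-second re-check interval shrinks the exposure from 55 minutes to at most 30 seconds.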

Per-message authorization on server streaming RPC

import grpc from '@grpc/grpc-js';
import jwt from 'jsonwebtoken';

// Periodic token expiry check on streaming RPC handlers
function createStreamingHandler(serviceImpl) {
  return function(call) {
    // Extract and validate token at stream open
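    const metadata = call.metadata.get('authorization');
    if (!metadata.length) {
      // Pass a status object: a plain Error carries no gRPC status code,
      // so the client would see UNKNOWN instead of UNAUTHENTICATED
      call.destroy({ code: grpc.status.UNAUTHENTICATED, message: 'missing_authorization' });
      return;
    }

    let claims;
    try {
      const token = String(metadata[0]).replace(/^Bearer /i, '');
      // Pin the accepted algorithm (HS256 is assumed here for a shared secret)
      claims = jwt.verify(token, process.env.JWT_SECRET, { algorithms: ['HS256'] });
    } catch {
      call.destroy({ code: grpc.status.UNAUTHENTICATED, message: 'invalid_token' });
      return;
    }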

    // Periodic expiry check — every 30 seconds for long-lived streams
    const expiryCheckInterval = setInterval(() => {
      const now = Math.floor(Date.now() / 1000);
      if (claims.exp && claims.exp < now) {
        clearInterval(expiryCheckInterval);
        call.destroy({ code: grpc.status.UNAUTHENTICATED, message: 'token_expired' });
        return;
      }
      // Also check revocation list
      if (isRevoked(claims.jti)) {
        clearInterval(expiryCheckInterval);
        call.destroy({ code: grpc.status.PERMISSION_DENIED, message: 'token_revoked' });
      }
    }, 30_000);

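    // Stop the timer however the stream ends. 'end' is a Readable event, so it
    // does not fire on a write-only server-streaming call; 'cancelled' and
    // 'close' cover client cancellation and teardown.
    const stopExpiryCheck = () => clearInterval(expiryCheckInterval);
    call.on('cancelled', stopExpiryCheck);
    call.on('end', stopExpiryCheck);
    call.on('error', stopExpiryCheck);
    call.on('close', stopExpiryCheck);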

    // Delegate to service implementation with verified claims
    serviceImpl(call, claims);
  };
}
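The handler above calls `isRevoked` without defining it. A minimal in-memory sketch follows, assuming revocations are keyed by the token's `jti` claim and only need to be remembered until the token would have expired anyway. The `revoke` helper, the `revokedTokens` map, and the optional `nowSeconds` parameter are hypothetical names introduced for illustration.

```javascript
// Hypothetical in-memory revocation list: jti -> token exp (seconds since epoch).
// Assumes a single server process; several instances would need a shared store.
const revokedTokens = new Map();

function revoke(jti, exp) {
  revokedTokens.set(jti, exp);
}

function isRevoked(jti, nowSeconds = Math.floor(Date.now() / 1000)) {
  // Design choice (assumption): fail closed when the token has no jti,
  // because such a token cannot be looked up in the list.
  if (!jti) return true;

  const exp = revokedTokens.get(jti);
  if (exp === undefined) return false;

  if (exp < nowSeconds) {
    // The token has expired on its own, so the separate expiry check
    // rejects it and the entry can be dropped.
    revokedTokens.delete(jti);
    return false;
  }
  return true;
}
```

Dropping entries once the token has expired keeps the list bounded by the number of revoked tokens that are still otherwise valid.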

Bidirectional stream message-level authorization

In bidirectional streaming, each incoming message may request a different resource or operation, so authorization must be checked per message, not just at stream open. A handler that validates the JWT once and then accepts every subsequent message lets anyone who holds the stream, including an attacker who has hijacked it, request tools and resources that the check at open time never evaluated.

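// Bidirectional streaming handler with per-message scope checking
server.addService(ToolStreamService, {
  executeTool: createStreamingHandler(function(call, claims) {
    let inFlight = 0;       // messages still being processed
    let clientDone = false; // client has half-closed its side

    // End the response stream only once pending results have been written
    const endIfIdle = () => {
      if (clientDone && inFlight === 0) call.end();
    };

    call.on('data', async (request) => {
      // Authorization check on every incoming message
      const requiredScope = getRequiredScope(request.toolName);
      if (!claims.scopes || !claims.scopes.includes(requiredScope)) {
        // A single message cannot be rejected with its own gRPC status in a
        // bidirectional stream: terminate the stream or send an
        // application-level error message
        call.write({
          type: 'error',
          code: 'PERMISSION_DENIED',
          message: 'insufficient_scope for ' + request.toolName,
        });
        return; // skip processing this message
      }

      // Resource ownership check
      if (request.resourceId && !userOwnsResource(claims.sub, request.resourceId)) {
        call.write({ type: 'error', code: 'PERMISSION_DENIED', message: 'resource_ownership_check_failed' });
        return;
      }

      // Process the tool call and write response
      inFlight += 1;
      try {
        const result = await executeToolHandler(request.toolName, request.args);
        call.write({ type: 'result', toolName: request.toolName, result });
      } catch (err) {
        // A rejected handler must not become an unhandled rejection;
        // report a generic error without leaking internal details
        call.write({ type: 'error', code: 'INTERNAL', message: 'tool_execution_failed' });
      } finally {
        inFlight -= 1;
        endIfIdle();
      }
    });

    // Calling call.end() immediately here would race with results still in flight
    call.on('end', () => {
      clientDone = true;
      endIfIdle();
    });
  }),
});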
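The handler relies on `getRequiredScope`, which is not shown. One way it might look is sketched below, assuming a static table from tool name to scope. The tool names, scope strings, and the `isMessageAuthorized` helper are hypothetical and only illustrate a deny-by-default lookup.

```javascript
// Hypothetical tool-to-scope table; the tool and scope names are illustrative only.
// A Map is used instead of a plain object so that names such as "constructor"
// do not resolve to inherited properties.
const TOOL_SCOPES = new Map([
  ['read_file', 'files:read'],
  ['write_file', 'files:write'],
  ['run_query', 'db:query'],
]);

// Returns the scope a tool requires, or null for an unknown tool.
function getRequiredScope(toolName) {
  return TOOL_SCOPES.get(toolName) ?? null;
}

// Deny by default: unknown tools and missing scopes both fail.
function isMessageAuthorized(claims, request) {
  const requiredScope = getRequiredScope(request.toolName);
  if (requiredScope === null) return false;
  return Array.isArray(claims.scopes) && claims.scopes.includes(requiredScope);
}
```

Returning `null` for unknown tools means a client cannot reach an unlisted tool simply because no scope was ever defined for it.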

Deadline propagation in gRPC streaming

gRPC deadlines (not to be confused with timeouts) are absolute points in time by which the entire RPC must complete. Deadlines are meant to propagate through the call chain: if a client sets a 10-second deadline on a streaming call, that deadline should flow to every downstream service the streaming handler calls. In Node there is no implicit request context, so the handler has to forward the deadline explicitly on each outbound call.

The common vulnerability is failing to propagate the deadline to outbound gRPC or HTTP calls made inside the streaming handler. If a streaming handler makes a database query or calls another service without propagating the deadline, those operations can run past the client's deadline and consume resources for a cancelled call.

// WRONG: outbound call uses a fixed timeout, not the propagated deadline
// (downstreamClient is assumed to be a promise-returning wrapper around a gRPC client)
call.on('data', async (request) => {
  const result = await downstreamClient.getData(request.id, { deadline: Date.now() + 5000 });
  call.write(result);
});

// RIGHT — extract the call's deadline and propagate it
function getCallDeadline(call) {
  // grpc-js exposes the deadline on the server call object
  return call.getDeadline(); // a Date or epoch milliseconds; Infinity means no deadline
}

call.on('data', async (request) => {
  const deadline = getCallDeadline(call);

  // Check if deadline already passed before making outbound call
  if (deadline !== Infinity && deadline <= new Date()) {
    call.destroy({ code: grpc.status.DEADLINE_EXCEEDED, message: 'client_deadline_already_passed' });
    return;
  }

  // Propagate deadline to downstream call
  const result = await downstreamClient.getData(
    request.id,
    { deadline: deadline === Infinity ? undefined : deadline }
  );
  call.write(result);
});
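A fixed local timeout and a propagated deadline are not mutually exclusive: a handler can keep a local cap and still never exceed what the caller allowed. The sketch below shows that combination under the assumption that the incoming deadline is a `Date`, an epoch-millisecond number, or `Infinity`. The helper names `toMillis`, `effectiveDeadline`, and `remainingMs` are hypothetical.

```javascript
// Normalise a deadline (Date, epoch milliseconds, or Infinity) to a number.
function toMillis(deadline) {
  return deadline instanceof Date ? deadline.getTime() : deadline;
}

// Earlier of the caller's deadline and a local cap of localCapMs from now.
function effectiveDeadline(callDeadline, localCapMs, nowMs = Date.now()) {
  const propagated = toMillis(callDeadline);
  const capped = nowMs + localCapMs;
  return new Date(Math.min(propagated, capped));
}

// Milliseconds left before the caller's deadline (Infinity if none was set).
function remainingMs(callDeadline, nowMs = Date.now()) {
  return Math.max(0, toMillis(callDeadline) - nowMs);
}
```

With these helpers the outbound call could pass `effectiveDeadline(call.getDeadline(), 5000)` as its deadline, so a caller with two seconds left gets two seconds and a caller with no deadline gets the five-second cap.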

Stream cancellation and resource cleanup

When a gRPC client cancels a stream (network drop, client-side timeout, or explicit cancel), the server receives a cancelled event. If the streaming handler has in-progress work — database queries, file reads, downstream API calls — those must be cancelled or aborted. Failing to handle stream cancellation leads to resource leaks and, in LLM agent workloads where reconnect cycling is common, resource exhaustion.

server.addService(ToolStreamService, {
  streamTools: function(call) {
    let cancelled = false;
    const abortController = new AbortController();

    call.on('cancelled', () => {
      cancelled = true;
      abortController.abort(); // propagate cancellation to fetch/DB calls
    });

    call.on('data', async (request) => {
      if (cancelled) return; // drop messages still buffered when the stream was cancelled

      try {
        // Pass AbortSignal to async operations so they can be cancelled
        const result = await fetch(request.url, { signal: abortController.signal });
        if (cancelled) return; // check again after await

        call.write({ result: await result.text() });
      } catch (err) {
        if (err.name === 'AbortError') return; // stream was cancelled
        call.write({ error: err.message });
      }
    });

    call.on('end', () => {
      if (!cancelled) call.end();
    });
  },
});
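Cancellation is one half of cleanup. The other half is not ending the response stream while work is still in flight, since a result written after `call.end()` is lost. The sketch below combines both in one small helper. `createStreamScope` is a hypothetical name, and the helper only assumes that the call object exposes `on` and `end` as in the examples above.

```javascript
// Tracks in-flight work for one stream so the handler can
//  (a) abort that work when the client cancels, and
//  (b) delay call.end() until the client has half-closed AND pending work is done.
function createStreamScope(call) {
  const controller = new AbortController();
  let inFlight = 0;
  let halfClosed = false;
  let ended = false;

  function maybeEnd() {
    if (halfClosed && inFlight === 0 && !controller.signal.aborted && !ended) {
      ended = true;
      call.end();
    }
  }

  call.on('cancelled', () => controller.abort());
  call.on('end', () => {
    halfClosed = true;
    maybeEnd();
  });

  return {
    signal: controller.signal, // pass to fetch, DB drivers, and similar calls
    // Call before starting a unit of work; invoke the returned function when it settles.
    begin() {
      inFlight += 1;
      let finished = false;
      return () => {
        if (finished) return; // tolerate double calls
        finished = true;
        inFlight -= 1;
        maybeEnd();
      };
    },
  };
}
```

In a `data` handler this would be used as `const done = scope.begin(); try { ... } finally { done(); }`, with `scope.signal` passed to each abortable operation.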

Per-stream rate limiting and reconnect bypass

Rate limiting applied per-stream is bypassed by stream reconnects. An LLM agent that reconnects after hitting a stream-level rate limit gets a fresh limit. Rate limiting for gRPC streaming must be applied at the identity level, tracked across streams, with the same reconnect-aware patterns as WebSocket per-identity rate limiting.

// WRONG: per-stream rate limit is bypassed by reconnecting
const streamMessageCount = new Map(); // keyed by stream object, so each reconnect starts at zero

// RIGHT: per-identity rate limit tracked across streams
const identityMessageCount = new Map(); // keyed by userId and window, shared by every stream in this process

function streamingRateLimitInterceptor(call, claims) {
  const userId = claims.sub;

  call.on('data', () => {
    // Compute the window per message: a key computed once at stream open
    // would never roll over on a long-lived stream
    const windowKey = userId + ':' + Math.floor(Date.now() / 60000); // 1-minute window
    const count = (identityMessageCount.get(windowKey) || 0) + 1;
    identityMessageCount.set(windowKey, count);

    if (count > 100) { // 100 messages per minute per identity
      call.destroy({ code: grpc.status.RESOURCE_EXHAUSTED, message: 'rate_limit_exceeded' });
    }
  });
  // Note: keys for past windows are never deleted here and should be pruned periodically
}
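A counter map keyed by identity and time window grows without bound unless old windows are dropped. The sketch below is one possible fixed-window limiter that keeps a single entry per identity and can be pruned. `createIdentityRateLimiter` and its options are hypothetical names, and the in-memory state assumes a single server process; several instances would need a shared store.

```javascript
// Fixed-window, per-identity rate limiter with pruning.
function createIdentityRateLimiter({ limit = 100, windowMs = 60_000 } = {}) {
  const counts = new Map(); // userId -> { window, count }

  return {
    // Returns true if this message is within the identity's budget.
    allow(userId, nowMs = Date.now()) {
      const window = Math.floor(nowMs / windowMs);
      const entry = counts.get(userId);
      if (!entry || entry.window !== window) {
        // First message in a new window resets the counter.
        counts.set(userId, { window, count: 1 });
        return true;
      }
      entry.count += 1;
      return entry.count <= limit;
    },

    // Drop identities whose last window is over; call this periodically.
    prune(nowMs = Date.now()) {
      const window = Math.floor(nowMs / windowMs);
      for (const [userId, entry] of counts) {
        if (entry.window < window) counts.delete(userId);
      }
    },

    size() {
      return counts.size;
    },
  };
}
```

Because the limiter is keyed by identity rather than by stream, a reconnecting client continues against the same counter, and a handler can terminate the stream whenever `allow(claims.sub)` returns `false`.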

SkillAudit findings: gRPC streaming

Critical: Token validated only at stream open, with no periodic expiry or revocation check during a long-lived streaming RPC. Score penalty: −20 points.

High: Bidirectional stream handler checks authorization once per stream, not per message, which allows scope escalation by sending unauthorized tool names after the initial auth. Score penalty: −12 points.

High: Deadline not propagated to downstream calls, so resource usage continues after the client's deadline has passed or the call is cancelled. Score penalty: −8 points.

High: Stream cancellation event not handled, so async operations continue after the stream is cancelled and leak resources. Score penalty: −8 points.

Medium: Rate limit applied per stream, bypassed by reconnect cycling. Score penalty: −6 points.

Medium: No maximum stream lifetime, so long-lived streams accumulate without expiry. Score penalty: −4 points.
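The maximum stream lifetime finding has no example elsewhere in this reference, so here is a minimal sketch of one way to cap it. `enforceMaxStreamLifetime` is a hypothetical helper, the choice of DEADLINE_EXCEEDED as the status is an assumption, and the `timers` parameter exists only so the timer can be replaced in tests. It follows the `call.destroy` convention used in the examples above.

```javascript
const DEADLINE_EXCEEDED = 4; // numeric value of grpc.status.DEADLINE_EXCEEDED

// Terminates the stream once it has been open for maxLifetimeMs,
// forcing the client to reconnect and re-authenticate.
function enforceMaxStreamLifetime(call, maxLifetimeMs, timers = { setTimeout, clearTimeout }) {
  const timer = timers.setTimeout(() => {
    call.destroy({ code: DEADLINE_EXCEEDED, message: 'max_stream_lifetime_exceeded' });
  }, maxLifetimeMs);

  // Release the timer if the stream finishes or is cancelled first.
  const stop = () => timers.clearTimeout(timer);
  call.on('cancelled', stop);
  call.on('close', stop);
  return stop;
}
```

A handler would call `enforceMaxStreamLifetime(call, 15 * 60 * 1000)` right after authenticating the stream; the 15-minute figure is only an example value.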

Run a SkillAudit scan on your gRPC MCP server to detect auth drift, missing deadline propagation, and stream cancellation gaps automatically.