Engineering · 2026-04-30
Anatomy of a credential leak — four patterns across 38 of 101 MCP servers
Of the 101 community Model Context Protocol servers we audited in April 2026, 38 emit findings on the credentials axis. The leaks group cleanly into four named patterns: 64 hardcoded secrets in source across 18 repos (the largest single category: 30 OpenAI/Anthropic-style keys, 10 GitHub personal access tokens, 8 Stripe test secrets, 6 AWS access keys, and a handful each of Slack, GitHub OAuth, and bare Anthropic keys), 13 echoes of process.env or os.environ to stdout across 7 repos (the install-blocker: every value in the server's environment ends up in the conversation log), 1 error message that includes an env-var value at JetBrains/mcp-jetbrains, and 44 .env files committed to the repo tree across 28 repos. The MCP runtime is uniquely brutal on credential leaks because every tool response gets read back into LLM context: anything echoed once becomes part of the model's working memory and travels with the conversation. This post names every pattern with bad-vs-good code shapes pulled directly from the audit reports, and ends with an install-gate rule for buyers and a five-step recovery checklist for maintainers.
Why credentials are the most asymmetric axis on MCP
SkillAudit grades every Model Context Protocol server on six axes — security, permissions, credentials, maintenance, compatibility, and documentation — using the rubric documented on the methodology page. Five of the six catch failure modes that exist in any HTTP server codebase. Credentials are different on MCP, and the difference matters more than people think.
In a normal web app, a console.log(process.env) call writes to stdout. Stdout goes to a logfile. The logfile gets rotated, eventually deleted, and is read by a human only when something is broken. A leaked credential in that logfile is a real incident, but it is contained — the leak surface is the disk and the people who can read it. In an MCP tool handler the picture changes completely. The same console.log call writes to stdout. Stdout is captured by the MCP client (Inspector, Claude Code, Cursor, Windsurf, Codex). The captured text is interleaved into the JSON-RPC response stream. The response stream is read by the LLM as part of the conversation context. The LLM's conversation context is persisted in transcripts, in chat history exports, and in any logging the host application does. Every credential that touches that pipe ends up in the model's working memory and on disk in every system that touches the conversation.
That is the asymmetry: a one-line slip in an MCP server has the propagation surface of a chat platform, not a web server. The blast radius scales with the number of tool calls the model makes, the number of users running the same install, and the number of transcript stores those users sync to. Five of the six axes have failure modes whose blast radius is bounded by the host process. Credentials are the only axis whose failure mode is unbounded by the host process — every leaked value travels with the conversation.
This is why our credentials axis runs four orthogonal static checks instead of one. Hardcoded secrets in source are caught by entropy + provider-prefix matchers (sk-, ghp_, AKIA, xoxb-, etc.). Wholesale env echoes are caught by AST patterns over console.* / print calls whose argument expression includes a process.env / os.environ read. Error-message env interpolation is caught by template-literal scans inside throw / raise statements. .env files in tree are caught by filename scan. Each check fires independently, and a repo that fails any of them gets the credentials-axis cap applied. The four sections that follow walk each pattern with the corpus data.
Pattern 1 — Hardcoded secrets in source
What the engine looks for
The matcher fires on string literals that look like credentials. The provider-prefix list covers OpenAI / Anthropic-style keys (sk-…, sk-ant-…), GitHub personal access tokens (ghp_…), GitHub OAuth tokens (gho_…, ghs_…, ghu_…), AWS access keys (AKIA…), Stripe test secrets (sk_test_…), and Slack tokens (xoxb-…, xoxp-…). Anything matching the shape is flagged HIGH; the surface tier (where the file lives in the tree) determines the deduction weight.
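A minimal sketch of a prefix matcher of this kind, in TypeScript. The regular expressions, the length thresholds, and the names `PATTERNS` and `findSecretShapes` are assumptions made for illustration; they are not the engine's actual rules, and the entropy check is left out.

```typescript
// Hypothetical provider-prefix patterns; the length thresholds are assumptions.
type SecretKind =
  | "anthropic"
  | "openai-style"
  | "github-pat"
  | "github-oauth"
  | "aws-access-key"
  | "stripe-test"
  | "slack";

const PATTERNS: Array<[SecretKind, RegExp]> = [
  // Order matters: the more specific sk-ant- prefix is tried before sk-.
  ["anthropic", /sk-ant-[A-Za-z0-9_-]{20,}/],
  ["openai-style", /sk-[A-Za-z0-9_-]{20,}/],
  ["github-pat", /ghp_[A-Za-z0-9]{20,}/],
  ["github-oauth", /gh[osu]_[A-Za-z0-9]{20,}/],
  ["aws-access-key", /AKIA[0-9A-Z]{16}/],
  ["stripe-test", /sk_test_[A-Za-z0-9]{16,}/],
  ["slack", /xox[bp]-[A-Za-z0-9-]{10,}/],
];

interface Hit {
  line: number; // 1-based line number
  kind: SecretKind;
}

// Scan source text line by line and report the first matching kind per line.
function findSecretShapes(source: string): Hit[] {
  const hits: Hit[] = [];
  source.split("\n").forEach((text, index) => {
    for (const [kind, pattern] of PATTERNS) {
      if (pattern.test(text)) {
        hits.push({ line: index + 1, kind });
        break; // one finding per line is enough for a sketch
      }
    }
  });
  return hits;
}
```

A line that reads the key from the environment produces no hit, while a literal with a provider prefix does. Where the file lives in the tree is a separate question that this sketch does not try to answer.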
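Across the 18 repos that hit on this pattern, the breakdown by secret type is: 30 OpenAI/Anthropic-style keys, 10 GitHub personal access tokens, 8 Stripe test secrets, 6 AWS access keys, and a handful each of Slack tokens, GitHub OAuth tokens, and bare Anthropic keys.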
The most-hit category — OpenAI/Anthropic-style API keys — is also the most-installed-by-developers category, because every example notebook for any LLM-adjacent product ships with a "paste your key here" placeholder that ends up in git add. The lastmile-ai/mcp-agent repo has 8 such findings spread across examples/basic/mcp_basic_agent/main.py, examples/basic/mcp_tool_filter/mcp_agent.secrets.yaml.example, tests/cli/fixtures/test_secrets_deploy.sh, and tests/utils/test_config_preload.py — a textbook spread of "documentation copy + integration-test fixture + scaffold template." All eight sit in the low-weight examples/ and tests/ tier and add up to a -30 to credentials, leaving the repo at C overall.
The high-impact end of the distribution looks different. mcp-use/mcp-use (9,804 stars) has five HIGH findings in docs/python/client/authentication/bearer.mdx and docs/typescript/client/authentication.mdx — those files render to the public docs site as authentication-flow examples. Every reader of the bearer-token documentation sees a real-looking sk-… string in the example. The .mdx path means the surface tier classifies as production (docs are part of the user-facing surface), so each finding deducts -30 — and the credentials axis crashes from 100 to 0. The repo grades F overall because of these five lines alone.
The six worst single-repo offenders are named in the table below:
| Repo | Stars | Hits | Grade | What's leaking |
|---|---|---|---|---|
| Klavis-AI/klavis | 5,716 | 12 | F | 3 Stripe test secrets in docs/mcp-server/stripe.mdx, 2 GitHub PATs in mcp_servers/README.md, 3 GitHub PATs in github_official/pat_scope_test.go, 2 Slack tokens in mcp_servers/slack/.env.example, 2 process.env echoes |
| mcp-use/mcp-use | 9,804 | 10 | F | 5 OpenAI/Anthropic-style keys in docs/{python,typescript}/client/authentication.mdx, 3 in libraries/python/.env.example, 1 bare Anthropic key, 1 process.env echo |
| lastmile-ai/mcp-agent | 9,108 | 8 | C | 8 OpenAI/Anthropic-style keys spread across examples/ and tests/ — all low-weight tier, totals -30 to credentials |
| awslabs/mcp | 8,858 | 4 | F | 1 production-source AWS key in src/dynamodb-mcp-server/awslabs/dynamodb_mcp_server/model_validation_utils.py:70, 3 in test fixtures |
| getsentry/sentry-mcp | 432 | 5 | F | 2 OpenAI keys in .env.example, 3 in packages/mcp-core/src/telem/sentry.test.ts |
| github/github-mcp-server | 29,213 | 5 | C | 5 GitHub PAT and OAuth tokens in pkg/http/middleware/pat_scope_test.go — all in test tier, low-weight |
The github/github-mcp-server (29,213 stars) case is the cleanest illustration of how surface tiering matters. All five hardcoded GitHub PATs sit in pkg/http/middleware/pat_scope_test.go, the test file for the PAT-scope middleware. The tokens are real-shape (ghp_… + gho_…) test fixtures used by the unit tests; they are not credentials anyone ever uses against a real GitHub account. The engine still flags them HIGH because the entropy + prefix match is positive, but the test-tier surface classification deducts only -5 per finding, so the credentials axis drops to 75 (five findings at -5 each) instead of 0. The repo grades C on that axis (and C overall on the rubric, capped by the security axis). Compare with awslabs/mcp's production-source AWS key in src/dynamodb-mcp-server/awslabs/dynamodb_mcp_server/model_validation_utils.py:70: the same shape of finding, but the production tier deducts -30 in one shot, and the credentials axis drops to 0.
Bad vs. good code shape
Bad:

```typescript
// docs/typescript/client/authentication.mdx
const client = new MCPClient({
  apiKey: "sk-ant-api03-AbCd1234EfGh...",
  authentication: { type: "bearer" },
});
```

Good:

```typescript
// docs/typescript/client/authentication.mdx
const client = new MCPClient({
  apiKey: process.env.ANTHROPIC_API_KEY,
  authentication: { type: "bearer" },
});
// Required: set ANTHROPIC_API_KEY in your env.
```
The fix is the standard one — read from process.env / os.environ and document the env-var name once. The surprise in the corpus data is how often the bad shape lives in documentation rather than in runtime code. Authors carefully read env vars in src/server.ts and then paste a literal key string into the .mdx example showing how to call their library. Six of the seven repos that emit OpenAI/Anthropic-style HIGH findings have at least one finding in a docs path (docs/, .mdx, or a README.md).
Pattern 2 — process.env / os.environ echoed to logs
What the engine looks for
This is the install-blocker. console.log(process.env) in a tool handler streams the entire process environment into stdout, into the JSON-RPC response, into the LLM's context window, and into every transcript downstream. print(os.environ) in a Python tool handler does the same. The matcher fires on the AST shape — any call expression whose callee is in console.* / print and whose argument expression contains an env-bag read.
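The engine matches on the AST. For a quick local check, the same shape can be approximated line by line with a regular expression, as in the sketch below. This is a stand-in under stated assumptions (it mirrors the grep in the author checklist, and the names `ENV_ECHO` and `findEnvEchoes` are hypothetical), not the engine's matcher, and it misses calls that span several lines.

```typescript
// Line-based approximation of the env-echo shape; the real check is AST-based.
const ENV_ECHO =
  /(console\.(log|info|warn|error|debug)\([^)]*process\.env|print\([^)]*os\.environ)/;

// Return 1-based line numbers whose logging call reads from the env bag.
function findEnvEchoes(source: string): number[] {
  return source
    .split("\n")
    .map((text, index) => (ENV_ECHO.test(text) ? index + 1 : 0))
    .filter((line) => line > 0);
}
```

Because it only looks for an env read inside a logging call, this approximation also flags the safe boolean form (logging `!!process.env.X`), so every hit still needs a human look.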
Eight console.* findings across 6 repos:
| Repo | Grade | Path | Surface tier |
|---|---|---|---|
| Klavis-AI/klavis | F | mcp_servers/shopify/index.ts:1171 | production |
| Klavis-AI/klavis | F | mcp_servers/woocommerce_toolathlon/src/server.ts:1236 | production |
| PipedreamHQ/mcp | F | app/(chat)/api/chat/route.ts:47 | production |
| honeycombio/honeycomb-mcp | F | eval/scripts/run-eval.ts:623 | production (eval scripts) |
| honeycombio/honeycomb-mcp | F | eval/scripts/run-eval.ts:628 | production (eval scripts) |
| mcp-use/mcp-use | F | libraries/typescript/packages/mcp-use/examples/agent/advanced/observability.ts:28 | examples (low-weight) |
| punkpeye/fastmcp | F | src/examples/session-context.ts:247 | examples (low-weight) |
| stripe/agent-toolkit | C | llm/ai-sdk/provider/examples/openai.ts:51 | examples (low-weight) |
Five print findings, all in one repo (pydantic/pydantic-ai at C):
- scripts/verify_bedrock_access.py:18
- scripts/verify_bedrock_access.py:19
- scripts/verify_vertex_gcs.py:58
- scripts/verify_vertex_gcs.py:59
- scripts/verify_vertex_gcs_all_types.py:71
All five sit in the scripts/ tier so they deduct at low weight (-5 each). The scripts are operator-run verification utilities that print AWS / GCP credentials so an engineer can confirm the right env was loaded; they never run inside the MCP tool surface and so do not bleed into the LLM context. The pattern still gets flagged because the bad shape lives in the repo and could move into the tool surface in any refactor. Pydantic AI grades C on the credentials axis as a direct consequence (and C overall, capped by the maintenance axis at 503 open issues — the repo with the largest open-issue backlog of any C in the corpus).
Three repos carry production-source HIGH findings, and these are the install-blockers. Klavis-AI/klavis has two: one inside the Shopify integration's tool handler at mcp_servers/shopify/index.ts:1171 and one inside the WooCommerce integration at mcp_servers/woocommerce_toolathlon/src/server.ts:1236. Both are in MCP-tool-handler code paths. Anyone running the Klavis-AI Shopify tool through Claude or Cursor sees the entire process env land in their conversation log on the first invocation. PipedreamHQ/mcp's app/(chat)/api/chat/route.ts:47 is in a Next.js app-router chat handler: the Pipedream MCP demo app's server response handler, which gets called on every chat turn. honeycombio/honeycomb-mcp's two findings sit in eval/scripts/run-eval.ts, eval scripts that the engine classifies production-tier because they live outside any of the test/examples/scripts top-level directories.
Bad vs. good code shape
Bad:

```typescript
// mcp_servers/shopify/index.ts
server.tool("get_orders", schema, async (args) => {
  console.log("env:", process.env); // ← all secrets leak
  const res = await fetch(`${SHOPIFY_API}/orders`);
  return { content: [{ type: "text", text: await res.text() }] };
});
```

Good:

```typescript
// mcp_servers/shopify/index.ts
server.tool("get_orders", schema, async (args) => {
  console.log("[shopify] get_orders", { args }); // no env
  const res = await fetch(`${SHOPIFY_API}/orders`);
  return { content: [{ type: "text", text: await res.text() }] };
});
```
The fix is a one-line change. Never log process.env as a whole — log the specific keys you need by name, and only the ones that are not credentials (NODE_ENV, LOG_LEVEL, etc.). If you are debugging a missing credential, log the env-var name and a boolean for whether it was set, never the value: console.log({ key: "ANTHROPIC_API_KEY", set: !!process.env.ANTHROPIC_API_KEY }). The bytes-on-the-wire saving is enormous and the value of the log line is identical.
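The name-plus-boolean idea can be wrapped in a small helper so nobody has to remember it at each call site. The sketch below is one possible shape; `envPresence` is a hypothetical name, and the environment is passed in as a plain object (for example `process.env`) so the helper is easy to test.

```typescript
// Report which env vars are set without ever putting their values in the output.
// `env` is passed in by the caller (e.g. process.env) rather than read globally.
function envPresence(
  env: Record<string, string | undefined>,
  names: string[],
): Record<string, boolean> {
  const report: Record<string, boolean> = {};
  for (const name of names) {
    const value = env[name];
    // An unset variable and an empty string both count as "not set".
    report[name] = typeof value === "string" && value.length > 0;
  }
  return report;
}
```

Logging the returned object tells an operator which key is missing, and the serialized line contains only names and booleans.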
Pattern 3 — Error message includes an env-var value
What the engine looks for
The narrowest leak shape, and the easiest to miss in code review. The pattern is a template-literal error string that includes an env-var value in the interpolation — typically the result of a well-meaning "let me show what I tried" diagnostic. The thrown error propagates up the call stack, into the MCP tool handler's catch block (if any), into the JSON-RPC error response, into the LLM context. Even with structured logging and a redaction policy on stdout, an unhandled throw still ships the env value in plain text.
One repo in the corpus emits this finding:
| Repo | Grade | Path | Stars |
|---|---|---|---|
| JetBrains/mcp-jetbrains | F | src/index.ts:103 | 949 |
JetBrains/mcp-jetbrains is the official JetBrains-published MCP server for the JetBrains IDE family — Claude calls it to query and act on an open IntelliJ / WebStorm / PyCharm session. The error path at src/index.ts:103 is in the IDE-discovery flow that runs at server startup. When the server cannot find a connectable IDE on the configured port, it builds the error string by interpolating the env-var value the user set. If the user set the env to anything sensitive (an API token, a path containing credentials, etc.), the value lands in the error message that the LLM reads back as part of the tool's startup-failure log.
The vendor-official label matters here. JetBrains/mcp-jetbrains is the kind of repo a team installs without thinking — "official JetBrains plugin" is a strong install signal — and the error path is reachable on every failed connection, which means it fires more often than a happy-path code path would. The audit page reports it as the only HIGH finding on the credentials axis, deducting -30 to land at 70 on credentials, but the overall grade lands at F because the security axis carries other findings.
Bad vs. good code shape
Bad:

```typescript
// src/index.ts
const port = process.env.IDE_PORT;
if (!await connectable(port)) {
  throw new Error(
    `Cannot connect to IDE on port ${port}` +
    ` (token=${process.env.IDE_TOKEN})`
  );
}
```

Good:

```typescript
// src/index.ts
const port = process.env.IDE_PORT;
if (!await connectable(port)) {
  throw new Error(
    "Cannot connect to IDE. " +
    "Set IDE_PORT and IDE_TOKEN; verify reachable."
  );
}
```
The reviewer-facing test: every throw / raise with a template literal that pulls from process.env or os.environ is suspect. Either pull the env value into a named local variable that you can audit at one site, or rewrite the message to name the env var without including its value. The most generous interpretation of the bad shape is "the developer wanted to make debugging easy" — the right answer is to log the diagnostic to a per-process logfile (with redaction) and surface a short, value-free message to the caller.
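One way to make the value-free message the default is to route these errors through a helper. The sketch below shows two hypothetical functions, `envConfigError` and `scrubEnvValues`; neither comes from the audited repos, and the message wording is an assumption.

```typescript
// Build a value-free error for a failed env-dependent step.
// Only env-var NAMES appear in the message; values are never read here.
function envConfigError(action: string, envNames: string[]): Error {
  const names = envNames.join(", ");
  return new Error(`${action} failed. Check that ${names} are set and reachable.`);
}

// Defensive scrub for messages built elsewhere: replace every occurrence of a
// sensitive env value with a placeholder that names the variable.
function scrubEnvValues(
  message: string,
  env: Record<string, string | undefined>,
  sensitiveNames: string[],
): string {
  let scrubbed = message;
  for (const name of sensitiveNames) {
    const value = env[name];
    if (value && value.length > 0) {
      // split/join replaces all occurrences without regex escaping concerns.
      scrubbed = scrubbed.split(value).join(`[redacted:${name}]`);
    }
  }
  return scrubbed;
}
```

The first function is the preferred path. The second is a backstop for messages that originate in dependencies, and it is only as good as the list of sensitive names passed to it; very short values would also cause over-redaction.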
Pattern 4 — .env files committed to the repo tree
What the engine looks for
The widest pattern by repo count. The matcher fires on any committed file matching .env, .env.example, .env.test, .env.local, .env.dist, or any other prefix-matched variant. The engine emits WARN — not HIGH — because the only verifiable difference between "template" and "live secret" is reading every line of every file. Most .env.example files genuinely contain placeholders (YOUR_API_KEY_HERE); most .env files in a public repo are templates that were committed by mistake but happened to contain placeholder values. The risk is the cases where they do not — and those cases have produced enough real credential leaks across the GitHub ecosystem that the WARN is justified.
Selected production-source .env findings (the install-gate WARNs):
| Repo | Grade | Path | Notes |
|---|---|---|---|
| mem0ai/mem0-mcp | C | .env (root) | Bare .env in repo root — no .example suffix; high-risk shape |
| getsentry/sentry-mcp | F | packages/mcp-test-client/.env.test | .env.test in production tree (not under tests/) |
| GoogleCloudPlatform/cloud-run-mcp | C | .env.gcloud-sdk-oauth | Provider-named .env — gcloud OAuth scope |
| korotovsky/slack-mcp-server | C | .env.dist | Non-standard suffix, manual review needed |
| zenml-io/mcp-zenml | F | .env.local.example | Standard placeholder shape, low-risk if verified |
The remaining 39 findings are mostly conventional .env.example files in examples/, tests/, or top-level scripts directories. The engine deducts -0 for those (they sit in low-weight tiers), so the credentials-axis impact is marginal. The reason every one still gets a WARN is that the surface-tier classification is heuristic and the engine cannot tell from the filename alone whether the file contains real credentials or placeholders — and in past large-corpus scans on the open ecosystem, a non-zero fraction of "template" .env files have been found to contain real keys.
The mem0ai/mem0-mcp case is the most-named one in the corpus because the file is bare .env with no suffix at all. Mem0 is also one of the nine archived repos in our maintenance-signal post, which means the file will sit in the tree at that exact shape forever — no future commit will rename it to .env.example or move it under examples/. Buyers running the install-gate against a snapshot of the corpus today get a WARN that will never resolve.
The .env vs. .env.example distinction
The conventional discipline is: commit .env.example with placeholders, never commit .env, and add .env to .gitignore. The corpus shows the discipline is followed unevenly. Twenty-six of the 28 repos with .env-shaped findings have at least one .env.example path in the list; six have a bare .env or unsuffixed variant. Standardizing on a single shape (.env.example) and gitignoring everything else reduces the manual-review surface to that one file.
Bad vs. good repo shape
Bad:

```
# repo root
.env            # ← committed; may contain anything
.env.test
src/server.ts
```

Good:

```
# repo root
.env.example    # placeholders only, committed
src/server.ts

# .gitignore
.env
.env.local
.env.test
```
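A pre-commit or CI step can enforce the single-shape rule mechanically. The sketch below assumes the caller supplies the list of tracked paths (for example the output of `git ls-files`, split on newlines); `envFilesNeedingReview` is a hypothetical name, and treating everything except `.env.example` as reviewable is this sketch's choice, stricter than the engine's WARN.

```typescript
// Flag tracked files whose basename starts with ".env" but is not the
// conventional ".env.example" template. Input: repo-relative paths.
function envFilesNeedingReview(trackedPaths: string[]): string[] {
  return trackedPaths.filter((path) => {
    const base = path.split("/").pop() ?? "";
    return base.startsWith(".env") && base !== ".env.example";
  });
}
```

A file that passes this check can still hold a real key, so the contents of `.env.example` need a separate look.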
The vendor-official asymmetry — same as the maintenance signal
Five of the seven repos that emit a HIGH credentials finding via Pattern 2 (process.env echo) are vendor-official. Klavis-AI markets itself as "open-source MCP servers for the most-used SaaS." PipedreamHQ is a public-facing product company. honeycombio is the official Honeycomb integration. Stripe's agent-toolkit ships under the stripe/ org. Most teams installing those repos read the org name and treat it as a strong trust signal — the same asymmetric install signal we documented in the maintenance-signal post, where eight of the nine archived repos were vendor-official.
The pattern lines up: vendor-official MCP servers ship faster (because they have an internal team and a release schedule), accumulate technical debt faster (because the repo is downstream of a larger product), and get less individual attention to the credential-handling boilerplate (because the credential-handling code is the bottom of the stack and not the part demoed in the launch tweet). The result is the asymmetry we saw in the F-grades post: the most-installed repos in the corpus by raw star count are also the most-likely to be carrying production-source credentials findings. Twenty-three vendor-official MCP servers carry F grades on the rubric; six of those have at least one credentials-axis HIGH finding.
The remediation pattern is the same too. The fix for an archived repo is to push something or hand off ownership; the fix for a credentials-axis HIGH is a one-line edit. Either fix is cheap individually. The reason both go un-fixed is that nobody on the vendor's side reads our audit page until somebody flags it externally — which is exactly why this blog series exists.
Author checklist (5 steps)
- Grep your repo for the four matchers in this post before every release. `grep -rE "(sk-[a-zA-Z0-9_-]{20,}|ghp_|gho_|AKIA[0-9A-Z]{16}|sk_test_|xoxb-|xoxp-)" .` catches the hardcoded-secret cases. `grep -rnE "(console\.(log|info|warn|error)\([^)]*process\.env|print\([^)]*os\.environ)" .` catches the env-echo cases. `grep -rnE "throw new Error\([^)]*process\.env|raise.*os\.environ" .` catches the error-message echo case. `find . -name '.env*' -not -path './node_modules/*'` catches the in-tree env files.
- Move every literal credential out of `docs/` and `.mdx` files. The strongest pattern in the corpus is "real credential in documentation." Replace literals with `process.env.YOUR_KEY_NAME` and document the env-var name once at the top.
- Never log `process.env` or `os.environ` as a whole. Log specific named keys you need (and only the non-credential ones) or log a boolean (`{ set: !!process.env.X }`) instead of the value. If you are debugging a missing credential, the boolean is what you actually want.
- Audit every `throw new Error(...)` / `raise` for env-var interpolation. The narrowest leak path is a template-literal error message. Build a redacted error in a helper, never inline the env value.
- Standardize on `.env.example` with placeholders, gitignore everything else. Delete bare `.env`, `.env.test`, `.env.local` from the tree; add them to `.gitignore`. Manual-review the placeholder shapes in `.env.example` to confirm they are placeholders.
Buyer checklist (5 greps before install)
- Open the repo's audit page on SkillAudit. Search for the URL on /audits; if the credentials axis is below 70, stop and review specifically what is flagged.
- Grep the repo for hardcoded secrets in production paths. `git clone` the repo, then run the four commands from the author checklist above. Specifically check for hits outside `tests/` and `examples/`.
- Search the repo's tool handlers for `process.env` / `os.environ` echoes. Run `grep -rnE "console\.(log|info|warn|error)\([^)]*process\.env|print\([^)]*os\.environ"` in `src/` or wherever the tool handlers live. Any hit is an install-blocker: every value of every env var ends up in your transcripts.
- Look at the file list for `.env*`. If the repo commits a bare `.env` or `.env.test` at the root, treat it as manual review. `.env.example` is fine if the contents are verified placeholders.
- Check the repo's `SECURITY.md` for a disclosure channel. If your install-gate finds a HIGH credentials finding the engine missed, you need somewhere to send it. Sixty of 101 repos in our corpus lack a SECURITY.md; repos without one have no coordinated patch path.
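The buyer steps above can be folded into a single gate decision. The sketch below is a hypothetical encoding: the `GateFinding` shape, the pattern and tier names, and the three-way `block` / `review` / `pass` outcome are assumptions for illustration, not SkillAudit's API. Only the thresholds (credentials axis below 70, production env echo as a blocker, in-tree .env as manual review) come from this post.

```typescript
type GatePattern = "hardcoded-secret" | "env-echo" | "error-interpolation" | "env-file";
type GateTier = "production" | "tests" | "examples" | "scripts";

interface GateFinding {
  pattern: GatePattern;
  tier: GateTier;
}

type GateDecision = "block" | "review" | "pass";

// Decide whether an install may proceed, given the credentials-axis score
// (0 to 100) and the list of credentials findings for the repo.
function installGate(credentialsScore: number, findings: GateFinding[]): GateDecision {
  // A production-tier env echo puts every env value into transcripts.
  const productionEcho = findings.some(
    (f) => f.pattern === "env-echo" && f.tier === "production",
  );
  if (productionEcho) return "block";

  // Below 70 on the axis: stop and look at what exactly is flagged.
  if (credentialsScore < 70) return "review";

  // A committed env file in the production tree needs a manual read.
  const productionEnvFile = findings.some(
    (f) => f.pattern === "env-file" && f.tier === "production",
  );
  return productionEnvFile ? "review" : "pass";
}
```

Keeping the blocker check ahead of the score check means a high overall score cannot mask a production env echo.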
FAQ
Why grade credential exposure separately from the security axis?
Because the failure mode is unbounded by the host process in a way that no failure on the security axis is. An SSRF or command injection has a blast radius of one server and one request; a leaked credential travels with the conversation log and becomes part of the LLM's persistent context. The two need different remediation paths and different cap rules. A repo can ship a security-axis A and still be uninstallable if its credentials axis is failing, and the rubric needs to reflect that explicitly.
Why are .env.example files flagged at all if they are templates?
Because the engine cannot reliably distinguish "template" from "live secret" without reading every line of every file. The risk is the small fraction of .env.example files that contain real keys (either left over from copying a working .env or filled in for a demo and forgotten). The WARN ships at low-weight in low-tier surfaces (-0 for examples / tests / scripts), so the credentials-axis impact is usually 0; the finding still appears so a human reviewer can verify.
Why does the engine emit HIGH on tests / examples instead of WARN?
The severity is HIGH because the matcher's confidence (entropy + provider prefix) is high — the secret is real-shape regardless of where it lives. The surface tier separately controls the deduction weight (-5 in tests / examples vs -30 in production). So a test-tier hardcoded GitHub PAT is HIGH and -5; a production-source one is HIGH and -30. This separates the detection's confidence from the rubric's punishment, which is what makes the calibration tractable.
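The per-finding arithmetic can be sketched as below. This is a simplified model using only the deduction weights quoted in this post (-30 for production, -5 for tests, examples, and scripts); the names `HIGH_DEDUCTION` and `credentialsScore` are hypothetical, and the rubric's cap rules are deliberately not modeled, so some scores reported elsewhere in this post will not be reproduced by it.

```typescript
type SurfaceTier = "production" | "tests" | "examples" | "scripts";

// Per-finding deductions for HIGH findings, as quoted in this post.
const HIGH_DEDUCTION: Record<SurfaceTier, number> = {
  production: 30,
  tests: 5,
  examples: 5,
  scripts: 5,
};

// Start at 100, subtract one deduction per HIGH finding, clamp at 0.
// Cap rules from the full rubric are NOT modeled here.
function credentialsScore(highFindingTiers: SurfaceTier[]): number {
  const total = highFindingTiers.reduce(
    (sum, tier) => sum + HIGH_DEDUCTION[tier],
    0,
  );
  return Math.max(0, 100 - total);
}
```

Under this model, five test-tier findings leave the axis at 75, one production finding leaves it at 70, and five production findings clamp it to 0.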
Will my repo be re-scanned automatically when I fix it?
No — scans are on-demand at this stage. Push the fix, paste your repo URL into the form, and the engine reruns and updates your audit page. The 30-day re-scan cadence we recommended in the install-gate playbook is for buyers tracking the corpus, not for authors tracking individual fixes.
Does the engine catch GitLeaks-style entropy without a known prefix?
Currently no — the matcher requires a known provider prefix to fire. Pure-entropy detection produces too many false positives in our corpus (random IDs, hash digests, JWT-like UUIDs). When we add prompt-injection detection at corpus scale, we plan to layer entropy as a secondary signal that must combine with another signal (string passed to a tool response, string in a logged path) to produce a finding. Today the engine prefers a tighter precision/recall trade than GitLeaks does.
What does this analysis miss?
Three known gaps. (1) Credentials in non-tracked files (a real .env in .gitignore with secrets that get sourced at startup is invisible to a static scan; you need a runtime check). (2) Credentials passed in tool arguments — if the LLM is told to pass an API key as a tool argument, the engine cannot infer that's happening from the schema alone. (3) Credentials embedded in container images / CI variables that the source tree does not name. The static credentials axis is necessary but not sufficient; runtime observability and credential-vault tooling cover the rest of the surface.
How does the credentials axis interact with prompt injection?
It compounds. A prompt-injection attack against an MCP tool can extract any credential the tool has read into its variable scope. If the tool reads a credential into a local variable for legitimate use, then echoes anything from a fetched URL into its response, injected instructions in the fetched content can ask for that variable to be printed. The credentials axis catches the static cases; the prompt-injection probe (gated on ANTHROPIC_API_KEY at scan time) catches the dynamic cases. Both run as part of the same scan when the key is set.
How does the rubric change in 30 days?
Predicted: 1–2 of the eighteen Pattern-1 repos will fix their hardcoded secrets after the post lands (the easiest fix). The seven Pattern-2 (process.env echo) repos are unlikely to move because the bad shape is harder to find and the maintainers are not reading our audit pages yet. The single Pattern-3 repo (JetBrains) will fix it once it gets pinged because JetBrains is a maintained org. The 28 Pattern-4 repos will not move; the WARN is genuinely low-impact and the discipline is uneven across the ecosystem. We will publish the 30-day re-scan delta as a separate post.
Read next
- Anatomy of an A-grade MCP server — the five code patterns the 19 A-grade repos share, including credential-handling.
- Nine of 101 most-installed MCP servers are archived — the maintenance-signal counterpart to this post.
- An install-gate playbook for MCP servers — the team-lead policy companion.
- Twenty-three vendor-official MCP servers with F grades — the same install asymmetry, broken out by vendor.
- State of MCP server security — 2026 — the original aggregate research that started the corpus.
- Methodology — the full rubric, including the per-(axis, surface) deduction matrix and grade-bucket rules.