Playbook · 2026-04-30
Block 52 of 101 community MCP servers with one CI gate — the 2026 team policy template
If your team adopts a one-line policy — "no MCP server installs below grade C" — you block 52 of the 101 most-installed Model Context Protocol servers in 2026, including 29 vendor-official releases your developers would have waved through on brand alone. This is the policy in one paragraph, the GitHub Action template in 30 lines, and the 12-week rollout calendar your VP-Engineering can hand a security engineer next Monday.
The data behind the headline
Across the 101 most-installed Model Context Protocol servers we could find (vendor-official releases from Cloudflare, Stripe, Heroku, MongoDB, GitHub, AWS, Azure, Auth0, Sentry, and Anthropic itself, plus the indie corpus most teams adopt by name: FastMCP, mcp-installer, Klavis, the LastMile agent stack), SkillAudit v0.2.1 reports this distribution: 19 A, 0 B, 30 C, 10 D, 42 F. The full methodology and aggregate findings are in the state-of-MCP-security post; the per-vendor F-grade breakdown names every file path the engine flagged for the 29 vendor-official F's. Both posts are open. Both link directly to the per-repo report cards. Nothing in this playbook is paywalled; your security engineer can verify every grade by clicking the audit links.
Three things to read off the distribution before we get to the policy. First, a min-grade-A gate is not viable in 2026: only 19 of 101 community MCPs would clear it, and most of the A's are on a narrow surface (Pinecone, Redis, Snowflake, Couchbase, Microsoft Playwright, ElevenLabs, Qdrant, Vectara, Meilisearch, Zilliz Milvus, the LangChain adapters, FireCrawl, Exa, FastAPI-MCP, fetch-mcp, DuckDuckGo). Forcing your developers to pick only from these 19 turns "we have a policy" into "we have a freeze", which turns into shadow installs. Second, there are zero B grades. That is not a bug in the engine; it is a property of the corpus: the rubric currently lands a repo on A (clean across all six axes), on C (one-axis warning, others clean), or worse. A v0.3 calibration update will likely produce more B's; until then, B-and-up is functionally equivalent to A-and-up. Third, a min-grade-C gate clears 49 of 101 repos for install (19 A + 30 C), a working choice set that includes most of the database-vector-search and developer-tool MCPs your team actually wants. Start there.
The policy in one paragraph
That is the whole policy. Six sentences. Read it, paste it into your security wiki, and move to the rollout.
Step 1 — Inventory what your team already has installed
Before you can gate new installs you need to know which MCP servers are already in production agents on your team's laptops. The single source of truth is .claude/plugins.lock in the developer's home directory or the team-managed agent profile, with the equivalent in ~/.cursor/extensions/, ~/.windsurf/plugins/, and ~/.config/codex/plugins.json. Most teams in 2026 do not yet inventory these — there is no equivalent of npm ls for MCP plugins. Three approaches that work in week 1:
- Developer self-attestation: a 5-minute Slack survey: "list every MCP server you have installed across every agent, with the GitHub URL or npm name". Goes 70% of the way; misses the ones developers forgot they installed for a hackathon.
- Filesystem walk: a one-line script across all team laptops via your MDM (find ~ -name 'plugins.lock' -o -name 'mcp_settings.json' 2>/dev/null covers Claude Code, Cursor, and Windsurf). Combine the lockfiles. This catches the forgotten installs.
- Outbound DNS log review: pull a week of egress connections from your endpoint-protection vendor (CrowdStrike, SentinelOne, Defender) and grep for npmjs.org, github.com, raw.githubusercontent.com, plus the per-vendor MCP installer endpoints (Cloudflare, Anthropic, etc.). MCPs running stdio that fetch their dependencies show up here. This catches the install-by-curl shadow paths.
Run all three. Cross-reference. Land at a single Google Sheet or Notion table with columns repo URL, installed by, used in agent (which one), last update, SkillAudit grade. The grade column is filled by pasting the GitHub URL into the audit form for any repo not already on the public board.
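The cross-referencing step can be sketched as a small shell function. This is a minimal sketch, assuming each of the three approaches has been exported to a plain-text file with one repo identifier per line; the function name and the file names are hypothetical. It prints each repo with the number of sources that reported it, so entries seen by only one method stand out.

```shell
# Sketch: merge per-source MCP lists into one inventory.
# Assumption: each input file holds one "owner/repo" (or npm name) per line.
# Output: "<sources> <repo>" sorted by repo, where <sources> is how many
# input files listed that repo.
merge_inventory() {
  for f in "$@"; do
    # Lowercase, trim whitespace, drop blank lines, and de-duplicate within a
    # single source so a repo is counted at most once per file.
    tr '[:upper:]' '[:lower:]' < "$f" \
      | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
      | sort -u
  done | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage (hypothetical file names):
#   merge_inventory survey.txt filesystem.txt dns.txt > inventory.txt
```

A count of 1 next to a repo is the interesting case: it was found by only one of the three methods and is worth a follow-up with whoever installed it.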
Step 2 — Pick the threshold (and why C is right for week 1)
The grade distribution above gives you four candidate thresholds.
- Minimum A — clears 19 of 101. Right for hardened-lab and high-regulation contexts (ITAR, HIPAA-PHI agent flows, regulated-finance tooling). Wrong for week 1 of a typical team rollout — too many useful MCPs blocked, drives shadow installs.
- Minimum B — there are zero B grades in the current corpus, so this is currently equivalent to minimum A. Reconsider after the v0.3 calibration ships.
- Minimum C — clears 49 of 101. The right week-1 default. Blocks the F-grade vendor-official releases (Cloudflare, Heroku, Stripe, MongoDB, GitHub, AWS, Azure, Auth0, Sentry, etc.) and the indie F's (FastMCP forks, mcp-installer, Klavis, LastMile mcp-agent, etc.). Lets through the database-vector-search heavy hitters (Redis, Qdrant, Pinecone, Snowflake, ClickHouse, Couchbase, Elastic, Milvus) and the developer tools developers actually need (Playwright, ElevenLabs, FireCrawl, Exa, FastAPI-MCP).
- Minimum D — clears 59 of 101. Worth considering only as a transitional setting if your team has a long tail of D-grade installs already in production and you want to phase F-blocking in first.
Recommended: ship at minimum C in week 1. Tighten to B once the v0.3 calibration update produces a real B-grade band. Allow named exceptions per the policy paragraph above for the D-grade installs that have a fix-or-replace deadline on file.
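The arithmetic behind the four candidate thresholds can be checked with a few lines of shell. This sketch hard-codes the distribution reported above (19 A, 0 B, 30 C, 10 D, 42 F); the function name is illustrative only.

```shell
# Sketch: how many of the 101 audited servers clear each minimum-grade gate,
# using the distribution reported above (19 A, 0 B, 30 C, 10 D, 42 F).
cleared_at() {
  # $1 = minimum grade (A, B, C, or D); prints the number of repos at or above it.
  a=19; b=0; c=30; d=10
  case "$1" in
    A) echo "$a" ;;
    B) echo $((a + b)) ;;
    C) echo $((a + b + c)) ;;
    D) echo $((a + b + c + d)) ;;
    *) echo "unknown grade: $1" >&2; return 1 ;;
  esac
}

for g in A B C D; do
  n=$(cleared_at "$g")
  echo "min $g: clears $n, blocks $((101 - n))"
done
```

The minimum-C row is where the headline figure comes from: 49 cleared, 52 blocked.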
Step 3 — Wire the CI gate
This is the 30-line GitHub Action. Drop it into .github/workflows/mcp-gate.yml in any repo whose .claude/plugins.lock is committed (any team-managed agent profile typically commits the lockfile to a private team repo for tracking). It runs on every PR that touches the lockfile, calls the SkillAudit public grade endpoint for each new entry, and fails the check if any new entry is below the threshold. There is no API key, no auth, and no billing path — public grades are open.
# .github/workflows/mcp-gate.yml
name: mcp-install-gate
on:
  pull_request:
    paths:
      - '.claude/plugins.lock'
jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 } # full history, so the base branch ref exists for the diff
      - name: Diff added MCP entries
        id: diff
        run: |
          # Diff against the PR's base branch; a failing git diff fails the step
          # instead of silently producing an empty list.
          git diff "origin/${GITHUB_BASE_REF}" -- .claude/plugins.lock > lock.diff
          # Keep added lines that start with a quoted "owner/repo" and capture that key.
          grep -E '^\+\s+"[a-z0-9_-]+/[a-z0-9_.-]+"' lock.diff \
            | sed -E 's/^\+\s+"([^"]+)".*/\1/' > added.txt || true
          echo "count=$(wc -l < added.txt)" >> "$GITHUB_OUTPUT"
      - name: Check each entry against SkillAudit
        if: steps.diff.outputs.count != '0'
        env:
          MIN_GRADE: 'C' # one of: A, B, C, D
        run: |
          # Map a letter grade to a number so grades can be compared; F and unknown rank 0.
          rank() { case "$1" in A) echo 4;; B) echo 3;; C) echo 2;; D) echo 1;; *) echo 0;; esac; }
          MIN=$(rank "$MIN_GRADE"); FAIL=0
          # Fetch the index once rather than once per entry.
          curl -fsSL "https://skillaudit.dev/audit-index.json" -o audit-index.json
          while read -r repo; do
            slug=$(echo "$repo" | tr '/' '-')
            grade=$(jq -r --arg k "$repo" '.[$k].grade // empty' audit-index.json)
            [ -z "$grade" ] && { echo "::error::$repo not yet audited — submit at https://skillaudit.dev/#audit-req"; FAIL=1; continue; }
            if [ "$(rank "$grade")" -lt "$MIN" ]; then
              echo "::error::$repo grade=$grade below MIN_GRADE=$MIN_GRADE — see https://skillaudit.dev/audits/$slug/"
              FAIL=1
            else
              echo "$repo grade=$grade OK"
            fi
          done < added.txt
          exit "$FAIL"
Two things worth saying about this gate before you ship it. First, it talks to https://skillaudit.dev/audit-index.json directly — that endpoint is a static JSON file refreshed on every corpus re-scan and served from CDN; there is no rate limit and no auth. Second, the ::error:: output uses GitHub's standard annotation syntax so the failure shows up inline on the PR, not just in the run log — the developer who tried to add an F-grade MCP sees the link to the audit page they should read before they argue with their security engineer about an exception.
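Before shipping, it can help to exercise the gate's two moving parts locally, with no network access. The sketch below mirrors the extraction and comparison logic as standalone functions. It assumes the lockfile lists each server as an indented, double-quoted "owner/repo" string at the start of a line, which is an assumption about the lockfile format, and the function names are illustrative.

```shell
# Sketch: the gate's extraction and comparison logic as standalone functions.

# Extract added "owner/repo" keys from a unified diff on stdin.
# Assumption: each server appears as an indented, double-quoted "owner/repo"
# at the start of a lockfile line; adjust the pattern to your actual format.
extract_added() {
  grep -E '^\+[[:space:]]+"[a-z0-9_-]+/[a-z0-9_.-]+"' \
    | sed -E 's/^\+[[:space:]]+"([^"]+)".*/\1/'
}

# Map a letter grade to a number; anything unrecognised (including F) ranks 0.
rank() {
  case "$1" in A) echo 4 ;; B) echo 3 ;; C) echo 2 ;; D) echo 1 ;; *) echo 0 ;; esac
}

# Succeed when grade $1 is at or above minimum grade $2.
# Note: a misspelled minimum ranks 0, so every grade would pass; validate it first.
meets_min() {
  [ "$(rank "$1")" -ge "$(rank "$2")" ]
}

# Usage:
#   git diff origin/main -- .claude/plugins.lock | extract_added
#   meets_min C C && echo "allowed"
```

Feeding a saved diff through extract_added is a quick way to confirm the pattern matches your lockfile before the gate starts failing PRs, since a pattern that matches nothing makes the gate pass silently.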
Step 4 — Set the re-scan cadence
Grades age. A maintainer who fixes the one SSRF that pulled them down to C lands at A on the re-scan. A maintainer who pushes a bad commit that breaks the SSRF allow-list moves the same repo from A back to F. The policy paragraph specifies 30 days or on plugins.lock change, whichever comes first; here is what that looks like operationally.
- A weekly cron in your team-policy repo runs the same audit lookup against every entry in plugins.lock and writes a Slack notification for any grade change. (One CI workflow with schedule: cron: '0 9 * * 1'; same script as the gate above, different trigger.)
- Any grade drop below C triggers a 14-day window during which the engineer who installed that MCP must (a) re-scan against a newer release if one exists, (b) replace it with a higher-graded MCP that solves the same problem, or (c) document a written exception with re-scan deadline.
- Grade improvements (e.g. C → A on a re-scan after a maintainer fix) are logged but not noisy — the inventory table updates, no Slack ping needed.
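The grade-change detection in the weekly job can be sketched as a comparison of two snapshots. This assumes the job writes a text file of "<repo> <grade>" lines on each run; that snapshot format and the function name are hypothetical, not something SkillAudit produces.

```shell
# Sketch: compare two grade snapshots and report what changed.
# Assumption: each snapshot is a text file of "<repo> <grade>" lines written
# by the weekly job; the format is hypothetical.
grade_changes() {
  # $1 = previous snapshot, $2 = current snapshot, $3 = minimum grade (default C)
  awk -v min="${3:-C}" '
    function rank(g) {
      if (g == "A") return 4
      if (g == "B") return 3
      if (g == "C") return 2
      if (g == "D") return 1
      return 0
    }
    # First file: remember the old grade for each repo.
    FILENAME == ARGV[1] { prev[$1] = $2; next }
    # Second file: report repos whose grade differs from the old snapshot.
    ($1 in prev) && prev[$1] != $2 {
      if (rank($2) < rank(prev[$1])) {
        # A drop below the minimum starts the 14-day window.
        action = (rank($2) < rank(min)) ? "dropped-below-min" : "dropped"
      } else {
        action = "improved"   # log only, no Slack ping
      }
      print $1, prev[$1], $2, action
    }
  ' "$1" "$2"
}

# Usage (hypothetical file names):
#   grade_changes grades-last-week.txt grades-today.txt C
```

Only the dropped-below-min rows need to reach Slack with a deadline attached; the other rows can go straight into the inventory table.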
Step 5 — Communicate the policy and exception process
The hard part of this is not the technical gate; it is the social agreement. Three things to land before the policy goes live in CI:
- Publish the policy paragraph and the threshold in the team wiki, with a link to the public audit board so anyone affected by the gate can verify the grade themselves rather than treating it as an opaque ruling. The policy text above is provided under the same Creative Commons attribution we use for the rest of the site (link back to this post).
- Define the exception process before the first developer needs one. The policy paragraph lists three exception conditions: named owner, written remediation, re-scan deadline. Hard-code the workflow — a one-page exception template in the wiki, a single Slack channel where requests are filed, a 24-hour-SLA security review by name. This avoids the "exception by attrition" pattern where developers add an unreviewed F-grade MCP, the gate fails, the developer pings the security engineer at 6pm Friday, and the engineer waves it through to unblock the merge.
- Run a 2-week observe-only mode before flipping the gate to fail. continue-on-error: true on the gate step plus a dashboard pulling the failed-grade lookups gives you the first install-attempt-to-policy-deny rate for free. If your team is currently installing 2 F-grade MCPs/week (about the median we see in early-design-partner data), the observe-only mode will surface this signal cleanly without the policy framing being seen as a freeze.
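The install-attempt-to-policy-deny rate from observe-only mode can be computed with a short script. This is a sketch, assuming the gate step appends one "<repo> <grade>" line per looked-up entry to a log file; that log format and the function name are hypothetical.

```shell
# Sketch: summarise an observe-only log into a deny rate.
# Assumption: one "<repo> <grade>" line per looked-up entry; hypothetical format.
deny_rate() {
  # $1 = log file, $2 = minimum grade (default C)
  awk -v min="${2:-C}" '
    function rank(g) {
      if (g == "A") return 4
      if (g == "B") return 3
      if (g == "C") return 2
      if (g == "D") return 1
      return 0
    }
    # Count every well-formed line; count it as denied when below the minimum.
    NF >= 2 { total++; if (rank($2) < rank(min)) denied++ }
    END {
      if (total == 0) { print "attempts=0 denied=0 rate=n/a"; exit }
      printf "attempts=%d denied=%d rate=%.1f%%\n", total, denied, 100 * denied / total
    }
  ' "$1"
}

# Usage (hypothetical file name):
#   deny_rate observe-only.log C
```

Running this at the end of the two observe-only weeks gives a baseline number to compare against after the gate is switched to fail mode.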
The 12-week rollout calendar
For the security engineer the VP-Engineering hands this to next Monday, here is the calendar.
- Week 1. Inventory pass per Step 1. Land a single source-of-truth table. Submit any uncovered MCPs to the audit form for grading.
- Week 2. Pick the threshold. Default is C; document why and where it differs from the data above if your team has different needs (ITAR, regulated-finance, customer-data agent flows).
- Weeks 3–4. Wire the CI gate in observe-only mode. Watch the dashboard. Confirm zero false-positives on already-installed MCPs that pass the threshold.
- Week 5. Publish the policy paragraph in the team wiki. Open the exception channel. Run the security stand-up walkthrough.
- Week 6. Flip the CI gate to fail mode. Track the first 5 PR failures by hand to make sure the failure message reads usefully to the developer.
- Weeks 7–8. Walk every existing D-grade and F-grade install through the policy's 14-day fix-or-replace window. Most of them will land on a higher-graded alternative — the 49 C-and-up MCPs in the public corpus cover the same job-to-be-done as nearly all of the F's.
- Weeks 9–10. Wire the weekly re-scan cron. Subscribe the security stand-up to the grade-change Slack channel.
- Week 11. Embed the SkillAudit badge next to every approved MCP in the team wiki, so the policy is visible inline at the same place developers find install instructions. The badge updates automatically on re-scan.
- Week 12. Retrospective: counts of installs gated, exceptions filed, fixes shipped. If you ran observe-only correctly in weeks 3–4 you have a baseline; a successful rollout shows the F-grade install rate trending toward zero across the same period.
Common gotchas
Vendor-official is not a security signal
The single biggest mistake teams make in week 1 is to allow-list vendor-official releases past the gate. Twenty-nine of the forty-two F's in our corpus are vendor-official; we name every one with the file path. The dev-rel team that wrote the demo MCP for the conference is not the security team that audits the SaaS API. Brand is not the signal you are looking for.
Repo-wide token scopes hide single-tool blast-radius
Several F-grade MCPs (Heroku, Auth0, MongoDB) attach a bearer token to every outbound fetch() call by construction in the API client layer. The blast-radius of a single SSRF in a single tool handler is therefore "the entire token scope" — not just the one record that tool was supposed to read. The policy paragraph's "named owner + written remediation" requirement should trigger a token-scope review for any exception involving an MCP that uses a single shared token; the easiest mitigation is to scope the token down before granting the exception.
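To make the blast-radius point concrete, here is a sketch of the opposite pattern: attaching the bearer token only when the request targets the one host the token is meant for. This is a different mitigation from scoping the token down, shown only to illustrate why "token on every outbound request" matters. All names are hypothetical, and this is not any vendor's client code.

```shell
# Sketch: attach the bearer token only for the expected API host.
# Hypothetical helper names; not taken from any MCP server.

# Print the lowercase host part of a URL.
url_host() {
  rest=${1#*://}               # drop the scheme, if any
  authority=${rest%%[/?#]*}    # keep everything before the first /, ?, or #
  hostport=${authority##*@}    # drop userinfo such as "user:pass@"
  # Drop the port. IPv6 literals are not handled in this sketch.
  printf '%s\n' "${hostport%%:*}" | tr '[:upper:]' '[:lower:]'
}

# Print an Authorization header only when the URL points at the allowed host.
auth_header_for() {
  # $1 = URL, $2 = the one host the token is meant for, $3 = token
  if [ "$(url_host "$1")" = "$2" ]; then
    printf 'Authorization: Bearer %s\n' "$3"
  fi
}

# Usage:
#   auth_header_for "https://api.example.com/v1/items" api.example.com "$TOKEN"
```

With a check like this, an SSRF that redirects a tool handler to an attacker-chosen URL still makes the request, but the token does not travel with it.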
Examples and scripts do count when developers copy-paste from them
The honest calibration note in the per-vendor F-grade post distinguishes runtime-tool-surface F's from F's partially driven by scripts/, benchmarks/, samples/, or examples/ findings. For an audit-the-engine-itself view this distinction matters; for a team adopting the MCP, less so. If a developer reads the README, follows the example, and the example contains the SSRF — that's a real install-time path, not a calibration artifact. The v0.3 engine update will weight these less aggressively, but the policy should treat "F driven by examples" the same as "F driven by runtime tool surface" until the maintainer fixes both.
Community MCPs ship without a CHANGELOG more often than not
The maintenance-axis check on the SkillAudit rubric explicitly looks for a CHANGELOG / RELEASE-NOTES / Releases-tab presence; over half of the F-grade community MCPs in our corpus have none. If you cannot tell what changed between the version your team installed and the version the upstream tagged this morning, you cannot run the re-scan-on-version-change cadence Step 4 calls for. Prefer MCPs with explicit version tags and visible release notes over MCPs that ship from main with no release process.
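A rough local version of this check can be run against a checkout before you adopt an MCP. The sketch below only approximates the idea: it is not the SkillAudit rubric, it cannot see a GitHub Releases tab, and the function names and file-name patterns are assumptions.

```shell
# Sketch: does a local checkout show any sign of a release process?
# Approximation only; this is not the SkillAudit maintenance-axis check.

# Succeed if a changelog-style file exists at the top level of the checkout.
has_release_notes() {
  # $1 = path to a checked-out repo
  for f in "$1"/CHANGELOG* "$1"/changelog* "$1"/RELEASE-NOTES* "$1"/RELEASE_NOTES* "$1"/HISTORY*; do
    # An unmatched glob stays literal and fails the -f test, so it is skipped.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Succeed if the repo has at least one tag that looks like a version
# (for example v1.2.3 or 1.2).
has_version_tags() {
  git -C "$1" tag --list | grep -Eq '^v?[0-9]+\.[0-9]+'
}

# Usage (hypothetical path):
#   has_release_notes ./some-mcp && has_version_tags ./some-mcp && echo "release process visible"
```

A repo that fails both checks is one where the re-scan-on-version-change cadence has nothing to key on.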
Further reading
- We scanned 52 MCP servers — 56% had SSRF, 44% leaked credentials — the methodology and aggregate grade distribution this policy is calibrated against.
- 29 vendor-official MCP servers earned an F — every name, every file path — the per-vendor breakdown of which F-grades your CI gate will block.
- SkillAudit vs Snyk and SkillAudit vs GitHub Code Scanning — for teams already running supply-chain hygiene or SAST and trying to figure out where the install gate fits in the existing CI pipeline.
- SkillAudit vs the Anthropic Skills Directory — for teams already gating installs on Anthropic's editorial allowlist who want to extend the gate to off-directory MCPs.
- The SkillAudit embed badge — drop the per-MCP grade badge into your team wiki next to install instructions so the policy is visible at the point of install.
- The public audit board — every grade in the corpus, every finding linked.