Playbook · 2026-04-30

Block 52 of 101 community MCP servers with one CI gate — the 2026 team policy template

If your team adopts a one-line policy — "no MCP server installs below grade C" — you block 52 of the 101 most-installed Model Context Protocol servers in 2026, including 29 vendor-official releases your developers would have waved through on brand alone. This is the policy in one paragraph, the GitHub Action template in 30 lines, and the 12-week rollout calendar your VP-Engineering can hand a security engineer next Monday.

TL;DR

The current grade distribution across our public 101-repo MCP corpus: 19 A, 0 B, 30 C, 10 D, 42 F. A min-grade-C install gate blocks 52 of 101 (51.5%). A min-grade-B gate blocks 82 of 101 (81%) — too aggressive at the current calibration; start at C.
The policy is one paragraph: "Any MCP server installed into a team agent must hold a SkillAudit grade of C or higher, scanned within the last 30 days, against the version pinned in plugins.lock. New installs run a CI gate. Exceptions require a named owner and a re-scan deadline."
The CI gate is a 30-line GitHub Action (template below) that diffs .claude/plugins.lock on every PR, fetches the SkillAudit grade for each new entry, and fails the check if any entry is below the configured threshold.
The re-scan cadence is 30 days or on plugins.lock change, whichever comes first. Grades are time-bound — last week's A can be this week's D after a botched maintainer commit.
The four highest-leverage gotchas your security engineer will hit in week 1: vendor-official is not a security signal, repo-wide token scopes hide single-tool blast-radius, examples and scripts in the same repo do count when developers copy-paste from them, and community MCPs ship without a CHANGELOG more often than not.
This post is the team-lead playbook. The matching author-side playbook (how to fix the SSRFs, credential echoes, and exec primitives the engine flags so your repo earns a C) is the companion post "29 vendor-official MCP servers earned an F: every name, every file path".

The data behind the headline

Across the 101 most-installed Model Context Protocol servers we could find — vendor-official releases from Cloudflare, Stripe, Heroku, MongoDB, GitHub, AWS, Azure, Auth0, Sentry, and Anthropic itself, plus the indie corpus most teams adopt by name (FastMCP, mcp-installer, Klavis, the LastMile agent stack) — SkillAudit v0.2.1 reports the distribution above: 19 A, 0 B, 30 C, 10 D, 42 F. The full methodology and aggregate findings are in the state-of-MCP-security post; the per-vendor F-grade breakdown names every file path the engine flagged for the 29 vendor-official F's. Both posts are open. Both link directly to the per-repo report cards. Nothing in this playbook is paywalled — your security engineer can verify every grade by clicking the audit links.

Three things to read off the distribution before we get to the policy. First, a min-grade-A gate is not viable in 2026: only 19 of 101 community MCPs would clear it, and most of the A's cover a narrow surface (Pinecone, Redis, Snowflake, Couchbase, Microsoft Playwright, ElevenLabs, Qdrant, Vectara, Meilisearch, Zilliz Milvus, the LangChain adapters, FireCrawl, Exa, FastAPI-MCP, fetch-mcp, DuckDuckGo). Forcing your developers to pick only from these 19 turns "we have a policy" into "we have a freeze", which in turn drives shadow installs. Second, there are zero B grades. That is not a bug in the engine; it is a property of the corpus: the rubric currently lands a repo on A (clean across all six axes), on C (one-axis warning, others clean), or worse. A v0.3 calibration update will likely produce a real B band; until then, B-and-up is functionally equivalent to A-and-up. Third, a min-grade-C gate clears 49 of 101 repos for install (19 A + 30 C), a working choice set that includes most of the database, vector-search, and developer-tool MCPs your team actually wants. Start there.

The policy in one paragraph

Any Model Context Protocol server installed into a team-managed Claude Code, Cursor, Windsurf, or Codex agent must hold a current SkillAudit grade of C or higher, scanned within the last 30 days, against the version pinned in .claude/plugins.lock or equivalent. New installs are gated by a CI check on the lockfile. Existing installs are scanned weekly; any drop below C triggers a 14-day fix-or-replace window. Exceptions are allowed for grades D and F provided (a) a named engineering owner is on file, (b) a written remediation plan is in the team wiki, and (c) the re-scan deadline is no further than 30 days out. Grade F installs are blocked outright unless the owner also acknowledges in writing that the MCP runs only against trusted input that no attacker can control. Below-grade installs are reviewed at the monthly security stand-up.

That is the whole policy. Six sentences. Read it, paste it into your security wiki, and move to the rollout.
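The paragraph reduces to a small decision table. The sketch below is one way to encode it, not part of the SkillAudit tooling: the function names and the allow/rescan/exception/deny labels are hypothetical, and the extra written acknowledgement required for grade F is assumed to be folded into the single exception flag.

```sh
#!/bin/sh
# Hypothetical helper: the policy paragraph as one decision.
# Usage: policy_decision GRADE SCAN_AGE_DAYS HAS_EXCEPTION(yes|no)

# Map a letter grade to a number so grades can be compared (F or unknown = 0).
rank() {
  case "$1" in A) echo 4;; B) echo 3;; C) echo 2;; D) echo 1;; *) echo 0;; esac
}

policy_decision() {
  grade=$1; age_days=$2; has_exception=$3
  if [ "$(rank "$grade")" -ge "$(rank C)" ]; then
    # Grade clears the bar; the scan must also be 30 days old or less.
    if [ "$age_days" -le 30 ]; then echo allow; else echo rescan; fi
  elif [ "$has_exception" = yes ]; then
    # D or F with owner, remediation plan, and re-scan deadline on file.
    echo exception
  else
    echo deny
  fi
}

policy_decision A 3 no     # prints: allow
policy_decision C 45 no    # prints: rescan
policy_decision D 10 yes   # prints: exception
policy_decision F 10 no    # prints: deny
```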

Step 1 — Inventory what your team already has installed

Before you can gate new installs you need to know which MCP servers are already in production agents on your team's laptops. The single source of truth is .claude/plugins.lock in the developer's home directory or the team-managed agent profile, with the equivalent in ~/.cursor/extensions/, ~/.windsurf/plugins/, and ~/.config/codex/plugins.json. Most teams in 2026 do not yet inventory these — there is no equivalent of npm ls for MCP plugins. Three approaches that work in week 1:

Developer self-attestation — a 5-minute Slack survey: "list every MCP server you have installed across every agent, with the GitHub URL or npm name". Goes 70% of the way; misses the ones developers forgot they installed for a hackathon.
Filesystem walk — a one-line script across all team laptops via your MDM (find ~ -name 'plugins.lock' -o -name 'mcp_settings.json' 2>/dev/null covers Claude Code, Cursor, and Windsurf). Combine the lockfiles. This catches the forgotten installs.
Outbound DNS log review — pull a week of egress connections from your endpoint-protection vendor (CrowdStrike, SentinelOne, Defender) and grep for npmjs.org, github.com, raw.githubusercontent.com, plus the per-vendor MCP installer endpoints (Cloudflare, Anthropic, etc.). MCPs running stdio that fetch their dependencies show up here. This catches the install-by-curl shadow paths.

Run all three. Cross-reference. Land at a single Google Sheet or Notion table with columns repo URL, installed by, used in agent (which one), last update, SkillAudit grade. The grade column is filled by pasting the GitHub URL into the audit form for any repo not already on the public board.

Step 2 — Pick the threshold (and why C is right for week 1)

The grade distribution above gives you four candidate thresholds.

Minimum A — clears 19 of 101. Right for hardened-lab and high-regulation contexts (ITAR, HIPAA-PHI agent flows, regulated-finance tooling). Wrong for week 1 of a typical team rollout — too many useful MCPs blocked, drives shadow installs.
Minimum B — there are zero B grades in the current corpus, so this is currently equivalent to minimum A. Reconsider after the v0.3 calibration ships.
Minimum C — clears 49 of 101. The right week-1 default. Blocks the F-grade vendor-official releases (Cloudflare, Heroku, Stripe, MongoDB, GitHub, AWS, Azure, Auth0, Sentry, etc.) and the indie F's (FastMCP forks, mcp-installer, Klavis, LastMile mcp-agent, etc.). Lets through the database-vector-search heavy hitters (Redis, Qdrant, Pinecone, Snowflake, ClickHouse, Couchbase, Elastic, Milvus) and the developer tools developers actually need (Playwright, ElevenLabs, FireCrawl, Exa, FastAPI-MCP).
Minimum D — clears 59 of 101. Worth considering only as a transitional setting if your team has a long tail of D-grade installs already in production and you want to phase F-blocking in first.

Recommended: ship at minimum C in week 1. Tighten to B once the v0.3 calibration update produces a real B-grade band. Allow named exceptions per the policy paragraph above for the D-grade installs that have a fix-or-replace deadline on file.
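The cleared and blocked counts for each threshold follow directly from the grade distribution. A minimal sketch of that arithmetic, using the counts quoted in this post (the cleared_at helper name is illustrative):

```sh
#!/bin/sh
# Grade counts from the 101-repo corpus quoted in this post.
A=19; B=0; C=30; D=10; F=42
TOTAL=$((A + B + C + D + F))

# cleared_at MIN_GRADE: number of repos at or above the threshold.
cleared_at() {
  case "$1" in
    A) echo "$A" ;;
    B) echo $((A + B)) ;;
    C) echo $((A + B + C)) ;;
    D) echo $((A + B + C + D)) ;;
    *) echo "unknown threshold: $1" >&2; return 1 ;;
  esac
}

for min in A B C D; do
  cleared=$(cleared_at "$min")
  echo "min=$min cleared=$cleared blocked=$((TOTAL - cleared))"
done
# prints:
# min=A cleared=19 blocked=82
# min=B cleared=19 blocked=82
# min=C cleared=49 blocked=52
# min=D cleared=59 blocked=42
```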

Step 3 — Wire the CI gate

This is the 30-line GitHub Action. Drop it into .github/workflows/mcp-gate.yml in any repo whose .claude/plugins.lock is committed (any team-managed agent profile typically commits the lockfile to a private team repo for tracking). It runs on every PR that touches the lockfile, calls the SkillAudit public grade endpoint for each new entry, and fails the check if any new entry is below the threshold. There is no API key, no auth, and no billing path — public grades are open.

# .github/workflows/mcp-gate.yml
name: mcp-install-gate
on:
  pull_request:
    paths:
      - '.claude/plugins.lock'
jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 2 }
      - name: Diff added MCP entries
        id: diff
        run: |
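          # The PR checkout is a merge commit, so HEAD^1 is the base branch tip.
          # (A depth-2 checkout does not fetch origin/main.)
          git diff HEAD^1 HEAD -- .claude/plugins.lock \
            | grep -E '^\+\s+"[a-z0-9_-]+/[a-z0-9_.-]+"' \
            | sed -E 's/^\+\s+"([^"]+)".*/\1/' > added.txt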
          echo "count=$(wc -l < added.txt)" >> "$GITHUB_OUTPUT"
      - name: Check each entry against SkillAudit
        if: steps.diff.outputs.count != '0'
        env:
          MIN_GRADE: 'C'  # one of: A, B, C, D
        run: |
          rank() { case "$1" in A) echo 4;; B) echo 3;; C) echo 2;; D) echo 1;; *) echo 0;; esac; }
          MIN=$(rank "$MIN_GRADE"); FAIL=0
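          # Fetch the index once; with -f an HTTP error fails the step instead of reading as "not audited".
          curl -fsSL "https://skillaudit.dev/audit-index.json" -o audit-index.json
          while read -r repo; do
            slug=$(echo "$repo" | tr '/' '-')
            grade=$(jq -r --arg k "$repo" '.[$k].grade // empty' audit-index.json)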
            [ -z "$grade" ] && { echo "::error::$repo not yet audited — submit at https://skillaudit.dev/#audit-req"; FAIL=1; continue; }
            if [ "$(rank "$grade")" -lt "$MIN" ]; then
              echo "::error::$repo grade=$grade below MIN_GRADE=$MIN_GRADE — see https://skillaudit.dev/audits/$slug/"
              FAIL=1
            else
              echo "$repo grade=$grade OK"
            fi
          done < added.txt
          exit "$FAIL"

Two things worth saying about this gate before you ship it. First, it talks to https://skillaudit.dev/audit-index.json directly — that endpoint is a static JSON file refreshed on every corpus re-scan and served from CDN; there is no rate limit and no auth. Second, the ::error:: output uses GitHub's standard annotation syntax so the failure shows up inline on the PR, not just in the run log — the developer who tried to add an F-grade MCP sees the link to the audit page they should read before they argue with their security engineer about an exception.
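Before shipping the gate, it helps to run the entry-extraction step against a sample diff on your own machine. The sketch below assumes the lockfile is a JSON object keyed by "owner/repo" with a version string as the value; that shape, the added_entries name, and the acme/* repos are assumptions for illustration. The sed pattern here is anchored so it captures the first quoted string on the line (the key) rather than the version.

```sh
#!/bin/sh
# added_entries: reads a unified diff on stdin and prints the owner/repo key
# of every added line that looks like a lockfile entry.
added_entries() {
  grep -E '^\+[[:space:]]+"[a-z0-9_-]+/[a-z0-9_.-]+"' \
    | sed -E 's/^\+[[:space:]]+"([^"]+)".*/\1/'
}

# Hypothetical diff: one new entry and one version bump.
sample_diff='--- a/.claude/plugins.lock
+++ b/.claude/plugins.lock
@@ -1,4 +1,5 @@
 {
   "acme/alpha-mcp": "1.4.0",
+  "acme/beta-mcp": "0.9.2",
-  "acme/gamma-mcp": "2.0.0"
+  "acme/gamma-mcp": "2.1.0"
 }'

printf '%s\n' "$sample_diff" | added_entries
# prints:
# acme/beta-mcp
# acme/gamma-mcp
```

Note that a version bump appears as an added line, so the bumped entry is re-checked too, which fits the policy's "against the version pinned" wording.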

Step 4 — Set the re-scan cadence

Grades age. A maintainer who fixes the one SSRF that pulled them down to C lands at A on the re-scan. A maintainer who pushes a bad commit that breaks the SSRF allow-list can move the same repo from A down to F. The policy paragraph specifies 30 days or on plugins.lock change, whichever comes first; here is what that looks like operationally.

A weekly cron in your team-policy repo runs the same audit lookup against every entry in plugins.lock and writes a Slack notification for any grade change. (One CI workflow with schedule: cron: '0 9 * * 1'; same script as the gate above, different trigger.)
Any grade drop below C triggers a 14-day window during which the engineer who installed that MCP must (a) re-scan against a newer release if one exists, (b) replace it with a higher-graded MCP that solves the same problem, or (c) document a written exception with re-scan deadline.
Grade improvements (e.g. C → A on a re-scan after a maintainer fix) are logged but not noisy — the inventory table updates, no Slack ping needed.
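The three rules above amount to a small classifier the weekly job could run per entry, comparing last week's grade with this week's. A minimal sketch; the function name and the action labels are hypothetical, and the threshold defaults to C:

```sh
#!/bin/sh
# Map a letter grade to a number so grades can be compared (F or unknown = 0).
rank() {
  case "$1" in A) echo 4;; B) echo 3;; C) echo 2;; D) echo 1;; *) echo 0;; esac
}

# classify_change OLD NEW [MIN]: what the weekly job should do with a grade change.
classify_change() {
  old=$(rank "$1"); new=$(rank "$2"); min=$(rank "${3:-C}")
  if [ "$new" -eq "$old" ]; then
    echo unchanged
  elif [ "$new" -lt "$min" ] && [ "$old" -ge "$min" ]; then
    echo start-14-day-window   # dropped below the threshold
  elif [ "$new" -gt "$old" ]; then
    echo log-only              # improvement: update the inventory, no ping
  else
    echo notify                # any other change: Slack notification
  fi
}

classify_change A F   # prints: start-14-day-window
classify_change C A   # prints: log-only
classify_change A C   # prints: notify
classify_change C C   # prints: unchanged
```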

Step 5 — Communicate the policy and exception process

The hard part of this is not the technical gate; it is the social agreement. Three things to land before the policy goes live in CI:

Publish the policy paragraph and the threshold in the team wiki, with a link to the public audit board so anyone affected by the gate can verify the grade themselves rather than treating it as an opaque ruling. The policy text above is provided under the same Creative Commons attribution we use for the rest of the site (link back to this post).
Define the exception process before the first developer needs one. The policy paragraph lists three exception conditions: named owner, written remediation, re-scan deadline. Hard-code the workflow — a one-page exception template in the wiki, a single Slack channel where requests are filed, a 24-hour-SLA security review by name. This avoids the "exception by attrition" pattern where developers add an unreviewed F-grade MCP, the gate fails, the developer pings the security engineer at 6pm Friday, and the engineer waves it through to unblock the merge.
Run a 2-week observe-only mode before flipping the gate to fail. continue-on-error: true on the gate step plus a dashboard pulling the failed-grade lookups gives you the first install-attempt-to-policy-deny rate for free. If your team is currently installing 2 F-grade MCPs/week (about the median we see in early-design-partner data), the observe-only mode will surface this signal cleanly without the policy framing being seen as a freeze.
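During observe-only mode, the deny rate can be computed from whatever log the gate leaves behind. The sketch below assumes a log with one "repo grade" line per install attempt, which is a made-up format, and counts an entry with no grade as denied, on the assumption that the gate fails unaudited entries.

```sh
#!/bin/sh
# Map a letter grade to a number so grades can be compared (F or unknown = 0).
rank() {
  case "$1" in A) echo 4;; B) echo 3;; C) echo 2;; D) echo 1;; *) echo 0;; esac
}

# deny_rate MIN: reads "repo grade" lines on stdin, prints "denied/attempts".
deny_rate() {
  min=$(rank "$1"); attempts=0; denied=0
  while read -r repo grade; do
    [ -n "$repo" ] || continue          # skip blank lines
    attempts=$((attempts + 1))
    if [ "$(rank "$grade")" -lt "$min" ]; then denied=$((denied + 1)); fi
  done
  echo "$denied/$attempts"
}

# Hypothetical two weeks of install attempts.
printf 'acme/alpha-mcp A\nacme/beta-mcp F\nacme/gamma-mcp C\nacme/delta-mcp D\n' \
  | deny_rate C
# prints: 2/4
```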

The 12-week rollout calendar

For the security engineer the VP-Engineering hands this to next Monday, here is the calendar.

Week 1. Inventory pass per Step 1. Land a single source-of-truth table. Submit any uncovered MCPs to the audit form for grading.
Week 2. Pick the threshold. Default is C; document why and where it differs from the data above if your team has different needs (ITAR, regulated-finance, customer-data agent flows).
Weeks 3–4. Wire the CI gate in observe-only mode. Watch the dashboard. Confirm zero false-positives on already-installed MCPs that pass the threshold.
Week 5. Publish the policy paragraph in the team wiki. Open the exception channel. Run the security stand-up walkthrough.
Week 6. Flip the CI gate to fail mode. Track the first 5 PR failures by hand to make sure the failure message reads usefully to the developer.
Weeks 7–8. Walk every existing D-grade and F-grade install through the policy's 14-day fix-or-replace window. Most of them will land on a higher-graded alternative — the 49 C-and-up MCPs in the public corpus cover the same job-to-be-done as nearly all of the F's.
Weeks 9–10. Wire the weekly re-scan cron. Subscribe the security stand-up to the grade-change Slack channel.
Week 11. Embed the SkillAudit badge next to every approved MCP in the team wiki, so the policy is visible inline at the same place developers find install instructions. The badge updates automatically on re-scan.
Week 12. Retrospective: counts of installs gated, exceptions filed, fixes shipped. If you ran observe-only correctly in weeks 3–4 you have a baseline; a successful rollout shows the F-grade install rate trending toward zero across the same period.

Common gotchas

Vendor-official is not a security signal

The single biggest mistake teams make in week 1 is to allow-list vendor-official releases past the gate. Twenty-nine of the forty-two F's in our corpus are vendor-official; we name every one with the file path. The dev-rel team that wrote the demo MCP for the conference is not the security team that audits the SaaS API. Brand is not the signal you are looking for.

Repo-wide token scopes hide single-tool blast-radius

Several F-grade MCPs (Heroku, Auth0, MongoDB) attach a bearer token to every outbound fetch() call by construction in the API client layer. The blast-radius of a single SSRF in a single tool handler is therefore "the entire token scope" — not just the one record that tool was supposed to read. The policy paragraph's "named owner + written remediation" requirement should trigger a token-scope review for any exception involving an MCP that uses a single shared token; the easiest mitigation is to scope the token down before granting the exception.

Examples and scripts do count when developers copy-paste from them

The honest calibration note in the per-vendor F-grade post distinguishes runtime-tool-surface F's from F's partially driven by scripts/, benchmarks/, samples/, or examples/ findings. For an audit-the-engine-itself view this distinction matters; for a team adopting the MCP, less so. If a developer reads the README, follows the example, and the example contains the SSRF — that's a real install-time path, not a calibration artifact. The v0.3 engine update will weight these less aggressively, but the policy should treat "F driven by examples" the same as "F driven by runtime tool surface" until the maintainer fixes both.

Community MCPs ship without a CHANGELOG more often than not

The maintenance-axis check on the SkillAudit rubric explicitly looks for a CHANGELOG / RELEASE-NOTES / Releases-tab presence; over half of the F-grade community MCPs in our corpus have none. If you cannot tell what changed between the version your team installed and the version the upstream tagged this morning, you cannot run the re-scan-on-version-change cadence Step 4 calls for. Prefer MCPs with explicit version tags and visible release notes over MCPs that ship from main with no release process.
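A quick local approximation of that check, for a repo you have already cloned, is to look for a top-level release-notes file. This is a rough sketch, not the rubric's actual implementation: the filename patterns are guesses, and a Releases tab on GitHub is invisible to a filesystem check.

```sh
#!/bin/sh
# has_release_notes DIR: succeeds if DIR has a top-level file whose name
# starts with CHANGELOG, RELEASE-NOTES, RELEASE_NOTES, or RELEASES (any case).
has_release_notes() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f" | tr '[:lower:]' '[:upper:]')
    case "$name" in
      CHANGELOG*|RELEASE-NOTES*|RELEASE_NOTES*|RELEASES*) return 0 ;;
    esac
  done
  return 1
}

# Demo against a throwaway directory standing in for a cloned MCP repo.
demo=$(mktemp -d)
touch "$demo/README.md"
has_release_notes "$demo" || echo "no release notes found"
touch "$demo/CHANGELOG.md"
has_release_notes "$demo" && echo "release notes found"
rm -rf "$demo"
```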

FAQ

Why C and not B?

Because there are zero B grades in the current corpus — the rubric currently lands repos on A or C-or-worse, with no middle band. The v0.3 calibration update will likely produce a real B band by subdividing static findings into runtime-tool-surface vs scripts/examples buckets. Until then, B-and-up is functionally equivalent to A-and-up — too tight for a typical team's week-1 rollout.

What if my team needs an MCP that has not yet been audited?

The CI gate's failure message points to the audit form. Submit the GitHub URL; an audit takes about 60 seconds and the result is public on the board the moment it lands. If you need it audited faster than the public queue, file an exception with a 7-day re-scan deadline and unblock the install on the team policy's exception path.

Does this work for Claude Code skills, or only MCP servers?

Both. The audit board grades Claude skills (the .claude/skills/ ecosystem) on the same six-axis rubric as MCP servers. The CI gate template above checks .claude/plugins.lock which on Claude Code includes both skills and MCPs; the lookup against audit-index.json resolves either entry shape transparently. Cursor extensions and Windsurf plugins are graded on the four axes that translate (security, credentials, maintenance, docs); permissions and client-compatibility do not apply outside of MCP-shaped surfaces.

How do we know SkillAudit's grade is right?

Every grade on the public board links to the underlying findings (file path, line number, finding shape), so a security engineer reading a grade can verify it against the source themselves. If we get a finding wrong, mail hello@skillaudit.dev with the file path; we re-scan within 24 hours. The compare hub has honest side-by-side framing against Snyk, Dependabot, Socket, OSV-Scanner, npm audit, MCP Inspector, the Anthropic Skills Directory, StackHawk, and GitHub Code Scanning for teams already running one of those.

What if my MCP gets a D and we cannot replace it?

The exception path. File the named owner, the written remediation, and the re-scan deadline. Most D-grade community MCPs become C or A within a maintainer commit or two — the engine reports specific file paths and line numbers, so a remediation PR is usually 20-50 lines of code. If you are willing to mail the maintainer a one-page audit summary, our experience is most fix-and-re-scan loops close inside 14 days.

Does the gate work for private repo MCPs?

Public grades are open. For private repo MCPs, the audit runs against a single-repo OAuth scope (never org-wide) on the SkillAudit Pro tier, and the same audit-index.json lookup pattern works against an authenticated endpoint. The CI template above is written for the public path because most teams hit that case first; the private-repo extension is a five-line edit to add the auth header.

How do you handle MCPs that are still on grade A but the corpus expanded since?

Nothing happens — A grades are stable across corpus expansion. What changes with corpus expansion is the denominator in posts like this one (101 today, more tomorrow), not the per-repo grade. If you want to track corpus drift, the aggregate state-of-MCP post carries the running grade-distribution numbers; the policy paragraph references the threshold, not the corpus.