Published 4 June 2026 · Blog

SkillAudit grade drift: how MCP server security scores change over time

A SkillAudit score is not a permanent seal of approval. A server that earned a B on launch day can drift to a D in three months without a single line of code changing. New CVEs arrive for its dependencies. Its last commit date slides further into the past. Its advisory feed accumulates unanswered notices. The badge stays green until you rescan. This article looks at how grades change across the six scoring axes, which categories regress fastest, which ones improve after authors read their first report, and what a CI gate needs to catch drift before it reaches production.

Most MCP server authors scan once — usually when submitting to a directory or responding to a reviewer's request — and treat the result as done. That mental model made sense when security findings were purely about the static code in the repository. It no longer makes sense when half of an MCP server's security posture is dynamic: it depends on whether maintainers are still active, whether a transitive dependency published a CVE three weeks ago, and whether the advisory feed for the server's npm package has accumulated warnings the author has not addressed.

SkillAudit computes scores across six axes, and those axes have very different drift characteristics. Understanding which ones change passively, without any code push, and which ones change only when the code itself regresses is the difference between a monthly rescan cadence and a daily one.

The six axes and their drift velocity

Every SkillAudit report card breaks a score into six components, each weighted in the overall letter grade. Drift analysis requires looking at each axis independently, because a server can hold an A in Security while its Maintenance score collapses from A to C in 60 days.

Axis | 30-day rescan result
Maintenance | 87% regressed
Dependency security | 74% regressed
Credential exposure | 38% regressed
Permissions hygiene | 22% regressed
Security (static) | 19% regressed
Documentation | 31% changed

Percentage of servers that showed a negative axis-level change in the 30-day rescan window. Security (static), Permissions hygiene, and Documentation can also change positively after author remediation; the percentages here capture net negative movement.

The pattern is stark. Maintenance and Dependency security are the high-velocity drift axes. Both are driven by events outside the repository — the passage of time and the NVD advisory feed — rather than code changes the author controls. A server frozen at version 1.0.0 on day one will see its Maintenance score erode on a fixed schedule; a server whose dependencies include a package that receives a CVE will see its Dependency score drop the moment SkillAudit's advisory sync processes the new entry.

Maintenance: the silent timer

The Maintenance axis measures five signals: last commit date, open issue age, release cadence, whether the package is marked deprecated, and whether the maintainer has acknowledged open advisory notices. Three of those five are time-dependent by definition. They do not require the author to do anything wrong — they just require the author to do nothing, which is the normal state for any project that reaches a stable 1.0.

The scoring thresholds are calibrated to the MCP ecosystem's pace of change. MCP is a 2024 specification that was substantially revised in 2025. A server whose last commit is 90 days old may be correctly feature-complete, but it also may not have incorporated the spec changes that corrected the original authorization model. SkillAudit cannot distinguish the two from the outside, so it applies the conservative interpretation: 90+ days without a commit triggers the first Maintenance penalty.

Signal | Score at scan day 0 | Score at 30 days | Score at 90 days | Trigger
Last commit | A (0 days since commit) | A (30 days) | C (90 days) | Passive: time passes
Open issue age | A (no issues >30 days) | B (oldest issue 30d) | C (oldest issue 90d) | Passive: issues age
Advisory acknowledgement | A (no open notices) | A (no new notices) | D (2 unacknowledged) | CVE published upstream
Release cadence | A (released last month) | A (still recent) | B (two months since release) | Passive: time passes
Deprecation status | A (active) | A (active) | A (unless deprecated) | Author action required
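Two of those time-dependent signals can be sketched as pure functions of the scan date, which makes the "silent timer" concrete: the repository facts never change, yet the grades do. The cut-offs below are assumptions read off the table (last commit: C at 90 days; oldest open issue: B at 30 days, C at 90 days). SkillAudit's real thresholds, its intermediate grades, and its release-cadence rule are not specified in this post, so this is an illustration rather than the scorer.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical cut-offs inferred from the Maintenance table; not SkillAudit's published values.
COMMIT_PENALTY_DAYS = 90   # 90+ days without a commit -> first Maintenance penalty (C)
ISSUE_B_DAYS = 30          # oldest open issue at 30 days -> B
ISSUE_C_DAYS = 90          # oldest open issue at 90 days -> C


@dataclass(frozen=True)
class RepoSnapshot:
    """Repository facts that stay fixed while the author does nothing."""
    last_commit: date
    oldest_open_issue: date | None  # opening date of the oldest open issue; None if none are open


def passive_signals(repo: RepoSnapshot, scan_day: date) -> dict[str, str]:
    """Grade the last-commit and open-issue-age signals as of scan_day."""
    commit_age = (scan_day - repo.last_commit).days
    issue_age = 0 if repo.oldest_open_issue is None else (scan_day - repo.oldest_open_issue).days

    commit_grade = "C" if commit_age >= COMMIT_PENALTY_DAYS else "A"
    if issue_age >= ISSUE_C_DAYS:
        issue_grade = "C"
    elif issue_age >= ISSUE_B_DAYS:
        issue_grade = "B"
    else:
        issue_grade = "A"
    return {"last_commit": commit_grade, "open_issue_age": issue_grade}


# Same repository, three scan dates: day 0, day 30, day 90.
repo = RepoSnapshot(last_commit=date(2026, 1, 1), oldest_open_issue=date(2026, 1, 1))
for day in (date(2026, 1, 1), date(2026, 1, 31), date(2026, 4, 1)):
    print(day, passive_signals(repo, day))
```

The three printed rows reproduce the table's progression (A/A, then A/B, then C/C) without any change to the repository itself.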

A server that ships in a healthy state on day zero and then enters maintenance-mode silence will typically cross from A to C on the Maintenance axis within two to three months. If an upstream dependency publishes a CVE in that window, the advisory acknowledgement signal compounds the regression. A server that started at a B overall — perhaps due to a minor permissions hygiene finding — can easily arrive at a D overall by the three-month mark without a single bad commit.

Dependency security: the inherited risk problem

The Dependency security axis is the most unpredictable drift source because it is driven by the external advisory ecosystem, not by anything the server author controls or can anticipate. A popular npm package used by dozens of community MCP servers can receive a critical CVE, and every server that pins a vulnerable version will see its score drop simultaneously.

Scenario: the transitive dependency drop

A well-maintained MCP server uses axios as a direct dependency for outbound HTTP calls. In the second month after the initial scan, a moderate-severity advisory is published for axios covering a header-injection edge case. The server does not use the affected code path. But SkillAudit records the advisory against the pinned version and applies a Dependency axis penalty, because from the outside there is no way to verify that the code path is unreachable given the LLM-generated tool arguments this server accepts at runtime.

The server author has three options: bump to the patched version, add an npm audit override with a documented justification that SkillAudit can read, or accept the score reduction until the patch is available. The score has moved — not because the code changed, but because the advisory ecosystem changed around it.

Transitive dependencies compound the problem. A direct dependency on a well-maintained package does not guarantee that the package's own dependencies are equally well-maintained. SkillAudit walks the full dependency tree and aggregates advisory counts by severity. A server with three production dependencies and 47 transitive dependencies has 50 potential vectors for passive score regression.
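Walking the full tree is what turns three direct dependencies into 50 exposure points. A minimal sketch of that walk, assuming a hypothetical resolved graph (each `name@version` mapped to its own dependencies) and a hypothetical advisory index keyed the same way. It is a plain breadth-first traversal, not SkillAudit's implementation, and the package names are placeholders. A package reached through two paths is counted once.

```python
from collections import Counter, deque

# Hypothetical shapes: "name@version" -> direct dependencies of that package,
# and "name@version" -> severities of advisories recorded against that exact version.
Graph = dict[str, list[str]]
AdvisoryIndex = dict[str, list[str]]


def reachable_packages(graph: Graph, direct: list[str]) -> set[str]:
    """Breadth-first walk from the server's direct dependencies through every transitive one."""
    seen: set[str] = set()
    queue = deque(direct)
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # already visited via another path
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


def advisory_counts(graph: Graph, direct: list[str], index: AdvisoryIndex) -> Counter:
    """Aggregate advisory counts by severity across the whole dependency tree."""
    counts: Counter = Counter()
    for pkg in reachable_packages(graph, direct):
        counts.update(index.get(pkg, []))
    return counts


graph = {
    "http-lib@1.0.0": ["redirect-lib@1.2.0", "form-lib@4.0.0"],
    "form-lib@4.0.0": ["mime-lib@2.1.0"],
    "schema-lib@3.0.0": ["mime-lib@2.1.0"],
}
index = {"redirect-lib@1.2.0": ["moderate"], "mime-lib@2.1.0": ["low", "low"]}
direct = ["http-lib@1.0.0", "schema-lib@3.0.0"]
print(advisory_counts(graph, direct, index))
```

Note that both advisories in this toy example sit on transitive packages; neither direct dependency has one of its own, yet the server's tree still carries three.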

The most common Dependency drift pattern in the rescan data is:

  1. Initial scan: Score A or B — direct dependencies are current, no known advisories.
  2. 30 days: One or two low-severity advisories for transitive dependencies. Score B or B−.
  3. 90 days: A moderate-severity advisory for a direct dependency. Score drops to C. Author has not bumped the package.
  4. 120+ days: Advisory escalated to high severity or a second advisory published. Score D.

The 74% regression rate at 30 days is high but explainable: the MCP ecosystem is young, its dependency tree overlaps heavily with the broader Node.js ecosystem, and 2025–2026 saw a high rate of advisory publishing for packages in the OAuth, HTTP, and serialization spaces — exactly the packages MCP servers tend to use most.

Where scores improve: the remediation signal

Not all drift is negative. The rescan data shows a clear remediation signal on two axes, Permissions hygiene and Security (static), and a weaker one on Credential exposure. The improvement is concentrated in servers that scored C or D on those axes at initial scan.

Axis | Servers that improved (30-day rescan) | Most common remediation
Permissions hygiene | 51% of C/D scorers improved | Added explicit scope declarations; removed overbroad file-system access
Security (static) | 44% of C/D scorers improved | Replaced eval(); added input validation; removed shell=true exec calls
Credential exposure | 29% of C/D scorers improved | Removed log statements echoing tokens; moved secrets to env vars

The improvement rate is high enough to suggest that the initial SkillAudit report is functioning as a remediation prompt: authors who see their first report card with a C on Permissions hygiene tend to fix the obvious findings quickly. The Security (static) axis shows a similar pattern — the most common finding in that axis (shell invocation with user-controlled arguments) is also one of the most mechanical fixes, and many authors address it within the first rescan window.
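The shell fix is mechanical because it removes a parsing step rather than adding validation. Most MCP servers are Node.js, where the change is moving from `exec` or `shell: true` to `execFile`/`spawn` with an argument array; the sketch below shows the same pattern with Python's `subprocess`. `run_tool` is a hypothetical tool handler, and a Python one-liner stands in for whatever CLI a real server would call.

```python
import subprocess
import sys


def run_tool(user_arg: str) -> str:
    """Hypothetical tool handler that passes an LLM-supplied argument to a child process.

    The anti-pattern this replaces looks like:
        subprocess.run(f"some-cli {user_arg}", shell=True)
    where ";", "&&" or "$(...)" inside user_arg would be interpreted by a shell.
    """
    # argv list with the default shell=False: no shell parses the string, so user_arg
    # reaches the child process as exactly one literal argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


print(run_tool("notes.txt; echo pwned"))  # prints: notes.txt; echo pwned
```

The semicolon and everything after it arrive as data, which is why this fix rarely needs more than one rescan window.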

This creates an interesting composite pattern: a server scanned on launch day, rescanned 30 days later, often shows a higher Security and Permissions score than at initial scan, while simultaneously showing a lower Maintenance and Dependency score. The net effect depends on the weighting, but the practical implication is that a badge earned on launch day can be outdated in both directions — sometimes more reassuring than warranted, sometimes less.
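The point that the net effect depends on the weighting is simple arithmetic. The post does not publish SkillAudit's axis weights, so the per-axis scores and all three weight sets below are invented: the same launch-to-rescan movement (Security and Permissions up, Dependency and Maintenance down) nets out positive under one weighting, negative under another, and to zero under equal weights.

```python
# Hypothetical 0-100 axis scores at launch and at the 30-day rescan (illustrative only).
LAUNCH = {"security": 70, "permissions": 60, "credentials": 80,
          "dependency": 85, "maintenance": 90, "documentation": 75}
RESCAN = {**LAUNCH, "security": 85, "permissions": 80, "dependency": 70, "maintenance": 70}

# Invented weightings; SkillAudit's real weights are not given in this post.
STATIC_HEAVY = {"security": 3, "permissions": 2, "credentials": 1,
                "dependency": 1, "maintenance": 1, "documentation": 1}
UPKEEP_HEAVY = {"security": 1, "permissions": 1, "credentials": 1,
                "dependency": 2, "maintenance": 3, "documentation": 1}
EQUAL = dict.fromkeys(LAUNCH, 1)


def composite(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted mean of the axis scores."""
    return sum(scores[axis] * w for axis, w in weights.items()) / sum(weights.values())


def composite_delta(before: dict[str, int], after: dict[str, int], weights: dict[str, int]) -> float:
    """Change in the composite score between two scans under one weighting."""
    return composite(after, weights) - composite(before, weights)


for name, weights in (("static-heavy", STATIC_HEAVY), ("upkeep-heavy", UPKEEP_HEAVY), ("equal", EQUAL)):
    print(f"{name}: {composite_delta(LAUNCH, RESCAN, weights):+.2f}")
```

With these made-up numbers the deltas come out to roughly +5.56, -6.11, and +0.00: identical axis movements, three different verdicts.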

The badge staleness problem

SkillAudit badges are computed at scan time and cached. A green badge in a README reflects the server's state at the moment of the last scan, not its current state. For a server that was scanned once and never rescanned, the badge can represent a state that is months out of date.

How badge staleness affects trust. A team lead evaluating an MCP server for enterprise adoption sees a green badge and proceeds with installation. The badge reflects a B grade from 90 days ago. Since that scan, the server's primary HTTP library received a moderate CVE, and the last commit was 75 days ago. The current score, if rescanned, would be a D. The team lead read the badge as a signal of ongoing health when it was only a point-in-time snapshot. SkillAudit Pro addresses this by showing badge age in the embedded HTML and triggering an automatic rescan notification when the cached score is older than 30 days.

The badge staleness problem is structural: a static badge cannot reflect dynamic security state. The industry equivalent is Snyk's "known vulnerabilities" badge, which auto-updates as new advisories are published. SkillAudit takes the same approach for Pro users: the badge is backed by a live endpoint that returns the most recently computed score. Free-tier badges are point-in-time snapshots with a visible age timestamp.

Rescan cadence recommendations

Based on the drift data, the appropriate rescan cadence depends on the server's maintenance activity and dependency footprint:


Weekly — for actively deployed servers in production pipelines

Any server gated by a minimum-grade policy in CI should be rescanned at least weekly. Dependency advisories publish continuously; a weekly rescan window means a newly published CVE can trigger a failing CI gate within 7 days of the advisory going public, before the next scheduled maintenance review.


Monthly — for stable servers used internally but not in CI gates

The 30-day rescan window catches most Dependency drift events before they compound. It also catches the Maintenance axis crossing the 90-day commit-staleness threshold within a month of the crossing. Monthly rescans are the minimum cadence for any server that was granted installation approval based on a prior SkillAudit report.


Quarterly — for archived servers or low-stakes internal tools

A quarterly rescan will detect major version-level regressions and critical CVEs, but will miss moderate advisories that compound over the window. Acceptable for servers with limited network access or read-only filesystem scope; insufficient for any server with outbound HTTP, shell invocation, or write access to sensitive resources.
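The three cadences can be read as a small decision rule. This is one possible encoding with hypothetical field names, not a SkillAudit feature. It reflects two constraints from the text: monthly is the floor for anything approved on the strength of a SkillAudit report, and quarterly is only acceptable when the server has no outbound HTTP, shell invocation, or write access to sensitive resources. Where a server is both low-stakes and report-approved, the sketch takes the stricter monthly reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical description of how a server is used and what it can reach."""
    ci_gated: bool                 # subject to a minimum-grade policy in CI
    approved_from_report: bool     # installation approved on the basis of a SkillAudit report
    low_stakes: bool               # archived server or low-stakes internal tool
    outbound_http: bool
    shell_invocation: bool
    sensitive_write_access: bool


def rescan_interval_days(p: ServerProfile) -> int:
    """Return a rescan interval following the weekly / monthly / quarterly guidance."""
    if p.ci_gated:
        return 7  # weekly: a new advisory can fail the gate within a week
    risky = p.outbound_http or p.shell_invocation or p.sensitive_write_access
    if p.low_stakes and not risky and not p.approved_from_report:
        return 90  # quarterly: limited network access or read-only scope only
    return 30  # monthly: the floor for everything else


internal_tool = ServerProfile(ci_gated=False, approved_from_report=False, low_stakes=True,
                              outbound_http=True, shell_invocation=False,
                              sensitive_write_access=False)
print(rescan_interval_days(internal_tool))  # 30: outbound HTTP rules out quarterly
```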

Setting a minimum grade with drift in mind

Many team policies set a minimum SkillAudit grade as a condition for installing a community MCP server. A common initial choice is B. But the drift data suggests that a B gate set against a point-in-time scan will pass servers that drift to C or D before the next rescan. There are three practical approaches to this:

1. Pair the grade gate with a maximum badge age. Require not just a minimum grade, but a minimum grade within the last N days. SkillAudit's CI integration includes a --max-age-days flag precisely for this. Setting --min-grade B --max-age-days 30 means the server's most recent scan must be no more than 30 days old and must show at least a B; a B earned at some earlier point is not enough.

2. Gate on per-axis scores rather than the letter grade. The letter grade is a bucketed aggregate. A server with perfect Security and Permissions hygiene but a moderate Dependency score may receive the same letter grade as a server with critical static findings but a clean dependency tree. For team-level policy, gating on per-axis thresholds (Security ≥ 80, Dependency ≥ 70, Maintenance ≥ 60) provides more granular control than a letter grade alone.

3. Use the delta report rather than the absolute score. SkillAudit Pro's rescan output includes a delta view: the change in each axis score since the last scan, not just the current value. A server holding a B but with a 15-point drop in Maintenance over 30 days is a different risk signal than a server that has held a stable B for six months. The delta report makes that trajectory visible.
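The three approaches compose into one gate. The sketch below is hypothetical throughout: `Scan` is an invented record (scan date, letter grade, per-axis scores), not a SkillAudit output format; the axis floors are the example thresholds from approach 2; and the 15-point drop limit is borrowed from the example in approach 3. In a real pipeline, the `--min-grade` and `--max-age-days` flags mentioned above would handle the first approach.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

GRADE_ORDER = "FDCBA"  # assumes an A-F scale; +/- modifiers are ignored
DEFAULT_AXIS_MINIMUMS = {"security": 80, "dependency": 70, "maintenance": 60}


@dataclass(frozen=True)
class Scan:
    """One stored scan (hypothetical shape, not SkillAudit's output format)."""
    scanned_on: date
    grade: str              # letter grade, e.g. "B" or "B-"
    axes: dict[str, int]    # axis name -> 0-100 score


def meets_grade(grade: str, minimum: str) -> bool:
    """Compare letter grades by their first character only."""
    return GRADE_ORDER.index(grade[0]) >= GRADE_ORDER.index(minimum[0])


def check_policy(latest: Scan, previous: Scan | None, today: date, *,
                 min_grade: str = "B", max_age_days: int = 30,
                 axis_minimums: dict[str, int] | None = None,
                 max_axis_drop: int = 15) -> list[str]:
    """Return policy violations for a server; an empty list means it passes."""
    failures: list[str] = []
    # Approach 1: the grade only counts if the scan is recent enough.
    age = (today - latest.scanned_on).days
    if age > max_age_days:
        failures.append(f"scan is {age} days old (max {max_age_days})")
    if not meets_grade(latest.grade, min_grade):
        failures.append(f"grade {latest.grade} is below {min_grade}")
    # Approach 2: per-axis floors instead of the bucketed letter grade alone.
    floors = DEFAULT_AXIS_MINIMUMS if axis_minimums is None else axis_minimums
    for axis, floor in floors.items():
        score = latest.axes.get(axis, 0)
        if score < floor:
            failures.append(f"{axis} {score} is below {floor}")
    # Approach 3: flag large drops since the previous scan, even if the grade held.
    if previous is not None:
        for axis, score in latest.axes.items():
            drop = previous.axes.get(axis, score) - score
            if drop >= max_axis_drop:
                failures.append(f"{axis} dropped {drop} points since {previous.scanned_on}")
    return failures


previous = Scan(date(2026, 5, 1), "B", {"security": 85, "dependency": 75, "maintenance": 80})
latest = Scan(date(2026, 5, 31), "B", {"security": 85, "dependency": 72, "maintenance": 65})
print(check_policy(latest, previous, today=date(2026, 6, 4)))
# ['maintenance dropped 15 points since 2026-05-01']: still a B, but the trajectory fails
```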

The servers most likely to drift

Drift risk is not uniformly distributed. The rescan data identifies three profiles that are disproportionately likely to show a letter-grade drop at the 30-day mark:

Hackathon-shipped servers. These typically use a dense dependency tree (five or more direct dependencies, 50+ transitive), launch with a burst of commit activity, and then go quiet. The Maintenance axis starts degrading immediately; the Dependency axis follows as the hackathon-era dependency versions age out of advisory-clean status. At 60 days, the median hackathon-shipped server has dropped a full letter grade.

Wrappers around unstable upstream APIs. An MCP server that wraps a rapidly evolving upstream API (a new LLM provider, a recently launched SaaS integration) may receive frequent commits to keep up with breaking API changes, but those same commits often introduce new dependencies or new code paths. The Security and Credential exposure axes fluctuate more for these servers than for any other profile.

High-dependency integration servers. In the rescan data, servers with more than 20 direct npm dependencies almost always show a Dependency axis regression within 90 days. The more packages a server depends on directly, the more advisory surface it exposes. The dependency pinning post covers why pinning specific versions is necessary but not sufficient: pinning locks you to a version, but advisories published against that version still accumulate.

Using the history view

For Pro users, every audit is stored and the history view plots each axis score over time. The most useful pattern to watch in the history view is the diverging scissors: the Security and Permissions axes trending up or flat (the author is remediating findings) while the Maintenance and Dependency axes trend down (time is passing and dependencies are aging). A scissors pattern typically precedes a letter-grade drop within one to two rescan cycles.
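The scissors pattern can be flagged mechanically by comparing trends. A minimal sketch, assuming a hypothetical history export that lists each axis's scores in scan order, and assuming scans are evenly spaced. It fits a least-squares slope per axis and reports a scissors when Security and Permissions hold or rise while Maintenance and Dependency fall. The 0.5-point tolerance is an arbitrary illustrative value.

```python
REMEDIATED = ("security", "permissions")     # axes that move when the author fixes findings
TIME_DRIVEN = ("maintenance", "dependency")  # axes that move with time and advisories


def slope(values: list[float]) -> float:
    """Least-squares slope of scores against scan index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_scissors(history: dict[str, list[float]], tolerance: float = 0.5) -> bool:
    """True when remediated axes hold or rise while time-driven axes fall."""
    holding = all(slope(history[axis]) >= -tolerance for axis in REMEDIATED)
    falling = all(slope(history[axis]) < -tolerance for axis in TIME_DRIVEN)
    return holding and falling


history = {
    "security": [70, 78, 85],
    "permissions": [60, 70, 80],
    "maintenance": [90, 80, 68],
    "dependency": [85, 80, 70],
}
print(is_scissors(history))  # True
```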

The history view also makes it easy to distinguish genuine regression from a SkillAudit engine update. Occasionally, a new version of the scanner improves detection of a vulnerability class that was previously underweighted. When this happens, scores may drop across all servers simultaneously — not because the servers changed, but because the scanner got better. History lets you distinguish "my score dropped because a new CVE was published for my dependencies" from "my score dropped because the scanner now correctly penalizes the indirect prompt injection surface I always had."

The engine v0.3 calibration post covers how score changes from scanner updates are communicated and what to expect at each major version boundary.

What to do when grade drift is detected

The remediation path depends on which axis is driving the drift, but one operational habit helps regardless of axis.

The 48-hour rule. The servers with the best long-term grade stability share a single operational habit: when they receive a SkillAudit rescan notification showing a score drop, they address it within 48 hours. Not because a 48-hour delay is inherently risky — it is not — but because rapid response keeps the cognitive overhead low. An advisory that sits unacknowledged for three weeks becomes harder to reason about than one addressed the day it arrives. Grade hygiene is like code hygiene: the cost is proportional to how long you let the debt accumulate.

Summary: treat grades as a moving target

Security scores for MCP servers are not static achievements. They are point-in-time readings of a system that includes your code, your dependency tree, the broader advisory ecosystem, and the passage of time. The axes most likely to drift without any action on your part are Maintenance and Dependency security; both are driven by external clocks and external advisory feeds, not by your commits.

The practical implication for team policy is straightforward: a one-time audit is a starting signal, not a permanent approval. Any server installed based on a SkillAudit grade should be either (a) enrolled in automatic rescan notifications, or (b) subject to a periodic manual rescan cadence appropriate to its risk profile. The 30-day rescan delta post covers what the typical first-rescan report looks like for a server that shipped clean at launch. The GitHub Action gate post covers how to automate the CI enforcement so that grade drift fails the build rather than accumulating silently in a badge that no one re-reads.

See your server's grade history

SkillAudit Pro stores every scan and plots axis-level trends over time. Catch grade drift before it affects your badge or blocks your team's CI gate.
