Buyer Guide · 2026-05-31

How to Read a SkillAudit Report

A SkillAudit report covers six graded axes, individual severity-tagged findings, and a composite letter grade. If you're looking at one for the first time — whether you're a team lead evaluating a community MCP server before adoption, or an author reading your own results — this guide walks through every section in the order it appears, explains what the numbers mean in practice, and gives you a decision framework for what to do next.

The composite grade: what the letter means

The top of every report shows a single letter grade: A through F. This is not a simple average. It's a rollup that weights the Security and Credentials axes most heavily, since exploitable vulnerabilities in those areas carry real incident risk. A server can score well on Documentation and Maintenance and still earn an F if it has an unmitigated SSRF or credential echo. That's intentional.

Grade scale

A
No HIGH findings. Zero or one WARN. Safe to install with standard review. This represents the top 19% of the 101-server corpus. Authors at this level typically use parameterized subprocess calls, allowlist-based URL validation, and never echo environment variables in error output.
B
No HIGH findings. Two or three WARNs. Installable with documented caveats. WARNs at this level are usually permission hygiene issues (asking for more OAuth scope than necessary) or soft maintenance signals (no CHANGELOG, no semver tags). Nothing actively exploitable.
C
No HIGH on Security or Credentials. Moderate WARNs or one isolated HIGH on a lower-weight axis. Requires review before team-wide deployment. C-grade servers are common among actively maintained repos that haven't done a security-specific pass — the code is functional and well-documented but hasn't been hardened.
D
One or two HIGH findings on Security or Credentials. Do not deploy to shared team environments until findings are remediated. D-grade servers often have a single clear fix — parameterize a shell call, remove a console.log(process.env) line — but that fix has not yet been made.
F
Multiple HIGH findings, or any critical-severity finding (exploitable data exfiltration path, unrestricted shell exec). Block from installation pending author remediation. 64% of the 101-server corpus earned an F or D. F is not rare — it's what happens when a capable developer writes an MCP server without a security-aware pass.

One important nuance: a perfect score of 100/100 is different from an A. A score of 100 means no findings at all on any axis — no WARNs, no INFO notes, no maintenance flags. Only two servers in the 101-server corpus achieved 100: the LangChain MCP adapters and the Vectara MCP server. An A-grade covers scores from roughly 85–99; a perfect 100 is its own category. We call these out explicitly in the public audits index.
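
The grade scale above can be read as a set of rules over finding counts. The sketch below is a reading aid only, not SkillAudit's scoring code: the real composite is a weighted rollup whose weights aren't given in this guide. The function name gradeFromFindings, the finding shape (axis, severity, critical), and the two tie-break choices marked in the comments are assumptions made for illustration.

```javascript
// Reading aid for the grade scale; NOT SkillAudit's actual scoring code.
// The finding shape { axis, severity, critical } is hypothetical.
const HIGH_WEIGHT_AXES = new Set(["Security", "Credentials"]);

function gradeFromFindings(findings) {
  const highs = findings.filter((f) => f.severity === "HIGH");
  const heavyHighs = highs.filter((f) => HIGH_WEIGHT_AXES.has(f.axis)).length;
  const warns = findings.filter((f) => f.severity === "WARN").length;
  const hasCritical = findings.some((f) => f.critical === true);

  // F: any critical finding, or "multiple" HIGHs.
  // Assumption: "multiple" is read here as three or more.
  if (hasCritical || highs.length >= 3) return "F";
  // D: one or two HIGHs on Security or Credentials.
  if (heavyHighs >= 1) return "D";
  // Assumption: two HIGHs on lower-weight axes are not "isolated", so D.
  if (highs.length === 2) return "D";
  // C: one isolated lower-weight HIGH, or more WARNs than B allows.
  if (highs.length === 1 || warns > 3) return "C";
  // B: no HIGHs, two or three WARNs.
  if (warns >= 2) return "B";
  // A: no HIGHs, zero or one WARN.
  return "A";
}
```

Because the published grade comes from weighted scores rather than raw counts, a real report may land a letter away from what these rules suggest. Treat the report's own grade as authoritative.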

The six axes explained

Below the composite grade, the report shows a score and letter for each of the six graded axes. Each axis is independent — a server can earn an A on Documentation while earning an F on Security. This breakdown is more actionable than the composite because it tells you exactly where the problems are.

Axis 1

Security

Static taint-flow analysis for SSRF, command-exec, and path-traversal patterns; LLM-assisted prompt-injection probe on tool descriptions and response handling. This is the highest-weight axis in the composite. A HIGH here means an actively exploitable code path exists.

Axis 2

Credentials

Checks for credential echo (environment variables surfaced in error messages or tool responses), hardcoded tokens in source files, and token presence in git history. Second-highest weight. A HIGH here means user secrets are at active risk of exfiltration.

Axis 3

Permissions

Does the mcp_config.json declare only the scopes it actually needs? Do OAuth flows request the minimum viable token scope? WARNs here are common and usually fixable by trimming the permissions declaration — they rarely indicate actual exploitation paths.

Axis 4

Maintenance

Last commit date, presence of open security-relevant issues, advisory feed subscription, CHANGELOG cadence, and semver discipline. A C or D on Maintenance doesn't mean the server is unsafe today — it means you can't rely on timely patches when a vulnerability is found.

Axis 5

Compatibility

Does the server work on Claude Code, Cursor, Windsurf, and Codex without undocumented configuration? Transport-layer checks (stdio vs HTTP-transport handling), protocol version declarations, and multi-client smoke tests. WARNs here are common for servers built for one client that haven't been tested on others.

Axis 6

Documentation

Is there a runnable README example? Does the installation section cover all supported clients? Are tool descriptions accurate enough to avoid prompt-injection via misleading capability claims? The lowest-weight axis in the composite — a pure D here won't fail an otherwise clean server.

For a deeper explanation of how each axis is scored, see the Methodology page, which also documents the specific AST patterns used in the Security axis and the prompt-injection probe format used in LLM-assisted testing.

Individual findings: severity levels

Each axis breakdown lists the individual findings that contributed to its score. Findings have four severity levels. Understanding the distinction matters most when you're deciding whether to remediate before or after deployment.

Severity levels

HIGH
Exploitable vulnerability with a documented attack path. The report includes the specific file, line number, and function name where the issue exists, plus a minimal reproduction case. HIGH findings on the Security or Credentials axes are the only findings that can cause a grade to drop to D or F on their own. Fix these before any team deployment. The report's remediation section gives a specific code change, not just a description of the problem.
WARN
A configuration, design, or practice that creates meaningful risk under realistic conditions. Not immediately exploitable from the outside but not safe to ignore. Examples: requesting write scope when only read is needed (Permissions), a dependency with a known non-critical CVE (Maintenance), error messages that include stack traces with internal paths (Credentials). WARNs accumulate in the scoring — three WARNs on a single axis can drop it from A to C.
PASS
The check was run and the pattern was not found. PASSes are shown explicitly rather than hidden so you can confirm which checks actually ran. If a PASS is listed for SSRF detection, you know the taint-flow analysis completed successfully — not that we skipped it. An audit with zero PASSes is an audit that couldn't load the source tree.
INFO
A factual observation with no scoring impact. INFOs are surfaced when something is worth noting that isn't a risk: a large tool count that might be worth consolidating, a transport type that limits some client integrations, or an unusual dependency that warrants a manual review even though no CVE is listed. INFOs do not affect the grade and are not remediation targets — they're context for your own judgment.

The finding detail view

Each WARN or HIGH finding in the report expands into a detail card. The card has four fields:

Finding card anatomy

The remediation block is designed to be copy-pasteable for the common cases. We don't list the remediation as "sanitize your inputs" — we show the specific function call, the specific argument reshaping, or the specific config change that eliminates the pattern. If a finding requires judgment (e.g., "consolidate these three permission scopes into one read-only grant depending on your platform"), that judgment is spelled out explicitly rather than left as an exercise for the author.

The badge and public permalink

Every completed audit generates two outputs beyond the report itself: a grade badge and a public permalink.

The public report at the permalink shows the composite grade, each axis grade, and the WARN/HIGH finding list. It does not show the full remediation detail — that requires a Pro account. The reasoning: the finding list is what directory reviewers and buyers need; the remediation detail is what authors need to fix their code, and that's a paid workflow.

The install-gate decision framework

For team leads evaluating a community MCP server, the report gives you enough signal to make a quick install-or-block call. Here's the framework we recommend:

Install

Composite grade A or B, no HIGH findings on Security or Credentials. Add to your approved server list. If the server has WARNs on Permissions or Maintenance, note them in your internal documentation but don't block on them.

Review first

Composite grade C, or B with a HIGH on Permissions or Maintenance. Read the specific findings. Most C-grade servers have one or two fixable WARNs on lower-weight axes — if the Security and Credentials axes are clean, the server is probably safe to install with documented caveats. File an issue with the author linking the public audit permalink and noting the WARNs. Install in a sandboxed dev environment while waiting for the fix.

Block until remediated

Composite grade D or F, or any HIGH finding on Security or Credentials regardless of composite grade. Do not deploy to shared team environments. Forward the public audit permalink to the author with the specific HIGH findings highlighted. Re-run the audit once the author's fix is merged, since re-runs read the default branch HEAD. The audit updates automatically if you've enabled webhook polling, or you can trigger a manual re-run from the audits dashboard.
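
The three outcomes above can be written as a small decision function. This is a minimal sketch that assumes a report object with a grade letter and a findings array of { axis, severity } entries. That shape and the function name installDecision are hypothetical, not a documented SkillAudit export format.

```javascript
// Sketch of the install / review / block framework.
// The report shape { grade, findings: [{ axis, severity }] } is hypothetical.
const BLOCKING_AXES = new Set(["Security", "Credentials"]);
const REVIEW_AXES = new Set(["Permissions", "Maintenance"]);

function installDecision(report) {
  const highs = report.findings.filter((f) => f.severity === "HIGH");
  const highOn = (axes) => highs.some((f) => axes.has(f.axis));

  // Block: grade D or F, or any HIGH on Security or Credentials.
  if (report.grade === "D" || report.grade === "F" || highOn(BLOCKING_AXES)) {
    return "block";
  }
  // Review first: grade C, or B with a HIGH on Permissions or Maintenance.
  if (report.grade === "C") return "review";
  if (report.grade === "B" && highOn(REVIEW_AXES)) return "review";
  // Install: grade A or B with none of the conditions above.
  if (report.grade === "A" || report.grade === "B") return "install";

  throw new Error(`Unknown grade: ${report.grade}`);
}
```

The block check runs first so that a HIGH on Security or Credentials always wins, whatever the composite letter says.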

For teams using the Team plan, you can encode this framework directly into CI by setting a minimum grade in your policy export. A pipeline that runs skillaudit-gate --min-grade C on any mcp_config.json change will block PRs that add a new server below your threshold — no manual review required. See the MCP install gate policy post for the GitHub Actions integration.
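
A minimum-grade threshold comes down to comparing two letters on an ordered scale. The sketch below illustrates that comparison only. It is not the skillaudit-gate implementation, and the function name meetsMinimumGrade is hypothetical.

```javascript
// Hypothetical threshold check; not the skillaudit-gate implementation.
const GRADE_ORDER = ["A", "B", "C", "D", "F"]; // best to worst

function meetsMinimumGrade(grade, minGrade) {
  const actual = GRADE_ORDER.indexOf(grade);
  const required = GRADE_ORDER.indexOf(minGrade);
  if (actual === -1 || required === -1) {
    throw new Error(`Unknown grade: ${actual === -1 ? grade : minGrade}`);
  }
  // A lower index is a better grade, so passing means actual <= required.
  return actual <= required;
}
```

With a minimum of C, a B or C server passes and a D or F server fails.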

If you're an author: what to fix first

Authors reading their own report for the first time often ask where to start. The priority order below reflects both the scoring weights and the actual remediation effort — the highest-impact fixes are usually also the fastest to implement.

1

Any HIGH on Security — especially SSRF and command injection

These are the findings that cause 43% of servers to earn an F. The fix is almost always mechanical: replace template-string subprocess calls with spawn(cmd, args, { shell: false }), passing arguments as an array, and add an origin allowlist before any URL-fetch operation. See the 12-item security checklist for the grep commands to find these patterns in your own repo.
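
Both fixes are short in practice. The sketch below shows one way they might look in a Node.js server. The allowlisted origin and the helper names isAllowedUrl and runTool are placeholders for illustration, not names taken from any audited server.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical allowlist; replace with the origins your tool really needs.
const ALLOWED_ORIGINS = new Set(["https://api.example.com"]);

// Returns true only when the URL parses and its origin is on the allowlist.
function isAllowedUrl(rawUrl, allowed = ALLOWED_ORIGINS) {
  try {
    return allowed.has(new URL(rawUrl).origin);
  } catch {
    return false; // unparseable input is rejected, never fetched
  }
}

// Arguments travel as an array, so a value like "; rm -rf /" stays one
// literal argument instead of being interpreted by a shell.
function runTool(cmd, args) {
  return spawn(cmd, args, { shell: false });
}
```

Call isAllowedUrl before every fetch of a caller-supplied URL. If your HTTP client follows redirects, the redirect target needs the same check, since an allowlisted origin can redirect elsewhere.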

2

Any HIGH on Credentials — especially env echo

The most common Credentials HIGH is also the easiest to fix: a console.error(err) or throw new Error(JSON.stringify(config)) that includes the entire config object — which includes your API key. Wrap error output in a sanitizer that strips keys matching common token patterns before logging. The anatomy of a credential leak post walks through the three most common patterns.
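
A sanitizer of this kind can be quite small. The sketch below redacts values whose key names look like secrets before anything is logged. The key pattern and the helper names redact and logConfigError are assumptions for illustration, and the pattern should be extended to match your own config's naming.

```javascript
// Key names that commonly hold secrets. This list is an assumption;
// extend it to cover the names your own config uses.
const SECRET_KEY_PATTERN = /(token|secret|password|api[_-]?key|authorization)/i;

// Returns a deep copy of `value` with secret-looking keys replaced.
function redact(value, seen = new WeakSet()) {
  if (value === null || typeof value !== "object") return value;
  if (seen.has(value)) return "[Circular]"; // guard against cyclic objects
  seen.add(value);
  if (Array.isArray(value)) return value.map((item) => redact(item, seen));

  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    out[key] = SECRET_KEY_PATTERN.test(key) ? "[REDACTED]" : redact(inner, seen);
  }
  return out;
}

// Log the redacted copy instead of the raw config object.
function logConfigError(message, config) {
  console.error(message, JSON.stringify(redact(config)));
}
```

Key-based redaction only catches secrets stored under recognizable names. A token embedded in a URL or a free-form string would need value-based matching as well.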

3

WARNs on Permissions (scope reduction)

After the Security and Credentials HIGHs are clear, scope reduction is the next-highest-value fix. It requires no code changes, only an mcp_config.json update. Trim the declared scopes to exactly what your tool handlers actually use. This is often a 5-minute change that moves a B to an A.
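
Finding the scopes to trim is a set difference between what the config declares and what the handlers use. The sketch below assumes you can list both as arrays of strings. The scope names and the helper unusedScopes are hypothetical, and the real mcp_config.json layout may differ.

```javascript
// Scope names are placeholders; the real mcp_config.json layout may differ.
// Returns the declared scopes that no tool handler actually uses.
function unusedScopes(declaredScopes, usedScopes) {
  const used = new Set(usedScopes);
  return declaredScopes.filter((scope) => !used.has(scope));
}

const declared = ["repo:read", "repo:write", "user:email"];
const usedByHandlers = ["repo:read"];
const candidatesToRemove = unusedScopes(declared, usedByHandlers);
```

Here candidatesToRemove holds the two scopes that could be dropped from the declaration.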

4

Maintenance axis improvements (CHANGELOG, semver tags)

Add a CHANGELOG.md if you don't have one, and tag your current release with a semver version. Both are 10-minute operations that improve the Maintenance score and signal to buyers that the repo is actively stewarded. They also unlock advisory feed coverage since most tools key off git tags for version mapping.

5

Documentation axis improvements (runnable example, multi-client installation)

Add a minimal runnable example to your README: a three-command install sequence that actually works on a clean machine. If your installation section covers only one client, add the others, for example a Claude Code section alongside an existing Cursor section. These are low-effort changes that move Documentation from D to B and make the server much easier to evaluate for non-technical buyers.

Re-running an audit after a fix

After you've pushed a fix, you can trigger a re-run from the Audits dashboard without re-entering the URL. The re-run pulls from the current default branch HEAD. If you're on the Free plan, re-runs count against your 3-audit monthly limit. Pro accounts have unlimited re-runs, which is the primary reason authors on active repos upgrade — you want to run the audit after every fix PR to confirm the finding closed, not discover it reopened a month later.

The badge URL updates immediately on re-run completion. If you've already embedded the badge in your README, it will show the new grade without any README edit required.

What the report doesn't cover

No audit tool covers everything. SkillAudit's current version has three known gaps, documented here so you're not surprised when you encounter them:

  1. Cross-tool privilege chaining. When a server exposes multiple tools and one tool's output is fed into another tool's input by the LLM, the combined capability may exceed what either tool could do alone. Static analysis sees individual tool handlers; it doesn't model multi-step LLM workflows. We're building a dynamic analysis layer for this.
  2. Long-lived session state. Some servers maintain state between tool calls. If that state can be poisoned by a malicious tool response earlier in the session, subsequent calls may behave insecurely. The current audit is stateless — it doesn't model session-carry attacks.
  3. Runtime supply chain. The audit checks your declared dependencies for known CVEs but doesn't execute the code, so a dependency that is clean at audit time but fetches malicious content at runtime (post-install hooks, lazy fetches) won't be caught. Use lockfiles and verify them.

These gaps are documented in the Methodology page alongside the techniques we do use. We'd rather you know the limits than discover them in production.


Ready to run your first audit? Paste a GitHub URL into the Audits page — the first three are free, no account required.