Building a SkillAudit-Ready MCP Server From Scratch: Security-First Architecture Walkthrough
Most MCP servers start with functionality and retrofit security later — if at all. This tutorial reverses that order. We build a real MCP server from line one with every security layer in place: authentication on every tool call, Zod input validation with strict allow-lists, secrets loaded from environment (never hardcoded), structured audit logging, error messages that reveal nothing about internals, and a CI gate that blocks any commit that would drop the SkillAudit grade below B. When you finish, you'll have an A-grade server and understand exactly why each piece earns its grade.
What the SkillAudit grade measures
Before writing a line of code, it's worth understanding what the scanner actually checks. SkillAudit evaluates six axes with weights ranging from 5% to 35%. Severity can override the weighted composite: a single CRITICAL finding on any axis caps the overall grade at D regardless of how well the others score.
| Axis | Weight | What earns a CRITICAL |
|---|---|---|
| Security | 35% | SSRF, command injection, path traversal, unsanitized shell arguments |
| Input validation | 20% | Tool arguments used in shell/fs/network without validation |
| Credential exposure | 20% | Secrets hardcoded in source; tokens echoed in logs or error messages |
| Permissions hygiene | 10% | Requesting scopes never used; write access when read suffices |
| Maintenance | 10% | Last commit >12 months; open CVEs in direct dependencies |
| Documentation | 5% | No runnable example; no SECURITY.md; no version changelog |
An A grade requires no CRITICAL or HIGH findings and a composite score ≥ 90. A B requires no CRITICAL findings and a score ≥ 75. Our goal is A from the start — not a retrospective remediation.
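To make the thresholds concrete, here is a minimal sketch of how a weighted composite and the severity rules could combine. The weights and thresholds come from the text above; the function names, the 0-100 per-axis scale, and the "below B" bucket are assumptions for illustration, not SkillAudit's actual implementation.

```typescript
type Axis =
  | 'security' | 'inputValidation' | 'credentialExposure'
  | 'permissions' | 'maintenance' | 'documentation';
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
type Outcome = 'A' | 'B' | 'below B' | 'capped at D';

// Weights in percent, taken from the table above.
const WEIGHTS: Record<Axis, number> = {
  security: 35, inputValidation: 20, credentialExposure: 20,
  permissions: 10, maintenance: 10, documentation: 5,
};

// Assumption: each axis is scored 0-100 before weighting.
export function compositeScore(scores: Record<Axis, number>): number {
  let total = 0;
  for (const axis of Object.keys(WEIGHTS) as Axis[]) {
    total += WEIGHTS[axis] * scores[axis];
  }
  return total / 100;
}

export function overallGrade(scores: Record<Axis, number>, findings: Severity[]): Outcome {
  // A CRITICAL finding caps the grade no matter what the composite says.
  if (findings.includes('CRITICAL')) return 'capped at D';
  const score = compositeScore(scores);
  // A needs a composite of at least 90 and no HIGH findings.
  if (score >= 90 && !findings.includes('HIGH')) return 'A';
  // B needs a composite of at least 75 (CRITICAL was already ruled out).
  if (score >= 75) return 'B';
  return 'below B';
}
```

Under these assumptions, a server with perfect axis scores and one unresolved HIGH finding lands at B, which is why the rest of this walkthrough treats HIGH findings as blocking.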
You can read more about how SkillAudit weighs these factors in the scorecard methodology post and see real examples of each in the 10 most common reasons servers land at grade C.
What we're building
A GitHub Repository MCP server — a realistic, non-trivial server that lets AI agents query repository metadata, list recent issues, and fetch file contents from a GitHub repo. This is representative: it makes outbound HTTP calls (SSRF surface), handles user-controlled path-like arguments (path traversal surface), and uses an API token (credential exposure surface). Every dangerous pattern shows up here.
The server exposes three tools:
- list_issues: list open issues with optional label and milestone filters
- get_file: read a file from the default branch by path
- get_repo_info: return metadata (stars, description, license) about the repository
├── src/
│ ├── index.ts # entry point: server wiring
│ ├── tools/ # one file per tool
│ │ ├── list-issues.ts
│ │ ├── get-file.ts
│ │ └── get-repo-info.ts
│ ├── auth.ts # authentication middleware
│ ├── audit.ts # structured audit log emitter
│ ├── errors.ts # safe error abstraction
│ ├── config.ts # validated env config
│ └── rate-limit.ts # per-caller sliding-window rate limiter
├── SECURITY.md
├── CHANGELOG.md
├── package.json
└── .github/workflows/security.yml
Config validation: reject garbage at startup
The first security problem in most MCP servers is silent misconfiguration — a placeholder API token, a missing environment variable, a wrong endpoint URL. The server starts, runs, and either fails cryptically at runtime or — worse — makes API calls with an invalid token that happens to be accepted by a test environment.
The fix is to validate all configuration at startup using Zod and refuse to start if anything is wrong. This also prevents placeholder values like "your-token-here" or empty strings from ever reaching your tools.
// src/config.ts
import { z } from 'zod';
const ConfigSchema = z.object({
GITHUB_TOKEN: z
.string()
.min(20, 'GITHUB_TOKEN too short — is it set?')
.refine(v => !['your-token-here', 'changeme', 'xxx'].includes(v), {
message: 'GITHUB_TOKEN looks like a placeholder',
}),
GITHUB_REPO: z
.string()
.regex(/^[a-zA-Z0-9_.-]+\/[a-zA-Z0-9_.-]+$/, 'GITHUB_REPO must be owner/repo'),
MCP_API_KEY: z
.string()
.min(32, 'MCP_API_KEY must be at least 32 characters'),
PORT: z.coerce.number().int().min(1024).max(65535).default(3000),
});
function loadConfig() {
const result = ConfigSchema.safeParse(process.env);
if (!result.success) {
const issues = result.error.issues
.map(i => ` • ${i.path.join('.')}: ${i.message}`)
.join('\n');
// Log config key names (never values) and exit
console.error(`[startup] Config validation failed:\n${issues}`);
process.exit(1);
}
// Log that config loaded: key names only, never values.
// Use stderr: with the stdio transport, stdout carries MCP protocol messages.
console.error('[startup] Config loaded:', Object.keys(result.data).join(', '));
return result.data;
}
export const config = loadConfig();
Key decisions here: the startup log prints key names, never values. This is the pattern described in our secrets management deep dive — audit that configuration was present without creating a credential-exposure finding by logging the actual token value.
Authentication: verify on every tool call
Authentication should be a hard prerequisite at the tool-call level, not a session-level gate that can be bypassed. The pattern that earns grade FAIL: validate auth once during handshake, then trust all subsequent calls from that session. Sessions can be hijacked or replayed, and a handshake-only check can be bypassed entirely by requests crafted to skip it.
The correct pattern for MCP servers is to validate the API key on every tool invocation, before any tool logic runs.
// src/auth.ts
import { timingSafeEqual } from 'node:crypto';
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
import { config } from './config.js';
import { emitAuditEvent } from './audit.js';
// The SDK's ErrorCode enum only covers JSON-RPC style codes and has no Unauthorized
// member, so auth failures are reported as InvalidRequest with one shared message.
const AUTH_ERROR_MESSAGE = 'Authentication failed';
export function requireAuth(callerId: string, providedKey: string | undefined): void {
  if (!providedKey) {
    emitAuditEvent({ type: 'auth_missing', callerId });
    throw new McpError(ErrorCode.InvalidRequest, AUTH_ERROR_MESSAGE);
  }
  // Constant-time comparison to prevent timing attacks.
  // timingSafeEqual throws on buffers of different length, so compare lengths first.
  const expected = Buffer.from(config.MCP_API_KEY);
  const actual = Buffer.from(providedKey);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    emitAuditEvent({ type: 'auth_failed', callerId });
    throw new McpError(ErrorCode.InvalidRequest, AUTH_ERROR_MESSAGE);
  }
  emitAuditEvent({ type: 'auth_ok', callerId });
}
Two details matter here. First: timingSafeEqual compares the two buffers in constant time, so an attacker cannot recover the key one byte at a time by measuring how long a comparison takes. It throws when the buffers differ in length, which is why the explicit length check comes first; that check can still reveal the key's length, but not its contents. Second: the error message is identical for "missing key" and "wrong key", so the response gives an attacker no way to tell the two cases apart. Both paths emit an audit event so anomaly detection can fire on repeated failures.
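An early exit on a length mismatch can still tell an attacker how long the key is. If that matters for your threat model, one option is to compare fixed-length digests instead of the raw strings. The sketch below is a hedged illustration of that idea using Node's built-in crypto module; safeKeyEquals is a hypothetical helper name, not part of the server above.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// SHA-256 digests are always 32 bytes, so timingSafeEqual never sees
// buffers of different lengths and no separate length check is needed.
function digest(value: string): Buffer {
  return createHash('sha256').update(value, 'utf8').digest();
}

export function safeKeyEquals(expected: string, provided: string | undefined): boolean {
  // Hash an empty string when no key was provided so every path does the same work.
  const matches = timingSafeEqual(digest(expected), digest(provided ?? ''));
  // A missing key never authenticates, even if the expected key were empty.
  return matches && provided !== undefined;
}
```

Inside requireAuth, the length check and the timingSafeEqual call could then collapse into a single safeKeyEquals(config.MCP_API_KEY, providedKey) test.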
Structured audit logging: every tool call leaves a record
An audit log is not console.log('tool called'). An effective audit log for a SkillAudit-grade-A server captures: who called (caller ID, session), what was called (tool name), what arguments were used (scrubbed of sensitive data), whether it succeeded or failed, and how long it took. This makes anomaly detection possible and provides the kind of audit-trail evidence that frameworks such as SOC 2 Type II expect.
// src/audit.ts
type AuditEvent =
| { type: 'auth_missing' | 'auth_failed' | 'auth_ok'; callerId: string }
| { type: 'tool_called'; tool: string; callerId: string; argsHash: string }
| { type: 'tool_ok'; tool: string; callerId: string; durationMs: number }
| { type: 'tool_error'; tool: string; callerId: string; code: string; durationMs: number }
| { type: 'rate_limited'; callerId: string; tool: string };
export function emitAuditEvent(event: AuditEvent): void {
// Structured JSON to stderr; ship to your log aggregator from here.
// With the stdio transport, stdout is reserved for MCP protocol messages.
console.error(JSON.stringify({
ts: new Date().toISOString(),
...event,
}));
}
// Wrap a tool handler to auto-emit timing + outcome events
export function withAudit<T>(
tool: string,
callerId: string,
argsHash: string,
fn: () => Promise<T>,
): Promise<T> {
emitAuditEvent({ type: 'tool_called', tool, callerId, argsHash });
const start = Date.now();
return fn().then(
result => {
emitAuditEvent({ type: 'tool_ok', tool, callerId, durationMs: Date.now() - start });
return result;
},
err => {
emitAuditEvent({
type: 'tool_error',
tool,
callerId,
code: String(err?.code ?? 'UNKNOWN'), // McpError codes are numeric; the event field is a string
durationMs: Date.now() - start,
});
throw err;
},
);
}
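The argsHash argument is derived from a SHA-256 hash of the serialized tool arguments (the get_file handler keeps the first 16 hex characters). This lets audit log consumers correlate calls without exposing the actual argument values: a search query that happens to contain a password won't land in your SIEM logs verbatim. Treat it as a correlation aid rather than encryption, since short or guessable arguments can be recovered by hashing candidates.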
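For the hash to be useful for correlation, two calls with the same arguments should produce the same value regardless of key order, and credentials should never feed into it. A minimal sketch of such a helper follows; hashArgs and EXCLUDED_KEYS are hypothetical names, and the canonicalization is shallow (nested objects are serialized as-is).

```typescript
import { createHash } from 'node:crypto';

// Keys that must never influence (or leak through) the audit hash.
const EXCLUDED_KEYS = new Set(['api_key']);

export function hashArgs(args: Record<string, unknown>): string {
  // Sort the remaining keys so { a, b } and { b, a } serialize identically.
  const canonical = Object.keys(args)
    .filter(key => !EXCLUDED_KEYS.has(key))
    .sort()
    .map(key => [key, args[key]] as const);
  return createHash('sha256').update(JSON.stringify(canonical)).digest('hex');
}
```

A handler could call hashArgs(parseResult.data) once and pass the result (or a truncated prefix of it) to withAudit.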
Safe error messages: never expose internals
Tool errors are a high-value information leak channel. A stack trace in a tool error tells an attacker your file paths, library versions, and internal variable names. An error message that echoes back the problematic argument confirms the attack vector. SkillAudit's scanner looks for err.message and err.stack propagated directly into McpError bodies.
// src/errors.ts
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
// Map internal errors to safe, opaque MCP errors
export function toMcpError(err: unknown, tool: string): McpError {
if (err instanceof McpError) return err; // already safe — pass through
// Log the real error internally (to audit, not to caller)
console.error(`[${tool}] Internal error:`, err);
// Return a safe, opaque error to the caller
return new McpError(
ErrorCode.InternalError,
`${tool} encountered an error. If this persists, contact support.`,
);
}
// Validate-and-throw helper: safe error for invalid tool arguments
export function badParam(param: string, reason: string): never {
throw new McpError(
ErrorCode.InvalidParams,
`Invalid parameter '${param}': ${reason}`,
);
}
The pattern is: log the real error internally (where only your ops team can see it), return a sanitized error to the MCP client. The caller learns that something went wrong and which tool failed — nothing more. This is the same approach covered in our post on the distinction between rejecting a tool call versus returning an error.
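One practical wrinkle with opaque errors is matching a caller's report to the internal log entry. The sketch below shows one way to do that with a random reference ID that appears in both places. It is a standalone illustration: SafeToolError is a hypothetical stand-in for McpError so the snippet runs without the SDK, and the logger callback is an assumption.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical stand-in for McpError.
export class SafeToolError extends Error {
  incidentId: string;
  constructor(message: string, incidentId: string) {
    super(message);
    this.name = 'SafeToolError';
    this.incidentId = incidentId;
  }
}

type InternalLogEntry = { incidentId: string; tool: string; detail: string };

export function toSafeError(
  err: unknown,
  tool: string,
  log: (entry: InternalLogEntry) => void,
): SafeToolError {
  const incidentId = randomUUID();
  // Full detail (stack trace included) goes to the internal log only.
  const detail = err instanceof Error ? (err.stack ?? err.message) : String(err);
  log({ incidentId, tool, detail });
  // The caller sees the tool name and a reference ID, nothing else.
  return new SafeToolError(`${tool} encountered an error (ref ${incidentId}).`, incidentId);
}
```

The reference ID carries no information about the failure itself, so it can be quoted in a support request without widening the leak channel.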
Input validation: Zod schemas with strict allow-lists
Every tool argument that touches a filesystem path, a network call, a shell command, or a database query is an injection surface. The only safe approach is to define exactly what you accept with a schema, and reject everything else. Allow-lists beat deny-lists — you can't enumerate every dangerous input, but you can enumerate every valid one.
Here's the get_file tool — the most dangerous one, since it involves a file path argument:
// src/tools/get-file.ts
import { z } from 'zod';
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
import { config } from '../config.js';
import { requireAuth } from '../auth.js';
import { withAudit, emitAuditEvent } from '../audit.js';
import { toMcpError } from '../errors.js';
import * as crypto from 'crypto';
// Strict allow-list: file paths only, no traversal sequences
const GetFileSchema = z.object({
path: z
.string()
.max(500)
.regex(
/^[a-zA-Z0-9_\-./]+$/,
'path may only contain letters, digits, underscores, hyphens, dots, and forward slashes',
)
.refine(p => !p.includes('..'), { message: 'path traversal sequences not allowed' })
.refine(p => !p.startsWith('/'), { message: 'absolute paths not allowed' }),
caller_id: z.string().min(1).max(100),
api_key: z.string().min(1),
});
export async function handleGetFile(rawArgs: unknown) {
// 1. Validate arguments — throws McpError(InvalidParams) on failure
const parseResult = GetFileSchema.safeParse(rawArgs);
if (!parseResult.success) {
const msg = parseResult.error.issues.map(i => i.message).join('; ');
throw new McpError(ErrorCode.InvalidParams, `get_file: ${msg}`);
}
const { path: filePath, caller_id, api_key } = parseResult.data;
// 2. Authenticate
requireAuth(caller_id, api_key);
// 3. Execute with audit wrapping
const argsHash = crypto
.createHash('sha256')
.update(JSON.stringify({ path: filePath }))
.digest('hex')
.slice(0, 16);
return withAudit('get_file', caller_id, argsHash, async () => {
const url =
`https://api.github.com/repos/${config.GITHUB_REPO}/contents/${filePath}`;
const resp = await fetch(url, {
headers: {
Authorization: `Bearer ${config.GITHUB_TOKEN}`,
Accept: 'application/vnd.github.v3+json',
'User-Agent': 'github-mcp-server/1.0',
'X-Request-ID': argsHash,
},
});
if (!resp.ok) {
if (resp.status === 404) {
// Do not echo the requested path back to the caller
throw new McpError(ErrorCode.InvalidParams, 'get_file: file not found');
}
throw new McpError(ErrorCode.InternalError, 'get_file: upstream error');
}
const data = await resp.json() as { content?: string; encoding?: string; size?: number };
if (data.size && data.size > 1_000_000) {
throw new McpError(ErrorCode.InvalidParams, 'File exceeds 1 MB limit');
}
const text = data.encoding === 'base64' && data.content
? Buffer.from(data.content, 'base64').toString('utf8')
: '';
// MCP tool results carry a list of content items, not a bare string
return { content: [{ type: 'text' as const, text }] };
}).catch(err => { throw toMcpError(err, 'get_file'); });
}
The request URL is built from config.GITHUB_REPO, a startup-validated value that must match owner/repo format, plus the schema-validated path. Never interpolate raw tool arguments into a URL without the same strict validation. The path traversal refine checks (no .. sequences, no leading /) prevent a caller from escaping the target repository path structure.
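The same rules can be expressed as a plain function, which is handy for unit-testing the allow-list without spinning up the server. The sketch below is a dependency-free variation, not the Zod schema itself: it checks path segments rather than substrings (so a file name like v1..v2.md is accepted while a .. segment is not), and it percent-encodes each segment when building the URL. The names isSafeRepoPath and buildContentsUrl are hypothetical.

```typescript
const ALLOWED_PATH = /^[a-zA-Z0-9_\-./]+$/;

export function isSafeRepoPath(path: string): boolean {
  if (path.length === 0 || path.length > 500) return false;
  if (!ALLOWED_PATH.test(path)) return false;
  // Empty segments catch leading, trailing, and doubled slashes;
  // '.' and '..' segments catch traversal attempts.
  return path
    .split('/')
    .every(segment => segment !== '' && segment !== '.' && segment !== '..');
}

export function buildContentsUrl(repo: string, path: string): string {
  if (!isSafeRepoPath(path)) {
    throw new Error('unsafe path');
  }
  // Encode each segment separately so the slashes between them survive.
  const encodedPath = path.split('/').map(segment => encodeURIComponent(segment)).join('/');
  return `https://api.github.com/repos/${repo}/contents/${encodedPath}`;
}
```

Here repo is assumed to be the already-validated config.GITHUB_REPO value; a raw tool argument should never be passed in that position.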
Rate limiting: protect against agentic abuse
Agentic runtimes can get stuck in loops. A bug in an orchestrator can cause a single agent to call your tool thousands of times per minute. Without rate limiting, this creates availability risk (your upstream API gets blocked), cost risk (GitHub API rate limits count per token), and security risk (a large number of calls is the signature of active exploitation — and you need the circuit to trip before an attacker can exfiltrate significant data).
// src/rate-limit.ts
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
import { emitAuditEvent } from './audit.js';
// Simple in-process sliding window. Use Redis for multi-instance deployments.
const windows = new Map<string, number[]>();
export function checkRateLimit(callerId: string, tool: string): void {
const key = `${callerId}:${tool}`;
const now = Date.now();
const windowMs = 60_000; // 1 minute
const limit = 30; // 30 calls per minute per caller per tool
const timestamps = (windows.get(key) ?? []).filter(t => now - t < windowMs);
timestamps.push(now);
windows.set(key, timestamps);
if (timestamps.length > limit) {
emitAuditEvent({ type: 'rate_limited', callerId, tool });
throw new McpError(
ErrorCode.InvalidRequest, // the SDK's ErrorCode enum has no ResourceExhausted member
`Rate limit exceeded for ${tool}. Limit: ${limit} calls/min.`,
);
}
}
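The rate limiter runs before the auth check and before the tool logic performs any I/O. Because it runs first, the caller_id it keys on is still unauthenticated, so treat the limit as a guard against runaway loops rather than a defense against a caller who rotates IDs. It also emits an audit event, which is how anomaly detection picks up the exploitation signature described in our zero-day incident timeline.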
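A sliding-window limiter is easier to reason about, and to test, when the clock is passed in rather than read from Date.now() inside the function. The sketch below is one possible variation on the limiter above: createRateLimiter is a hypothetical name, it returns a boolean instead of throwing, and it does not record rejected calls, so a caller stuck in a loop recovers as soon as the earlier calls age out.

```typescript
export function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, number[]>();

  // Returns true if the call is allowed, false if the caller is over the limit.
  return function allow(key: string, now: number = Date.now()): boolean {
    // Keep only the timestamps that still fall inside the window.
    const recent = (windows.get(key) ?? []).filter(t => now - t < windowMs);
    if (recent.length >= limit) {
      windows.set(key, recent);
      return false;
    }
    recent.push(now);
    windows.set(key, recent);
    return true;
  };
}

// Usage: 30 calls per minute per caller per tool, as in the limiter above.
const allowCall = createRateLimiter(30, 60_000);
const firstCallAllowed = allowCall('agent-1:get_file');
```

As with the original, this keeps state in process memory, so a multi-instance deployment would still need a shared store.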
Server wiring: compose it all together
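The entry point wires config, tools, and MCP SDK together. The call handler below applies the rate limit and then dispatches by tool name to the matching handler; declaring each tool's name and input schema for client introspection happens in a separate tools/list handler registered on the same server.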
// src/index.ts
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema, ErrorCode, McpError } from '@modelcontextprotocol/sdk/types.js';
import { handleGetFile } from './tools/get-file.js';
import { handleListIssues } from './tools/list-issues.js';
import { handleGetRepoInfo } from './tools/get-repo-info.js';
import { checkRateLimit } from './rate-limit.js';
// config is validated at import time; the process exits with code 1 if it is invalid
import './config.js';
const server = new Server(
  { name: 'github-mcp-server', version: '1.0.0' },
  { capabilities: { tools: {} } },
);
server.setRequestHandler(CallToolRequestSchema, async request => {
  const { name, arguments: args } = request.params;
  // Rate limit check before any tool logic. caller_id is untrusted at this point,
  // so accept it only if it is a non-empty string.
  const rawCallerId = args?.caller_id;
  const callerId = typeof rawCallerId === 'string' && rawCallerId ? rawCallerId : 'anonymous';
  checkRateLimit(callerId, name);
  switch (name) {
    case 'get_file': return handleGetFile(args);
    case 'list_issues': return handleListIssues(args);
    case 'get_repo_info': return handleGetRepoInfo(args);
    default:
      // Typed protocol error instead of a bare Error
      throw new McpError(ErrorCode.InvalidParams, `Unknown tool: ${name}`);
  }
});
const transport = new StdioServerTransport();
await server.connect(transport);
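Clients discover tools and their argument shapes through a tools/list request, so the server also needs to describe each tool with a JSON Schema. The sketch below shows what those descriptors could look like as plain data. The get_file fields mirror the Zod schema shown earlier; the label and milestone argument names for list_issues are assumptions, since that handler is not shown in this post.

```typescript
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: 'object';
    properties: Record<string, Record<string, unknown>>;
    required: string[];
    additionalProperties: boolean;
  };
}

// Every tool takes the same per-call auth arguments.
const authProperties = {
  caller_id: { type: 'string', minLength: 1, maxLength: 100 },
  api_key: { type: 'string', minLength: 1 },
};
const authRequired = ['caller_id', 'api_key'];

export const TOOL_DESCRIPTORS: ToolDescriptor[] = [
  {
    name: 'get_file',
    description: 'Read a file from the default branch by path',
    inputSchema: {
      type: 'object',
      properties: {
        path: { type: 'string', maxLength: 500, pattern: '^[a-zA-Z0-9_./-]+$' },
        ...authProperties,
      },
      required: ['path', ...authRequired],
      additionalProperties: false,
    },
  },
  {
    name: 'list_issues',
    description: 'List open issues with optional label and milestone filters',
    inputSchema: {
      type: 'object',
      // Hypothetical argument names; adjust to match the real handler.
      properties: {
        label: { type: 'string', maxLength: 100 },
        milestone: { type: 'string', maxLength: 100 },
        ...authProperties,
      },
      required: [...authRequired],
      additionalProperties: false,
    },
  },
  {
    name: 'get_repo_info',
    description: 'Return metadata (stars, description, license) about the repository',
    inputSchema: {
      type: 'object',
      properties: { ...authProperties },
      required: [...authRequired],
      additionalProperties: false,
    },
  },
];

export function listTools(): { tools: ToolDescriptor[] } {
  return { tools: TOOL_DESCRIPTORS };
}

// Registration, assuming the SDK's ListToolsRequestSchema export:
//   server.setRequestHandler(ListToolsRequestSchema, async () => listTools());
```

These descriptors are advisory for the client. The Zod schemas inside each handler remain the enforcement point, so the two should be kept in sync.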
SECURITY.md and CHANGELOG: documentation axis
The documentation axis has the lowest weight (5%) but is the easiest 5% to earn — and some findings here can prevent CRITICAL findings in the security axis. A SECURITY.md file signals to the SkillAudit scanner (and to security researchers) that there is a responsible disclosure channel. Without it, the scanner flags a LOW finding: "No responsible disclosure mechanism."
# SECURITY.md
## Reporting a vulnerability
If you discover a security vulnerability in this MCP server, please report it
via GitHub's private vulnerability disclosure feature:
https://github.com/owner/github-mcp-server/security/advisories/new
Do NOT open a public issue for security vulnerabilities.
We aim to acknowledge reports within 48 hours and release a patch within
14 days for confirmed vulnerabilities.
## Security design
- All tool calls require API key authentication (per-call, not per-session)
- Tool arguments are validated with strict allow-list schemas before use
- No secrets are stored in source code; all credentials are environment variables
- Error messages never include stack traces or internal details
- All tool calls are logged to a structured audit log
A CHANGELOG.md with dated version entries satisfies the "versioning" documentation check. It doesn't need to be elaborate — one line per release noting the version, date, and what changed is enough to earn the full score on that sub-check.
CI security gate: enforce the grade on every commit
Everything built so far can regress. A new contributor adds console.log(config.GITHUB_TOKEN) for debugging and forgets to remove it. Someone adds a new tool that passes arguments directly to a shell command. The CI gate is what prevents grade regressions from shipping.
# .github/workflows/security.yml
name: Security Gate
on:
  push:
    branches: [main]
  pull_request:
jobs:
  skillaudit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run SkillAudit
        uses: skillaudit/github-action@v1
        with:
          token: ${{ secrets.SKILLAUDIT_TOKEN }}
          min-grade: B
          fail-on: HIGH,CRITICAL
      - name: Upload audit report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: skillaudit-report
          path: skillaudit-report.json
The min-grade: B setting blocks merges if the overall grade falls below B. The fail-on: HIGH,CRITICAL setting fails the CI job on any finding of severity HIGH or CRITICAL, regardless of composite grade. Together these create a two-layer gate: you can't ship a regression that drops the grade, and you can't ship a serious individual finding even if the composite holds up.
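The two-layer logic is simple enough to sketch as a pure function, which can be useful if you want the same check in a local pre-push script. Everything about the report shape here is an assumption made for illustration (the post does not document the format of skillaudit-report.json), as are the grade scale and the name gateFailures.

```typescript
type Grade = 'A' | 'B' | 'C' | 'D' | 'F';
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

// Hypothetical report shape; the real file may differ.
interface AuditReport {
  grade: Grade;
  findings: { severity: Severity; title: string }[];
}

// Assumed ordering from best to worst.
const GRADE_ORDER: Grade[] = ['A', 'B', 'C', 'D', 'F'];

export function gateFailures(report: AuditReport, minGrade: Grade, failOn: Severity[]): string[] {
  const failures: string[] = [];
  // Layer 1: the overall grade must not fall below the minimum.
  if (GRADE_ORDER.indexOf(report.grade) > GRADE_ORDER.indexOf(minGrade)) {
    failures.push(`grade ${report.grade} is below ${minGrade}`);
  }
  // Layer 2: any finding at a blocking severity fails the gate on its own.
  for (const finding of report.findings) {
    if (failOn.includes(finding.severity)) {
      failures.push(`${finding.severity} finding: ${finding.title}`);
    }
  }
  return failures;
}
```

An empty result means the gate passes; a wrapper script could print the returned reasons and exit non-zero otherwise.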
For full details on setting up this gate, see our dedicated post on the GitHub Action security gate.
The first audit: what you'll see
Running SkillAudit on the server built in this walkthrough produces a clean report. Here's what each axis scores and why:
| Axis | Score | Finding count | Why |
|---|---|---|---|
| Security | A | 0 findings | No SSRF (URL from validated config), no path traversal (allow-list regex + .. check), no shell injection (no shell calls) |
| Input validation | A | 0 findings | All tool arguments pass through Zod schemas before use; rejections throw typed McpError |
| Credential exposure | A | 0 findings | Config loaded from env; startup log prints key names only; error handler strips stack traces; no credential echoed in any code path |
| Permissions hygiene | A | 0 findings | GitHub token scope documented in README as read-only; contents:read is the only required scope |
| Maintenance | A | 0 findings | No open CVEs in fresh install; last commit is recent; CHANGELOG present |
| Documentation | A | 0 findings | SECURITY.md with disclosure contact; CHANGELOG with version history; README with runnable example |
Common shortcuts that cost you the A grade
Building this from scratch is easier than retrofitting these patterns onto an existing server. Here are the shortcuts developers take that most commonly drop grades:
"I'll validate the input in the tool, not at the boundary"
Tool logic tends to grow. A validation check buried inside a conditional or written as an early return gets removed or bypassed when the logic is refactored. Zod schemas at the argument-parse boundary are immovable — you can't run tool logic without passing through them.
"The session is authenticated; I don't need to check on every call"
Session-level auth has more ways to fail than per-call auth: a hijacked or replayed session inherits everything the handshake granted. The overhead of a constant-time string comparison is microseconds, so it is not a performance concern. The security difference is substantial: with per-call auth, every tool invocation must present the API key, so taking over a session is not enough on its own to call a tool.
"I'll use err.message in the error response so callers know what happened"
Internal error messages are written for developers, not for callers. They contain file paths, variable names, and hints about the internal data model. Stripping them before the response is not hiding information from the caller — it's not handing an attacker a map of your internals.
"The CI gate would slow down iteration"
Grade regressions accumulate invisibly without a CI gate. A server that starts at A and has a CI gate stays at A. A server without a gate drifts toward C over twelve months as each contributor adds a small convenience that trades off a security control. The gate is cheap insurance.
Maintaining the grade over time
An A grade at first publish is not permanent. Dependency vulnerabilities appear. Contributors add new tools. The GitHub API changes. A few patterns keep the grade stable over the life of the server:
- Pin major versions in package.json and use Dependabot or Renovate with auto-merge for patch updates that pass CI. CVEs in unpinned dependencies are the most common cause of the maintenance score dropping over time.
- Add a SkillAudit badge to your README. The badge updates automatically on every push. Maintaining a green badge in public creates a reputational commitment that makes it harder to let findings accumulate.
- Treat HIGH findings as blocking. A single HIGH finding that sits unresolved for 30 days becomes a pattern that reviewers flag when deciding whether to adopt your server. The CI gate prevents this by making the finding visible immediately.
- Add a security key to your package.json pointing to your SECURITY.md. This is indexed by npm's security advisories system and picked up by SkillAudit's documentation check.
Run the audit on your own server
If you have an existing MCP server and want to see where it stands against the checklist above, paste your GitHub URL into SkillAudit. The scanner returns a graded report across all six axes, with specific findings, file locations, and remediation hints. The free tier covers public repos with no account required.
The patterns in this post — config validation at startup, per-call auth, structured audit logging, Zod input schemas, safe error abstraction, CI gate — are the same patterns SkillAudit's scanner is checking for. Build them in from the start and the audit confirms what you already know.