
MCP server security for DevSecOps: integrating SkillAudit into CI/CD pipelines

Shift security left: wire grade-gates into GitHub Actions, GitLab CI, and pre-commit hooks so a server that fails the SSRF or prompt-injection check never reaches the registry.

2026-06-10 · DevSecOps Guide · 12 min read

Why CI/CD is the right choke point

MCP security vulnerabilities are discovered late. The typical sequence: a developer ships a server, a security team runs a post-hoc scan, findings pile into a backlog, and by the time the fix lands the server has been installed by dozens of Claude Code users or approved in an enterprise directory.

The DevSecOps answer is to move the audit into the pipeline — treat a failing grade the same way you'd treat a failing unit test or a failing SAST scan. The build is red. The merge is blocked. The author sees the issue in the same PR review where they'd fix any other bug.

SkillAudit's CI webhook makes this mechanical. You add a GitHub Action or GitLab CI job that calls the audit API, parses the returned grade, and sets the exit code accordingly. If the grade is below your policy threshold, the job fails and the merge is blocked. No humans required in the normal path; the security team's job becomes setting policy, not triaging individual findings.
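
The parse-and-exit step described above can be sketched in a few lines of POSIX shell. The function names grade_rank and meets_minimum are illustrative and not part of SkillAudit; the only assumption about the API is that it reports a letter grade from A to F. Unknown or missing grades are treated as failures so that a malformed response cannot pass the gate by accident.

```shell
# Hypothetical helper: map a letter grade to a rank (lower is better).
# Unknown or missing grades print nothing and return 1 so callers fail closed.
grade_rank() {
  case "$1" in
    A) echo 1 ;;
    B) echo 2 ;;
    C) echo 3 ;;
    D) echo 4 ;;
    F) echo 5 ;;
    *) return 1 ;;
  esac
}

# meets_minimum ACTUAL MINIMUM
# Returns 0 when ACTUAL is at least as good as MINIMUM, 1 otherwise
# (including when either grade is not recognised).
meets_minimum() {
  actual_rank=$(grade_rank "$1") || return 1
  minimum_rank=$(grade_rank "$2") || return 1
  [ "$actual_rank" -le "$minimum_rank" ]
}
```

In a CI job, a line such as `meets_minimum "$GRADE" "$MIN_GRADE" || exit 1` would turn a low or unparseable grade into a red build.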

What the CI webhook returns

A SkillAudit audit API call returns a JSON report card. The structure you need for grade-gating:

{
  "audit_id": "au_8f3b4c2d",
  "repo": "https://github.com/myorg/mcp-filesystem-server",
  "grade": "B",
  "score": 74,
  "axes": {
    "security":       { "grade": "B", "score": 71, "findings": 2 },
    "permissions":    { "grade": "A", "score": 95, "findings": 0 },
    "credentials":    { "grade": "A", "score": 100, "findings": 0 },
    "maintenance":    { "grade": "B", "score": 68, "findings": 1 },
    "compatibility":  { "grade": "C", "score": 52, "findings": 3 },
    "documentation":  { "grade": "B", "score": 80, "findings": 0 }
  },
  "critical_findings": [
    {
      "axis": "security",
      "code": "SSRF-001",
      "severity": "HIGH",
      "file": "src/tools/fetch.ts",
      "line": 47,
      "message": "fetch() called with unvalidated URL — SSRF risk"
    }
  ],
  "badge_url": "https://skillaudit.dev/badge/au_8f3b4c2d.svg",
  "report_url": "https://skillaudit.dev/reports/au_8f3b4c2d"
}

The fields you'll use most in policy scripts are grade (overall letter), score (0–100, for numeric threshold comparisons), axes.security.grade (lets you set a stricter gate on security specifically), and critical_findings (any HIGH-severity finding can be a hard block regardless of overall grade).

Strategy 1: fail-closed (recommended for production)

Fail-closed: block merges below grade threshold

The pipeline job fails (non-zero exit) if the audit grade falls below the configured minimum. The merge stays blocked until the author either fixes the findings or obtains an exemption from the security team.

Pros

Zero insecure servers reach the registry
Author fixes issues in context
Audit trail on every merge
Security policy is code, not a checklist

Cons

Can block urgent fixes if scanner is down
Requires an exemption workflow
Needs a policy for grade drops caused by dependency-update PRs

Strategy 2: fail-open (recommended for onboarding)

Fail-open: annotate and warn, never block

The pipeline job always exits 0, but posts the audit report as a PR comment and annotates findings with the inline annotation API. Developers see the security feedback but aren't blocked. Graduate to fail-closed after the team has adapted to the findings cadence.

Pros

No friction during adoption period
Developers learn the scoring system
No exemption process needed

Cons

Insecure servers still merge
Findings can be ignored indefinitely
Requires discipline to enforce later
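
The two strategies differ only in what the job does with a failed policy check. A minimal sketch, assuming a hypothetical mode setting ("open" or "closed") that you would define yourself; it is not a SkillAudit option. Anything other than "open" blocks, so a typo in the setting cannot silently disable the gate.

```shell
# gate_exit_code MODE POLICY_STATUS
#   MODE           "open" (warn only) or "closed" (block); hypothetical names
#   POLICY_STATUS  0 if the grade policy passed, non-zero if it failed
# Prints the exit code the CI job should finish with.
gate_exit_code() {
  mode=$1
  policy_status=$2
  if [ "$policy_status" -eq 0 ]; then
    echo 0
  elif [ "$mode" = "open" ]; then
    echo "warning: policy failed, but the gate is fail-open" >&2
    echo 0
  else
    # Fail closed for "closed" and for any unrecognised mode.
    echo 1
  fi
}
```

A job could end with `exit "$(gate_exit_code "$GATE_MODE" "$policy_status")"`, where GATE_MODE is a pipeline variable of your own choosing. Moving from fail-open to fail-closed then becomes a one-line configuration change.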

Recommendation: Start fail-open for 2–3 sprints to calibrate grade baselines across your existing servers. Then set the fail-closed threshold at the 25th percentile of current grades, which blocks new regressions without retroactively blocking every open PR.
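
One way to find that 25th-percentile baseline is to collect the numeric scores from the fail-open period and apply the nearest-rank method. The sketch below is an assumption about how you might do this, not a SkillAudit feature; the function name percentile_score and the eight sample scores are made up for illustration.

```shell
# percentile_score P
# Reads one numeric score per line on stdin and prints the P-th percentile
# using the nearest-rank method: rank = ceil(P/100 * N) in ascending order.
percentile_score() {
  sort -n | awk -v p="$1" '
    NF { scores[++n] = $1 }
    END {
      if (n == 0) exit 1
      rank = int((p * n + 99) / 100)   # integer ceiling of p*n/100
      if (rank < 1) rank = 1
      print scores[rank]
    }'
}

# Hypothetical baseline: overall scores from eight existing servers.
printf '%s\n' 52 61 68 71 74 80 88 95 | percentile_score 25
```

For this sample the result is 61, so roughly a quarter of the existing servers would sit at or below the new floor. Translating that score into a letter for MIN_GRADE depends on SkillAudit's grade boundaries, which this guide does not specify, so the numeric value is the safer thing to gate on.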

GitHub Actions: complete workflow

Save this as .github/workflows/mcp-security.yml in your MCP server repository:

name: MCP Security Audit

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]

jobs:
  skillaudit:
    name: SkillAudit grade gate
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read
      pull-requests: write   # needed by the "Post PR comment" step

    steps:
      - uses: actions/checkout@v4

      - name: Run SkillAudit
        id: audit
        env:
          SKILLAUDIT_API_KEY: ${{ secrets.SKILLAUDIT_API_KEY }}
        run: |
          # Trigger audit — pass the public or private repo URL
          REPO_URL="https://github.com/${{ github.repository }}"
          RESPONSE=$(curl -sf \
            -H "Authorization: Bearer $SKILLAUDIT_API_KEY" \
            -H "Content-Type: application/json" \
            -d "{\"repo\": \"$REPO_URL\", \"ref\": \"${{ github.sha }}\"}" \
            https://skillaudit.dev/api/v1/audits)

          echo "$RESPONSE" > audit.json
          GRADE=$(jq -r '.grade' audit.json)
          SCORE=$(jq -r '.score' audit.json)
          REPORT=$(jq -r '.report_url' audit.json)
          SECURITY_GRADE=$(jq -r '.axes.security.grade' audit.json)
          CRITICAL=$(jq '.critical_findings | length' audit.json)

          echo "grade=$GRADE"             >> $GITHUB_OUTPUT
          echo "score=$SCORE"             >> $GITHUB_OUTPUT
          echo "report_url=$REPORT"       >> $GITHUB_OUTPUT
          echo "security_grade=$SECURITY_GRADE" >> $GITHUB_OUTPUT
          echo "critical_count=$CRITICAL" >> $GITHUB_OUTPUT

          echo "### SkillAudit Results" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "| Metric | Value |" >> $GITHUB_STEP_SUMMARY
          echo "|--------|-------|" >> $GITHUB_STEP_SUMMARY
          echo "| Overall grade | **$GRADE** ($SCORE/100) |" >> $GITHUB_STEP_SUMMARY
          echo "| Security axis | **$SECURITY_GRADE** |" >> $GITHUB_STEP_SUMMARY
          echo "| Critical findings | $CRITICAL |" >> $GITHUB_STEP_SUMMARY
          echo "| Full report | [$REPORT]($REPORT) |" >> $GITHUB_STEP_SUMMARY

      - name: Post PR comment
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const grade = '${{ steps.audit.outputs.grade }}';
            const score = '${{ steps.audit.outputs.score }}';
            const report = '${{ steps.audit.outputs.report_url }}';
            const secGrade = '${{ steps.audit.outputs.security_grade }}';
            const critical = parseInt('${{ steps.audit.outputs.critical_count }}', 10);

            const emoji = { A: '🟢', B: '🔵', C: '🟡', D: '🟠', F: '🔴' };
            const icon = emoji[grade] || '⚪';

            const body = [
              `## ${icon} SkillAudit: Grade **${grade}** (${score}/100)`,
              '',
              `| Axis | Grade |`,
              `|------|-------|`,
              `| Security | **${secGrade}** |`,
              '',
              critical > 0
                ? `⚠️ **${critical} critical finding(s) detected** — see full report.`
                : '✅ No critical findings.',
              '',
              `[View full report](${report}) · [SkillAudit methodology](https://skillaudit.dev/methodology)`
            ].join('\n');

            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.payload.pull_request.number,
              body
            });

      - name: Enforce grade policy
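        env:
          MIN_GRADE: B          # Set to A, B, C, or D
          MIN_SCORE: 70         # Numeric floor, applied when the grade equals MIN_GRADE
          BLOCK_ON_CRITICAL: true
        run: |
          GRADE="${{ steps.audit.outputs.grade }}"
          SCORE="${{ steps.audit.outputs.score }}"
          CRITICAL="${{ steps.audit.outputs.critical_count }}"

          # Position in this list is the grade's rank (A=1 ... F=5)
          GRADE_ORDER="A B C D F"
          min_index=$(echo $GRADE_ORDER | tr ' ' '\n' | grep -n "^${MIN_GRADE}$" | cut -d: -f1)
          actual_index=$(echo $GRADE_ORDER | tr ' ' '\n' | grep -n "^${GRADE}$" | cut -d: -f1)

          # Fail closed if either grade is missing or unrecognised; otherwise the
          # numeric comparison below would error out and the gate would pass
          if [ -z "$min_index" ] || [ -z "$actual_index" ]; then
            echo "::error::Unrecognised grade (got '$GRADE', minimum '$MIN_GRADE')"
            exit 1
          fi

          FAILED=false

          if [ "$actual_index" -gt "$min_index" ]; then
            echo "::error::Grade $GRADE is below minimum $MIN_GRADE"
            FAILED=true
          elif [ "$actual_index" -eq "$min_index" ] && [ "$SCORE" -lt "$MIN_SCORE" ]; then
            echo "::error::Score $SCORE is below minimum $MIN_SCORE for grade $MIN_GRADE"
            FAILED=true
          fi

          if [ "$BLOCK_ON_CRITICAL" = "true" ] && [ "$CRITICAL" -gt "0" ]; then
            echo "::error::$CRITICAL critical finding(s) detected, blocking merge"
            FAILED=true
          fi

          if [ "$FAILED" = "true" ]; then
            echo ""
            echo "Fix the findings, push a new commit, and re-run the audit."
            echo "For emergency exemptions, add the 'security-exemption-requested' label and ask the security team to review."
            exit 1
          fi

          echo "Grade policy passed: $GRADE ($SCORE/100)"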

Secret setup: Add SKILLAUDIT_API_KEY to GitHub repository secrets (Settings → Secrets and variables → Actions). On the SkillAudit Team plan, private repo audits are enabled and the API key is per-workspace, not per-repository.

GitLab CI: equivalent pipeline

For GitLab CI/CD, add this to your .gitlab-ci.yml:

skillaudit:
  stage: test
  image: alpine:3.20
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      # CI_PROJECT_URL is predefined by GitLab and also works on self-managed instances
      REPO_URL="$CI_PROJECT_URL"
      RESPONSE=$(curl -sf \
        -H "Authorization: Bearer $SKILLAUDIT_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"repo\": \"$REPO_URL\", \"ref\": \"$CI_COMMIT_SHA\"}" \
        https://skillaudit.dev/api/v1/audits)

      echo "$RESPONSE" > audit.json

      GRADE=$(jq -r '.grade' audit.json)
      SCORE=$(jq -r '.score' audit.json)
      CRITICAL=$(jq '.critical_findings | length' audit.json)
      REPORT=$(jq -r '.report_url' audit.json)

      echo "Grade: $GRADE ($SCORE/100)"
      echo "Critical findings: $CRITICAL"
      echo "Report: $REPORT"

      # Emit GitLab metrics for the pipeline dashboard
      cat > audit_metrics.txt <<EOF
      skillaudit_score $SCORE
      skillaudit_critical_findings $CRITICAL
      EOF

      # Grade gate
      GRADE_VALUES="A:1 B:2 C:3 D:4 F:5"
      MIN_GRADE="${SKILLAUDIT_MIN_GRADE:-B}"

      get_value() { echo "$GRADE_VALUES" | tr ' ' '\n' | grep "^$1:" | cut -d: -f2; }
      min_val=$(get_value "$MIN_GRADE")
      actual_val=$(get_value "$GRADE")

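      # Fail closed on a missing or unrecognised grade; an empty value would
      # make the numeric test below error out and let the job pass
      if [ -z "$min_val" ] || [ -z "$actual_val" ]; then
        echo "POLICY FAILED: unrecognised grade '$GRADE' (minimum '$MIN_GRADE')"
        exit 1
      fi

      if [ "$actual_val" -gt "$min_val" ]; then
        echo "POLICY FAILED: Grade $GRADE below minimum $MIN_GRADE"
        exit 1
      fi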

      if [ "$CRITICAL" -gt 0 ]; then
        echo "POLICY FAILED: $CRITICAL critical finding(s)"
        exit 1
      fi

      echo "Policy passed."
  artifacts:
    when: always        # keep audit.json even when the grade gate fails the job
    reports:
      metrics: audit_metrics.txt
    paths:
      - audit.json
    expire_in: 30 days
  variables:
    SKILLAUDIT_MIN_GRADE: B
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'

Set SKILLAUDIT_API_KEY in GitLab CI/CD variables (Settings → CI/CD → Variables). Mark it as masked but leave it unprotected, so that it is available to pipelines on feature branches.

Pre-push hook: scan before push

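For a second line of defence, or for teams where PRs are too infrequent for real-time feedback, add a pre-push hook that runs an audit before a push is allowed to reach CI. Note that the script below sends only the origin URL to the audit API, so the result reflects what is already on the remote rather than unpushed local commits. To check the working copy itself, use the local command the hook prints on failure (skillaudit scan .):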

#!/usr/bin/env bash
# .git/hooks/pre-push
# Install: cp pre-push .git/hooks/pre-push && chmod +x .git/hooks/pre-push
# Or use pre-commit framework: see .pre-commit-config.yaml below

set -euo pipefail

API_KEY="${SKILLAUDIT_API_KEY:-}"
if [ -z "$API_KEY" ]; then
  echo "SkillAudit: SKILLAUDIT_API_KEY not set — skipping pre-push audit"
  exit 0
fi

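# Note: an SSH remote (git@host:org/repo.git) may need converting to an HTTPS URL
REPO_URL=$(git remote get-url origin)

# Build the JSON body with jq so special characters in the URL are escaped
PAYLOAD=$(jq -n --arg repo "$REPO_URL" '{repo: $repo}')

echo "SkillAudit: auditing $REPO_URL ..."
RESPONSE=$(curl -sf \
  --max-time 60 \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  https://skillaudit.dev/api/v1/audits) || {
  echo "SkillAudit: audit API unreachable, failing open (push allowed)"
  exit 0
}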

GRADE=$(echo "$RESPONSE" | jq -r '.grade')
SCORE=$(echo "$RESPONSE" | jq -r '.score')
CRITICAL=$(echo "$RESPONSE" | jq '.critical_findings | length')
REPORT=$(echo "$RESPONSE" | jq -r '.report_url')

echo "SkillAudit: Grade $GRADE ($SCORE/100) | Critical findings: $CRITICAL"
echo "Full report: $REPORT"

if [ "$CRITICAL" -gt 0 ]; then
  echo ""
  echo "Push blocked: $CRITICAL critical security finding(s)."
  echo "Fix them before pushing. Run locally: skillaudit scan ."
  exit 1
fi

# Warn on grade below B, but don't block
if [[ "$GRADE" == "C" || "$GRADE" == "D" || "$GRADE" == "F" ]]; then
  echo ""
  echo "WARNING: Grade $GRADE — below recommended minimum B."
  echo "This won't block your push, but CI will."
fi

exit 0

For teams using the pre-commit framework, add this to .pre-commit-config.yaml, then enable the push stage with pre-commit install --hook-type pre-push:

repos:
  - repo: https://github.com/skillaudit-dev/pre-commit-hooks
    rev: v1.0.0
    hooks:
      - id: skillaudit-scan
        args: [--min-grade, B, --block-on-critical]
        always_run: false
        stages: [pre-push]   # this stage was named "push" before pre-commit 3.2.0

Policy thresholds: what grade to require

Context	Recommended threshold	Rationale
Internal tooling, single-team, low sensitivity	C + no criticals	Blocks only the worst offenders; workable during adoption
Enterprise internal deployment, multi-team	B + no criticals	Catches SSRF, command-exec, credential leakage; standard DevSecOps bar
Community marketplace / public directory listing	B on overall, A on security axis	Anthropic's directory requires security review; A on security axis approximates that bar
Regulated environment (HIPAA, SOC 2, FedRAMP)	A overall	Compliance audit evidence requires documented risk assessment; A-grade is the defensible artifact
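
To keep these thresholds in version control rather than in a wiki, the table can be encoded as a small lookup. This is a sketch under stated assumptions: the context keys (internal, enterprise, marketplace, regulated) are hypothetical labels for the four rows, and the third field is true only where the table explicitly says "no criticals". Many teams will want to block on critical findings in every row.

```shell
# policy_for_context CONTEXT
# Prints "MIN_OVERALL MIN_SECURITY BLOCK_ON_CRITICAL" for a deployment context.
# "-" means the table sets no separate security-axis minimum for that row.
policy_for_context() {
  case "$1" in
    internal)    echo "C - true" ;;
    enterprise)  echo "B - true" ;;
    marketplace) echo "B A false" ;;
    regulated)   echo "A - false" ;;
    *)           echo "unknown context: $1" >&2; return 1 ;;
  esac
}

# Example: split the three fields into positional parameters.
set -- $(policy_for_context marketplace)
echo "overall>=$1 security>=$2 block_on_critical=$3"
```

Returning a non-zero status for an unknown context means a misconfigured pipeline stops instead of running with no policy at all.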

Axis-level gating

Overall grade is a weighted average. If your threat model cares most about a specific axis, gate on that axis independently. The most common split in enterprise deployments:

# Strict gate: overall B AND security A AND no credentials exposure
enforce_policy() {
  local overall="$1"
  local security="$2"
  local credentials="$3"
  local critical="$4"

  local failed=false

  # Overall grade: minimum B
  [[ "$overall" == "C" || "$overall" == "D" || "$overall" == "F" ]] && {
    echo "::error::Overall grade $overall below B"
    failed=true
  }

  # Security axis: minimum A — no SSRF, command-exec, prompt-injection
  [[ "$security" != "A" ]] && {
    echo "::error::Security axis grade $security — must be A for this repo"
    failed=true
  }

  # Credentials: any exposure = block regardless of score
  [[ "$credentials" != "A" ]] && {
    echo "::error::Credentials axis grade $credentials — credential exposure blocks merge"
    failed=true
  }

  # Critical findings: hard block always
  [[ "$critical" -gt 0 ]] && {
    echo "::error::$critical critical finding(s) — hard block"
    failed=true
  }

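  # Return instead of exit so the caller decides how to end the job
  if [ "$failed" = true ]; then
    return 1
  fi
  echo "All axis policies passed"
}

# Example call, with each argument read from the audit JSON:
# enforce_policy "$GRADE" "$SECURITY_GRADE" "$CREDENTIALS_GRADE" "$CRITICAL" || exit 1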

SBOM export and audit log for compliance

SkillAudit's Team plan exports a Software Bill of Materials alongside each audit report. In a regulated environment, capture this as a build artifact tied to the commit SHA:

- name: Export SBOM
  run: |
    AUDIT_ID=$(jq -r '.audit_id' audit.json)
    curl -sf \
      -H "Authorization: Bearer ${{ secrets.SKILLAUDIT_API_KEY }}" \
      "https://skillaudit.dev/api/v1/audits/$AUDIT_ID/sbom" \
      -o sbom.json

- name: Upload SBOM artifact
  uses: actions/upload-artifact@v4
  with:
    name: skillaudit-sbom-${{ github.sha }}
    path: sbom.json
    retention-days: 365

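This creates a per-commit SBOM artifact retained for one year, which can serve as supporting evidence for SOC 2 Type II audits and most HIPAA security risk assessments. Note that GitHub caps retention-days at the repository's configured maximum (90 days by default), so a 365-day retention only takes effect once that limit has been raised in the repository or organisation settings.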

Exemption workflow

Grade-gates occasionally block legitimate work — a dependency upgrade that drops maintenance score, or a network tool with unavoidable external calls. You need an exemption path that doesn't become a rubber-stamp:

Author requests exemption

Adds the security-exemption-requested label on the PR. CI job detects the label and switches to fail-open mode — posts the audit results as a comment but exits 0.

Security team reviews

A required CODEOWNERS reviewer on the .github/security-exemptions/ path must approve before merge. They add the specific exemption to a YAML file: repo, commit SHA, findings exempted, expiry date, and justification.

Exemption expires

A weekly scheduled workflow re-runs audits on all exempted servers. If a finding is still open at expiry, the exemption file is auto-deleted and the next PR to that repo is blocked again. Exemptions don't accumulate silently.

# .github/security-exemptions/mcp-filesystem-server.yml
exemptions:
  - finding: "SSRF-001"
    file: "src/tools/fetch.ts"
    justification: "fetch target is a fixed internal registry URL; runtime config prevents external calls"
    exempted_by: "security-team@myorg.com"
    expires: "2026-09-10"
    commit: "a3f7c9d"
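
The expiry check in the scheduled workflow can be as simple as comparing two ISO dates. The sketch below assumes an exemption stays valid through its expires date and lapses the day after; the function name is_expired is illustrative, and reading the expires field out of the YAML file is left to whatever YAML tooling you already use.

```shell
# is_expired EXPIRES TODAY
# Both arguments are ISO dates (YYYY-MM-DD). Returns 0 if TODAY is strictly
# after EXPIRES. The dates are compared as integers after removing the
# hyphens, which avoids non-portable date arithmetic flags.
is_expired() {
  expires_num=$(printf '%s' "$1" | tr -d '-')
  today_num=$(printf '%s' "$2" | tr -d '-')
  [ "$today_num" -gt "$expires_num" ]
}

# Example with the exemption shown above (expires 2026-09-10).
if is_expired "2026-09-10" "$(date +%Y-%m-%d)"; then
  echo "exemption expired: finding must be fixed or re-approved"
else
  echo "exemption still active"
fi
```

Passing today's date as an argument, instead of calling date inside the function, keeps the check easy to test with fixed inputs.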

Monitoring: pipeline grade trends

Grade-gating prevents regressions, but monitoring catches slow drift — a server that held an A grade for months slowly accumulating C-grade maintenance findings as its dependencies go unmaintained.

Wire the numeric score output into your observability stack:

# In your scheduled re-scan workflow (runs weekly):
- name: Emit grade metric to Datadog
  run: |
    SCORE=$(jq -r '.score' audit.json)
    GRADE=$(jq -r '.grade' audit.json)
    REPO="${{ github.repository }}"

    curl -sf \
      -H "Content-Type: application/json" \
      -H "DD-API-KEY: ${{ secrets.DD_API_KEY }}" \
      "https://api.datadoghq.com/api/v1/series" \
      -d "{
        \"series\": [{
          \"metric\": \"skillaudit.score\",
          \"points\": [[$(date +%s), $SCORE]],
          \"tags\": [\"repo:${REPO/\//:}\", \"grade:$GRADE\"],
          \"type\": \"gauge\"
        }]
      }"

Alert on: score dropping below threshold over a 4-week window; any new critical finding on a server that was previously finding-free; any server going more than 30 days without a rescan.
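
If your monitoring backend cannot express these rules directly, the scheduled workflow can evaluate them itself. The following is a rough sketch: the function name, argument order, and reason strings are all hypothetical, and it checks only the latest score against the threshold, leaving the 4-week trend window to the monitoring tool.

```shell
# alert_reasons SCORE THRESHOLD PREV_CRITICAL CURR_CRITICAL DAYS_SINCE_SCAN
# Prints one line per triggered alert condition; prints nothing if none apply.
alert_reasons() {
  score=$1; threshold=$2; prev_critical=$3; curr_critical=$4; days_since_scan=$5

  # Latest score is under the policy threshold
  if [ "$score" -lt "$threshold" ]; then
    echo "score_below_threshold"
  fi

  # A previously finding-free server now has a critical finding
  if [ "$prev_critical" -eq 0 ] && [ "$curr_critical" -gt 0 ]; then
    echo "new_critical_finding"
  fi

  # More than 30 days have passed since the last rescan
  if [ "$days_since_scan" -gt 30 ]; then
    echo "rescan_overdue"
  fi
}
```

An empty result means nothing needs attention, so a caller can test the output with `[ -n "$(alert_reasons ...)" ]` before paging anyone.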

Connecting CI/CD to SkillAudit's broader audit surface

CI/CD integration catches what can be caught statically at author time — SSRF patterns, command-exec paths, credential exposure, schema validation. The axes that improve after CI/CD adoption are security (finding-based) and credentials (static scan).

The axes that depend on the passage of time or on runtime behaviour are maintenance (last commit date, open CVEs in dependencies) and compatibility (tested against live Claude Code, Cursor, Windsurf clients). Those are scored by SkillAudit's scheduled weekly re-scan, not the CI webhook. See managing security debt over time for how to track maintenance grade drift and set up Dependabot to keep it green.

For prompt-injection checks specifically (LLM-assisted red-teaming of tool inputs, scored under the security axis), the CI webhook runs only a fast static pass. Deep prompt-injection testing, which makes adversarial LLM calls against live tool handlers, requires the full audit rather than the CI webhook. Run the full audit on the main branch weekly, not on every PR. See how SkillAudit red-teams for prompt injection for the methodology.

Quick start: Copy the GitHub Actions workflow above, add your API key as a repo secret, set MIN_GRADE: C so that the gate initially blocks only D or F grades and critical findings, and push a commit to a pull request branch. The first audit report will appear in your PR comment thread within 60 seconds. Graduate to MIN_GRADE: B after one sprint. That's the entire onboarding path for most teams.

Related resources

Input validation patterns — what the CI scan checks for under the security axis
Rate limiting deep dive — missing rate limiting is a C-grade trigger
Security policy template — SECURITY.md spec that feeds the documentation axis
Security debt over time — tracking maintenance grade drift after CI/CD is wired up
Permissions hygiene checklist — the permissions axis in the CI report card
SSRF attack patterns reference — the most common critical finding the CI gate will block