Public audit · 2026-04-30

Klavis-AI/klavis

Overall: F (0/100) · v0.3 scan · 6 axes · LLM prompt-injection probe

SkillAudit report — Klavis-AI/klavis

Scanned 2026-04-30 by SkillAudit v0.3 (surface-tiered static checks + LLM-assisted prompt-injection red-team).
Commit: 9ab5070 · Stars: 5716 · Days since last push: 0
LLM prompt-injection probe: skipped — set ANTHROPIC_API_KEY to enable the LLM-assisted prompt-injection red-team

Overall grade: F (0/100)

| Axis | Score | Grade |
| --- | --- | --- |
| security | 0/100 | F |
| permissions | 100/100 | A |
| credentials | 0/100 | F |
| maintenance | 90/100 | A |
| compatibility | 70/100 | C ⚠️ |
| docs | 100/100 | A |

Security findings

Production sources:

const response = await axios.get(url, {

const response = await axios.get(url + '?' + scheme.toString(), {

const response = await fetch(url, {

const response = await fetch(url, {

const webResponse = await fetch(webUrl, {

const response = await fetch(url, {

Examples / samples (low-weight) — 4 total, deduct 5/0 per high/warn:

response = requests.post(url, headers=self.headers, json=payload)

response = requests.post(url, headers=self.headers, json=payload)

response = requests.post(url, headers=self.headers, json=payload)

response = requests.post(openai_url, headers=openai_headers, json=payload)

Test source (low-weight) — 3 total, deduct 5/0 per high/warn:

const response = await axios.get(`${BASE_URL}/openapi.json`)

const response = await axios.get(`${BASE_URL}/openapi.json`)

const response = await axios.get(`${BASE_URL}/openapi.json`)

Permissions

_No findings on this axis._

Credentials

Production sources:

sk_test_*** (Stripe test secret, 35 chars)

sk_test_*** (Stripe test secret, 35 chars)

sk_test_*** (Stripe test secret, 35 chars)

ghp_*** (GitHub personal access token, 26 chars)

ghp_*** (GitHub personal access token, 26 chars)

console.log(`server running on port ${process.env.PORT || 5000}`);

console.log(`Connected to: ${process.env.WORDPRESS_SITE_URL}`);

Examples / samples (low-weight) — 3 total, deduct 5/0 per high/warn:

xoxb-*** (Slack bot token, 24 chars)

xoxp-*** (Slack user token, 25 chars)

examples/agno-klavis/.env.example

Test source (low-weight) — 3 total, deduct 5/0 per high/warn:

ghp_*** (GitHub personal access token, 40 chars)

ghp_*** (GitHub personal access token, 40 chars)

ghp_*** (GitHub personal access token, 40 chars)

Maintenance

Production sources:

238 open

Compatibility

Production sources:

Documentation

_No findings on this axis._


Methodology

SkillAudit v0.3 clones the repo at the provided ref (default: default branch, HEAD) into an ephemeral sandbox, runs six static checks over .js/.ts/.py sources, queries the GitHub API for maintenance signals, and runs an LLM-assisted prompt-injection red-team over the MCP tool surface. Each axis is scored against the published rubric — surface tiers, per-(axis, surface) caps, grade buckets, and worked examples are all documented there.

The v0.3 calibration update introduces surface tiering: every finding is tagged with the code path it lives in (production / installer / examples / benchmarks / scripts / test). Production findings deduct at full weight (-30 high, -10 warn); installer findings deduct at half (-15 / -5); examples, benchmarks, top-level scripts, and tests deduct at low weight (-5 / 0). This stops a chatty benchmarks/ or samples/ directory from dominating an otherwise-clean MCP server's grade.
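
The tier weights above can be illustrated with a minimal sketch. This is not SkillAudit's actual implementation: the `Finding` type, the `axis_score` helper, the start-at-100 baseline, and the floor at 0 are assumptions made for illustration, and the per-(axis, surface) caps mentioned in the rubric are deliberately left out because their values are not given here.

```python
from dataclasses import dataclass

# Deduction per (surface, severity), copied from the weights quoted above.
DEDUCTIONS = {
    "production": {"high": 30, "warn": 10},
    "installer": {"high": 15, "warn": 5},
    "examples": {"high": 5, "warn": 0},
    "benchmarks": {"high": 5, "warn": 0},
    "scripts": {"high": 5, "warn": 0},
    "test": {"high": 5, "warn": 0},
}


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record: where it lives and how severe it is."""
    surface: str   # one of the keys in DEDUCTIONS
    severity: str  # "high" or "warn"


def axis_score(findings):
    """Start at 100, subtract tiered deductions, and floor at 0 (assumed)."""
    total = sum(DEDUCTIONS[f.surface][f.severity] for f in findings)
    return max(0, 100 - total)


if __name__ == "__main__":
    sample = [
        Finding("production", "high"),  # -30
        Finding("installer", "warn"),   # -5
        Finding("examples", "high"),    # -5
    ]
    print(axis_score(sample))  # 100 - 40 = 60
```

Under these assumptions, four high-severity production findings alone take an axis to 0, while the same four findings in tests or examples would cost only 20 points.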

The prompt-injection axis extracts each server.tool(...) / @app.tool registration + the first ~60 lines of handler body, hands them to Claude Haiku 4.5 with a red-team system prompt, and asks for structured findings on untrusted-content flow into tool responses. One API call per scan, bounded at ~15K input tokens.
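
A rough sketch of the extraction step described above, assuming a simple line-based regex match rather than whatever parser SkillAudit really uses. The function names, the `max_chars` budget, and the rule of thumb of about 4 characters per token are all assumptions for illustration; the model call itself is omitted.

```python
import re

# Matches the two registration styles named above: server.tool(...) and @app.tool
TOOL_PATTERN = re.compile(r"server\.tool\(|@app\.tool\b")
MAX_BODY_LINES = 60  # "first ~60 lines of handler body"


def extract_tool_snippets(source, max_lines=MAX_BODY_LINES):
    """Return each registration line plus up to max_lines following lines."""
    lines = source.splitlines()
    snippets = []
    for i, line in enumerate(lines):
        if TOOL_PATTERN.search(line):
            snippets.append("\n".join(lines[i:i + 1 + max_lines]))
    return snippets


def bound_snippets(snippets, max_chars=60_000):
    """Keep whole snippets until an assumed character budget is reached.

    60,000 characters approximates ~15K tokens at roughly 4 characters
    per token; that ratio is an assumption, not a documented value.
    """
    kept, used = [], 0
    for snippet in snippets:
        if used + len(snippet) > max_chars:
            break
        kept.append(snippet)
        used += len(snippet)
    return kept
```

A line-based match like this is cheap and language-agnostic across .js/.ts/.py sources, at the cost of occasionally cutting a handler body mid-function or spilling into the next one.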

How to improve this grade

_Report generated by skillaudit.dev_