Public audit · 2026-04-24

stripe/agent-toolkit

Overall: F (0/100) · v0.2 scan · 6 axes · LLM prompt-injection probe

SkillAudit report — stripe/agent-toolkit

Scanned 2026-04-24 by SkillAudit v0.2 (static checks + LLM-assisted prompt-injection red-team).
Commit: dd6deb0 · Stars: 1499 · Days since last push: 2
LLM prompt-injection probe: skipped for this scan (set ANTHROPIC_API_KEY to enable the LLM-assisted prompt-injection red-team)

Overall grade: F (0/100)

| Axis | Score | Grade |
| --- | --- | --- |
| security | 10/100 | F |
| permissions | 100/100 | A |
| credentials | 0/100 | F |
| maintenance | 100/100 | A |
| compatibility | 70/100 | C ⚠️ |
| docs | 100/100 | A |

Security findings

Production sources:

const response = await fetch(`/session-status?session_id=${sessionId}`);

const response = await fetch(`/session-status?session_id=${sessionId}`);

const res = await fetch(fetchUrl, {

with urllib.request.urlopen(url, timeout=20) as response:

const response = await fetch(`/session-status?session_id=${sessionId}`);

const response = await fetch(`/session-status?session_id=${sessionId}`);

const fetcher = (url: string) => fetch(url).then((res) => res.json());

const fetcher = (url: string) => fetch(url).then((res) => res.json());

const fetcher = (url: string) => fetch(url).then((res) => res.json());

const fetcher = (url: string) => fetch(url).then((res) => res.json());
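
The report does not say why these calls were flagged. Assuming the concern with the /session-status calls is that sessionId is interpolated into the URL without validation or encoding, one possible mitigation could look like the sketch below. The helper name sessionStatusUrl and the allowed-character pattern are illustrative assumptions, not code from stripe/agent-toolkit.

```typescript
// Hypothetical helper, not taken from stripe/agent-toolkit.
// Assumption: session ids contain only letters, digits, and underscores.
const SESSION_ID_PATTERN = /^[A-Za-z0-9_]+$/;

function sessionStatusUrl(sessionId: string): string {
  // Reject anything that could smuggle extra query parameters or path segments.
  if (!SESSION_ID_PATTERN.test(sessionId)) {
    throw new Error("Unexpected session id format");
  }
  // Encoding is kept as a second line of defence in case the pattern is relaxed later.
  return `/session-status?session_id=${encodeURIComponent(sessionId)}`;
}
```

A flagged call site would then read fetch(sessionStatusUrl(sessionId)) instead of building the URL inline.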

Permissions

_No findings on this axis._

Credentials

Production sources:

sk_test_*** (Stripe test secret, 28 chars)

sk_test_*** (Stripe test secret, 28 chars)

sk_test_*** (Stripe test secret, 28 chars)

sk_test_*** (Stripe test secret, 28 chars)

sk_test_*** (Stripe test secret, 28 chars)

console.log(`Customer ID: ${process.env.STRIPE_CUSTOMER_ID}\n`);

benchmarks/card-element-to-checkout/environment/server/.env.example

benchmarks/card-element-to-checkout/grader/.env.example

benchmarks/card-element-to-checkout/solution/server/.env.example

benchmarks/checkout-gym/.env.example
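
A common way to address findings of this kind is to read the secret from the environment at startup and to mask identifiers before logging them. The following is a minimal sketch under stated assumptions: the variable name STRIPE_SECRET_KEY and the helpers requireStripeKey and redact are hypothetical and do not come from the audited repository. The environment is passed in as a plain record so the functions do not depend on a particular runtime.

```typescript
// Hypothetical helpers, not taken from stripe/agent-toolkit.
type Env = Record<string, string | undefined>;

// Assumption: the secret lives in STRIPE_SECRET_KEY and uses the sk_test_/sk_live_ prefix.
function requireStripeKey(env: Env): string {
  const key = env["STRIPE_SECRET_KEY"];
  if (key === undefined || !/^sk_(test|live)_/.test(key)) {
    throw new Error("STRIPE_SECRET_KEY is missing or malformed");
  }
  return key;
}

// Keep only the first `visible` characters so logs never contain the full value.
function redact(value: string | undefined, visible: number = 8): string {
  if (value === undefined || value === "") {
    return "<unset>";
  }
  if (value.length <= visible) {
    return "***";
  }
  return value.slice(0, visible) + "***";
}
```

In a Node.js process, requireStripeKey(process.env) would replace a hardcoded key, and a log line such as the Customer ID one above could print redact(process.env.STRIPE_CUSTOMER_ID) instead of the raw value.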

Maintenance

_No findings on this axis._

Compatibility

Production sources:

Documentation

_No findings on this axis._


Methodology

SkillAudit v0.2 clones the repo at the provided ref (default: HEAD of the default branch) into an ephemeral sandbox, runs six static checks over .js/.ts/.py sources, queries the GitHub API for maintenance signals, and, when ANTHROPIC_API_KEY is set, runs an LLM-assisted prompt-injection red-team over the MCP tool surface. Each axis is scored against the SkillAudit rubric.
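
To make the idea of a static check concrete, here is a rough sketch of how a credentials check might produce masked findings in the format shown under Credentials. It is not SkillAudit's implementation; the regular expression, the Finding shape, and the function name scanForStripeTestKeys are assumptions made for illustration.

```typescript
// Illustrative only: SkillAudit's real checks are not published in this report.
interface Finding {
  line: number;   // 1-based line number of the match
  masked: string; // secret with everything after the prefix hidden
}

function scanForStripeTestKeys(source: string): Finding[] {
  // Assumption: a test secret is "sk_test_" followed by alphanumeric characters.
  const pattern = /sk_test_[A-Za-z0-9]+/g;
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    const matches = text.match(pattern) || [];
    for (const secret of matches) {
      findings.push({
        line: index + 1,
        masked: `sk_test_*** (Stripe test secret, ${secret.length} chars)`,
      });
    }
  });
  return findings;
}
```

Reporting only the prefix and length lets a reader locate the secret without the report itself leaking it.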

The prompt-injection axis extracts each server.tool(...) / @app.tool registration plus the first ~60 lines of its handler body, hands them to Claude Haiku 4.5 with a red-team system prompt, and asks for structured findings on untrusted-content flow into tool responses. It makes one API call per scan, bounded at ~15K input tokens.
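
The extraction step described above can be approximated with a line-based scan. The sketch below is an assumption about how such a step might work, not SkillAudit's code: it treats any line containing server.tool( or @app.tool as a registration and simply takes a fixed window of lines starting there, rather than parsing the handler body. The names ToolSnippet and extractToolSnippets are hypothetical.

```typescript
// Illustrative approximation; a real extractor would likely parse the source properly.
interface ToolSnippet {
  startLine: number; // 1-based line of the registration
  text: string;      // registration line plus the lines that follow it
}

function extractToolSnippets(source: string, maxLines: number = 60): ToolSnippet[] {
  const lines = source.split("\n");
  // Matches a TypeScript-style server.tool( call or a Python-style @app.tool decorator.
  const registration = /server\.tool\(|@app\.tool/;
  const snippets: ToolSnippet[] = [];
  lines.forEach((line, index) => {
    if (registration.test(line)) {
      snippets.push({
        startLine: index + 1,
        text: lines.slice(index, index + maxLines).join("\n"),
      });
    }
  });
  return snippets;
}
```

The collected snippets would then be concatenated into a single prompt; enforcing the ~15K input-token bound is left out of this sketch.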

How to improve this grade

_Report generated by skillaudit.dev_
