Jun 7, 2026 Product Management

MCP server security for product managers: translating SkillAudit grades into business risk

Your team wants to ship faster by adding new MCP servers. Your security team is worried about what those servers can access. You're caught in the middle, trying to make a call without the vocabulary to speak both languages fluently. This guide gives you that vocabulary.

Why product managers need to understand MCP security — not just delegate it

Traditional software procurement has a comfortable division of labor: engineers evaluate technical fit, legal reviews contracts, security reviews compliance, and PMs synthesize the business case. MCP servers break this model.

An MCP server runs locally on the machines of everyone who uses it. It executes code. It makes outbound network requests. It holds credentials, often the same API tokens that connect to your customers' data, your internal databases, or your cloud infrastructure. And crucially, the primary attack surface is the LLM itself: instructions hidden in a document that an MCP server reads can steer the model into exfiltrating credentials or making unauthorized API calls. There is no clickable phishing link to block. There is no obvious attack indicator.

This threat model is different enough from traditional SaaS tools that standard procurement checklists — SOC 2 report, penetration test, data processing agreement — miss the most important questions. As a PM, you are often the person who decides whether a new MCP server is approved for internal use before those checklists even start. Getting that call wrong costs more than a delayed feature.

You do not need to become a security engineer. You need to understand what the grades mean in business terms — and know when to escalate.

The SkillAudit grade scale, in plain English

SkillAudit scans MCP servers and produces an overall letter grade (A–F) plus five sub-scores: Security, Permissions, Credentials, Maintenance, and Documentation. The overall grade is a weighted composite. Here is what each letter means in the language of business risk.

A (90–100) — Approve for standard use

The server follows current best practices across all dimensions. Credentials are scoped narrowly, outbound requests are allowlisted, the dependency tree is patched, and the documentation is good enough that your team can audit behavior. An A-grade server adds no material security risk relative to any other approved internal tool. Standard monitoring applies.

B (75–89) — Approve with noted mitigations

Mostly clean. One or two sub-scores are below ideal — typically a slightly broad token scope, a dependency that is outdated but not critically vulnerable, or a missing security disclosure policy. These are real gaps but not in the dangerous tier. Approve with a note in your decision log, and plan a rescan in 90 days. If the B grade is driven by a Credentials or Security sub-score (rather than Maintenance or Docs), require the gap to be addressed before production rollout.

C (60–74) — Conditional approval with exception

There are meaningful gaps that your security team needs to evaluate. A C grade almost always means at least one sub-score is in the D range — an SSRF risk left open, credentials with write access to production data, or a dependency with a known CVE. A C-grade server should not be approved for use with production credentials or customer data without a security exception sign-off. Require the vendor or team to run through a remediation plan before full rollout.

D (40–59) — Do not approve without security review

Multiple serious gaps. At this grade level, findings typically include direct injection risks (user inputs reach file paths or shell commands without sanitization), credentials with far broader scope than the tool needs, or dependencies with critical CVEs. A D grade is a hard stop on production use. Internal development environments only, and only with stripped-down credentials. Require a full security review and remediation before any production exposure.

F (0–39) — Hard block

Do not approve under any circumstances until remediated. F-grade servers have critical unmitigated vulnerabilities — typically SSRF with no allowlisting, shell command execution with user-controlled input, or credentials stored in plaintext accessible via the MCP interface. These are not edge-case theoretical risks; they are patterns that have been exploited in real incidents. Block installation organization-wide and notify the vendor that remediation is required.
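The grade bands above can be written down as a small lookup, which is handy if you track grades in a script or spreadsheet export. This is a minimal sketch: the band boundaries and tier labels come from this guide, while the names `GRADE_BANDS`, `APPROVAL_TIER`, and `letter_grade` are hypothetical and not part of any SkillAudit API.

```python
# Hypothetical helper: maps a 0-100 score to the letter bands described above.
# Band boundaries come from this guide; all names are illustrative only.
GRADE_BANDS = [(90, "A"), (75, "B"), (60, "C"), (40, "D"), (0, "F")]

APPROVAL_TIER = {
    "A": "approve for standard use",
    "B": "approve with noted mitigations",
    "C": "conditional approval with exception",
    "D": "do not approve without security review",
    "F": "hard block",
}


def letter_grade(score: float) -> str:
    """Return the letter for a score, checking bands from highest to lowest."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, letter in GRADE_BANDS:
        if score >= floor:
            return letter
    raise AssertionError("unreachable: the final band starts at 0")


grade = letter_grade(82)
print(grade, "->", APPROVAL_TIER[grade])  # B -> approve with noted mitigations
```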

The sub-score override rule: why the overall grade is not the whole story

The most important thing to understand about SkillAudit grades is that a single critical sub-score can be masked by the overall composite. A server with Maintenance A, Documentation A, Permissions B — and Security D — will produce an overall grade somewhere in the C range. The composite looks manageable. The Security D is not.

Apply this rule: if any single sub-score is D or F, treat the server as if the overall grade is D or F, regardless of the composite.

This is not theoretical conservatism. The Security sub-score covers SSRF, injection, and prompt manipulation: the vulnerabilities most likely to cause an actual incident. A D on Security means the scanner found patterns that are exploitable as the code stands today. A great maintenance record and good documentation do not compensate for a server that lets a malicious document trigger outbound requests to an attacker-controlled server.
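A worked example makes the masking effect concrete. SkillAudit's actual weights are not given in this guide, so the `WEIGHTS` below are an assumption chosen purely for illustration, as is the Credentials score of 70. The other four scores mirror the A / A / B / D profile described above.

```python
# ASSUMPTION: the real composite weights are not published in this guide.
# These weights exist only to illustrate how a composite can hide a bad sub-score.
WEIGHTS = {
    "security": 0.35,
    "permissions": 0.20,
    "credentials": 0.20,
    "maintenance": 0.15,
    "documentation": 0.10,
}
ORDER = "ABCDF"  # best to worst


def letter(score: float) -> str:
    """Map a 0-100 score to the letter bands used in this guide."""
    for floor, grade in ((90, "A"), (75, "B"), (60, "C"), (40, "D")):
        if score >= floor:
            return grade
    return "F"


def composite(sub_scores: dict[str, float]) -> float:
    """Weighted average of the five sub-scores."""
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


def effective_grade(sub_scores: dict[str, float]) -> str:
    """Override rule: a D or F sub-score pulls the grade down to that letter."""
    overall = letter(composite(sub_scores))
    worst_sub = max((letter(s) for s in sub_scores.values()), key=ORDER.index)
    if worst_sub in ("D", "F"):
        return max(overall, worst_sub, key=ORDER.index)
    return overall


scores = {"security": 45, "permissions": 80, "credentials": 70,
          "maintenance": 92, "documentation": 92}
print(letter(composite(scores)), effective_grade(scores))  # C D
```

Under these assumed weights the composite lands at 68.75, a C, while the override rule reports the D that should actually drive the decision.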

Sub-score escalation rule

In your intake process, always surface the five sub-scores explicitly. Never rely on the composite grade alone. If the Permissions or Security sub-score is below 60, escalate to your security team before proceeding — regardless of what the overall grade shows.
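If your intake tracker is scripted, the escalation rule is a single comparison per gated sub-score. A minimal sketch, assuming sub-scores arrive as a dictionary of 0 to 100 integers; the function name and the dictionary keys are hypothetical.

```python
# Hypothetical escalation check for an intake script. The threshold of 60 and
# the two gated sub-scores come from the rule above; the names are illustrative.
ESCALATION_THRESHOLD = 60
GATED_SUB_SCORES = ("security", "permissions")


def escalation_reasons(sub_scores: dict[str, int]) -> list[str]:
    """List every gated sub-score that falls below the threshold."""
    return [
        f"{name} sub-score is {sub_scores[name]} (below {ESCALATION_THRESHOLD})"
        for name in GATED_SUB_SCORES
        if sub_scores[name] < ESCALATION_THRESHOLD
    ]


report = {"security": 55, "permissions": 78, "credentials": 81,
          "maintenance": 90, "documentation": 88}
reasons = escalation_reasons(report)
if reasons:
    print("Escalate to security team:", "; ".join(reasons))
```

Returning the reasons rather than a bare true/false keeps the sub-scores visible in the ticket, which is the point of the rule.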

What each sub-score means for your business

Here is a translation of each sub-score dimension into the business categories your stakeholders care about.

Security sub-score — direct exploitation risk

This is the sub-score that determines whether a vulnerability can be exploited right now. It covers SSRF (server-side request forgery — whether the tool can be tricked into making requests to internal network addresses), injection (whether user input or third-party content can reach shell commands or file paths without sanitization), and prompt injection resistance (whether the tool's code passes raw third-party content back to the LLM in ways that allow instruction injection).

Business translation: A low Security sub-score means there is a credible path from "attacker influences content" to "credentials exfiltrated" or "internal API called without authorization." In the current threat environment — where MCP servers read documents, emails, issue trackers, and Slack messages — this is not a low-probability scenario.
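For readers who want to see what "allowlisted outbound requests" looks like in practice, here is a minimal sketch of the kind of check a well-graded server might run before fetching a URL. It is an illustration, not a description of what SkillAudit tests for: the allowlist contents and function name are hypothetical, and a production check would also need to handle redirects and DNS resolution, which this sketch does not.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist; a real server would load this from configuration.
ALLOWED_HOSTS = {"api.github.com"}


def is_request_allowed(url: str) -> bool:
    """Allow only HTTPS requests to allowlisted hostnames, never raw IP addresses."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https" or host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # literal IPs (e.g. 169.254.169.254, 10.0.0.5) are refused
    except ValueError:
        pass  # not an IP literal, so fall through to the hostname check
    return host in ALLOWED_HOSTS


print(is_request_allowed("https://api.github.com/repos"))               # True
print(is_request_allowed("http://169.254.169.254/latest/meta-data/"))   # False
```

The design choice worth noticing is default-deny: anything not explicitly listed is refused, so a prompt-injected URL has nowhere to go.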

Permissions sub-score — blast radius control

This covers whether the tool requests only the capabilities it actually needs (least privilege), whether tool schema arguments are constrained to narrow, valid inputs (or accept arbitrary freeform strings that reach sensitive operations), and whether the server's MCP permission declarations match its actual runtime behavior.

Business translation: A low Permissions sub-score means that when something goes wrong — whether a bug, a compromised account, or a prompt injection attack — the damage is larger than it needs to be. A tool that has full repository write access when it only needs issue read access doesn't just expose a single resource; it exposes everything. Permissions are your blast radius control.

Credentials sub-score — access scope and handling hygiene

This covers how the server requests, stores, and logs credentials. Does it request minimum-necessary OAuth scopes? Are credentials stored in a way that prevents leakage to error logs or LLM responses? Can credentials be rotated without restarting the server?

Business translation: A low Credentials sub-score means that your team's credentials — API keys, OAuth tokens, service account secrets — are being managed in ways that make them easier to steal or accidentally expose. This is the sub-score most directly tied to your data breach incident response obligations.
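One concrete handling practice is redacting secrets before they reach a log file. The sketch below uses Python's standard `logging` module; the pattern and class name are hypothetical, and a simple key=value pattern like this one will miss secrets that appear in other shapes, so treat it as an illustration of the idea rather than a complete control.

```python
import logging
import re

# Hypothetical pattern: matches key=value pairs whose key looks like a secret.
SECRET_PATTERN = re.compile(r"(token|api[_-]?key|secret|password)=(\S+)", re.IGNORECASE)


class RedactSecretsFilter(logging.Filter):
    """Rewrite each log record so secret values never reach the handlers."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg with its args first
        record.msg = SECRET_PATTERN.sub(r"\1=[REDACTED]", message)
        record.args = None  # args are already merged into msg
        return True  # keep the record, now redacted


logger = logging.getLogger("mcp_server_example")
logger.addFilter(RedactSecretsFilter())
logger.error("request failed: api_key=%s", "sk-12345")
```

With the filter attached, the logged line reads `request failed: api_key=[REDACTED]` instead of carrying the key into whatever system collects the logs.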

Maintenance sub-score — supply chain risk

This covers whether the server's dependency tree is up to date, whether a lockfile is committed (so you know exactly what is installed), and whether the project has a published security disclosure policy. It does not measure how recently the project was updated — it measures how responsibly the author manages dependencies and vulnerability disclosure.

Business translation: A low Maintenance sub-score is a supply chain risk signal. It means the server's dependencies may include packages with known CVEs. It also means that if a vulnerability is discovered in the server itself, there may be no channel for responsible disclosure — no security contact, no SLA on patches.

Documentation sub-score — operational risk and auditability

This covers whether the README contains a runnable quickstart (verifying that the server actually works as described), whether the project includes a SECURITY.md with a disclosure contact, and whether the code is commented enough that a reviewer can audit its behavior without running it.

Business translation: Documentation gaps create operational risk in two forms: onboarding risk (your team configures the tool incorrectly because the instructions are wrong) and auditability risk (when something goes wrong, you cannot determine from the documentation what the tool was supposed to do, making incident investigation harder). A low Documentation sub-score is not a blocker for approval, but it is a signal to require internal runbooks before rollout.

How to communicate MCP security findings to leadership and legal

Security engineers speak in CVSSv3 scores, CWE identifiers, and vulnerability categories. Leadership and legal speak in liability, probability of incident, cost of breach, and regulatory exposure. You need to translate in both directions.

Audience	What they care about	How to frame the grade
VP / Director	Can we ship this feature? What is the business risk if we do or do not?	"The server grades C overall with a Security D. An exploit path exists that could expose our production credentials. We can unblock the team in two weeks with a specific remediation plan — here is the tradeoff."
Legal / Compliance	Does this create a contractual exposure? Does it touch regulated data?	"The Credentials sub-score is D, meaning the tool requests write access to customer data repositories and does not have a mechanism for rotation. If we use this with production credentials, a breach would require customer notification under our DPA. We are blocking it until the scope is narrowed."
Security team	What does the scanner actually find? Is this a known CWE pattern?	Share the full SkillAudit report URL. Let them read the raw findings. Your job here is triage and escalation, not interpretation.
Engineering team	How do we fix this without rewriting the tool? What is the fastest path to approval?	Point to the failing sub-scores and the specific findings behind them. A C→A remediation usually takes 2–4 weeks with a focused sprint. Link to the C-to-A remediation plan if the tool is internal.

Framing tip

Avoid framing grades as "the tool is bad." Frame them as "the tool has specific, fixable gaps." Leadership blocks when they hear condemnation; they engage when they hear a path forward. A C grade with a two-week remediation plan is a scheduling discussion. A C grade with no path forward is a risk veto.

Common PM mistakes when evaluating MCP servers

Mistake 1: Using star count and README quality as proxies for security

GitHub stars measure popularity. A README measures documentation effort. Neither correlates reliably with security grade. Some of the most popular community MCP servers grade D or F on Security because they were written to demonstrate functionality, not to be deployed in production environments handling real credentials. The inverse is also true — some relatively unknown servers grade A because a security-focused author wrote them carefully.

Mistake 2: Approving based on overall grade without checking sub-scores

As described above, a C overall can contain a Security D. Always look at the five sub-scores. Set a policy: a Security or Permissions sub-score below 60 is an automatic escalation, regardless of the composite.

Mistake 3: Treating "it's open source" as a security signal

Open source means the code can be audited — it does not mean it has been. Most community MCP servers have never received a security review. Treat open source as an opportunity to request a scan, not as a substitute for one.

Mistake 4: Approving for "dev only" without credential separation

The phrase "it's only for development" is only meaningful if developers use different credentials in development than in production. If your engineers configure the MCP server with the same API token they use for production work — which is the default when MCP configuration lives in a shared dotfile — then "dev only" approval is production exposure. Require credential separation as part of any conditional approval.

Mistake 5: Not tracking the MCP server inventory

MCP servers are installed locally, often without going through a procurement process. They may be present on dozens of developer machines before a PM ever hears about them. Establish a light intake process — even a shared spreadsheet — and communicate it to engineering leads. The goal is not to block installation; it is to ensure that servers touching production credentials have been evaluated before they are configured with them.

Incident pattern

The most common MCP security incident pattern is not a sophisticated supply chain attack. It is a developer installing a community MCP server, configuring it with their production API key to test a feature, forgetting to rotate the key, and the server later being found to have an SSRF vulnerability. The install was unofficial. The key was never rotated. There was no inventory record. The audit trail was empty. None of this requires a sophisticated attacker.

Building a lightweight MCP server intake process

You do not need a lengthy procurement process for every MCP server request. You need a lightweight triage process that catches the high-risk cases quickly and auto-approves the low-risk ones without creating bottlenecks. Here is a four-step intake workflow that fits inside a single async ticket:

Run the SkillAudit scan before anything else

Before reading the README or opening the GitHub page, scan the repository. The grade takes seconds and gives you the triage tier. An A? Route straight to engineering with a standard approval note. A B? Approve with the gaps noted in your decision log. An F? Block immediately and notify the requester. C or D? Route to security review.

Check which credentials will be configured with it

Ask the requester: what API key or token will be used to configure this server, and what does that token have access to? A C-grade server configured with a read-only, single-service token is a different risk profile than the same server configured with an admin token for your production environment. Credential scope is the multiplier on every vulnerability.

Apply the sub-score escalation rule

If the Security or Permissions sub-score is below 60, escalate to your security team. This is a non-negotiable gate. Every other approval decision can be made at the PM level; this one requires a security sign-off. The escalation does not mean the tool is blocked; it means the risk decision is owned by the right person.

Document the decision and set a rescan date

Record: the grade at approval time, which sub-scores triggered review, which credential scope was approved, and a 90-day rescan date. MCP server security grades change as the code evolves — a server that grades A today may grade C in six months if maintainers stop updating dependencies or introduce a new feature with a vulnerability. The rescan date is your safety net.
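The four steps above can be sketched as a single triage function. This is a minimal sketch under stated assumptions: the routing labels, the `IntakeDecision` fields, and the function name are all hypothetical, and the credential-scope question in step 2 stays a human judgment that the code only records.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Routing labels are illustrative; they follow the grade scale in this guide.
ROUTES = {
    "A": "standard approval",
    "B": "approve with noted mitigations",
    "C": "security review",
    "D": "security review",
    "F": "block and notify requester",
}


@dataclass
class IntakeDecision:
    overall_grade: str
    route: str
    escalated: bool
    credential_scope: str
    rescan_on: date


def triage(overall_grade: str, sub_scores: dict[str, int],
           credential_scope: str, approved_on: date) -> IntakeDecision:
    """Hypothetical encoding of the four intake steps."""
    # Step 1: the overall grade picks the starting route.
    route = ROUTES[overall_grade]
    # Step 3: the Security/Permissions gate applies whatever the composite says.
    escalated = sub_scores["security"] < 60 or sub_scores["permissions"] < 60
    if escalated and overall_grade != "F":
        route = "security review"
    # Steps 2 and 4: record the credential scope and a 90-day rescan date.
    rescan_on = approved_on + timedelta(days=90)
    return IntakeDecision(overall_grade, route, escalated, credential_scope, rescan_on)


decision = triage("B", {"security": 55, "permissions": 80},
                  "read-only token, single repository", date(2026, 6, 7))
print(decision.route, decision.rescan_on)  # security review 2026-09-05
```

Note that the B-grade server in the example still goes to security review, because its Security sub-score is below the gate.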

PM intake checklist

SkillAudit scan run and grade recorded
Sub-scores reviewed individually (not just overall grade)
Security sub-score ≥ 60 confirmed, or escalated to security team
Permissions sub-score ≥ 60 confirmed, or escalated to security team
Credential scope reviewed — minimum-necessary token, not admin key
Approval tier assigned (A: standard / B: noted / C: exception / D: blocked / F: hard block)
Decision documented in intake tracker
Rescan date set (90 days from approval)

When to involve legal, and what to tell them

Not every MCP server needs a legal review. Use this simple trigger: if the server has access to any data that falls under a contractual or regulatory obligation — customer data, PII, financial records, health information — legal needs to know it exists and what its access scope is.

When you brief legal on an MCP server, they do not need to understand MCP. They need to know three things: what data the tool can access, what the security grade is, and what would trigger a notification obligation in the event of a breach. Frame it this way: "We are evaluating an internal developer tool that reads our GitHub repositories including issues that contain customer-reported bugs. The tool's security grade is B. The access scope is read-only on a single repository. A breach of this tool would require us to assess whether any customer PII was included in issues, and we should define that assessment process now rather than under incident pressure."

That framing gives legal the information they need to prepare, without requiring them to understand MCP protocol semantics.

Using SkillAudit rescans to track vendor improvement

For MCP servers your team relies on heavily — especially vendor-supplied servers that integrate with production systems — periodic rescans are a lightweight way to monitor whether the vendor is maintaining the security posture that justified your initial approval.

Set a policy: any server used with production credentials is rescanned quarterly. Any server that drops a full grade tier (A to B, B to C) triggers an automatic review. Any server that drops into C territory requires a decision about whether the current credential scope is still appropriate.
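The rescan policy reduces to comparing two letter grades. A minimal sketch, with hypothetical names throughout; it reads "drops into C territory" as "drops to C or worse", which is an assumption about the intent of the policy rather than something this guide spells out.

```python
ORDER = "ABCDF"  # best to worst


def rescan_actions(previous: str, current: str) -> list[str]:
    """Hypothetical check comparing letter grades from two consecutive scans."""
    actions = []
    dropped = ORDER.index(current) > ORDER.index(previous)
    if dropped:
        actions.append("trigger review")
        # ASSUMPTION: "C territory" is treated as C or worse.
        if ORDER.index(current) >= ORDER.index("C"):
            actions.append("re-decide credential scope")
    return actions


print(rescan_actions("A", "B"))  # ['trigger review']
print(rescan_actions("B", "C"))  # ['trigger review', 're-decide credential scope']
```

A server whose grade holds steady or improves produces an empty list, so quarterly rescans only create work when something has actually changed.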

This is especially important for community-maintained servers that your team depends on. Maintainers change. Attention drifts. A server that started as a careful, security-focused project may evolve features that introduce vulnerabilities — and without periodic rescans, you would not know until something breaks.

Team plan tip

SkillAudit's Team plan supports organization-wide scanning and periodic rescan scheduling. If your organization has more than a handful of MCP servers in use, automated rescans with grade-change alerts remove the manual tracking burden entirely.

The one-paragraph summary for your next planning meeting

MCP servers execute code on your engineers' machines and often hold credentials to your production systems. Unlike traditional SaaS tools, they can be manipulated by the content they process — a malicious document an MCP server reads can instruct it to make unauthorized API calls. SkillAudit grades measure five dimensions of this risk: exploitability (Security), blast radius (Permissions), credential exposure (Credentials), supply chain hygiene (Maintenance), and operational auditability (Documentation). An A grade means approve; an F means hard block. Any sub-score below 60 in Security or Permissions escalates to your security team regardless of the overall composite. With a four-step intake process and quarterly rescans, you can move fast on new tools without taking on unacceptable risk.