From CVSS to Evidence

CVSS is a useful tool when it is doing what it was designed to do: describe the theoretical severity of a known vulnerability. It stops being useful the moment a team treats it as a triage oracle. Severity is not impact, and impact is what the engineer needs in order to act.

This post explains the gap between a CVSS score and real-world impact, and what changes when a report leads with evidence instead of a number.

What CVSS actually measures

The CVSS v3.1 specification is explicit: the base score captures the intrinsic characteristics of a vulnerability (how it is exploited and what it could affect), independent of your environment. The specification does define an Environmental metric group for local adjustment, but advisories and scanners typically publish only the base score, and that base score was never meant to express the probability that a given finding is reachable and impactful in your deployment. EPSS adds an exploitation-likelihood signal, but it too is a population-level probability, not a statement about your system.
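
To make "independent of your environment" concrete, here is a minimal sketch of the CVSS v3.1 base-score equations as published by FIRST, covering base metrics only (no Temporal or Environmental adjustment). The function names are ours, not from any library. Notice that the only input is the vector string: there is no parameter for reachability, exposure, or compensating controls in your deployment.

```python
import math

# CVSS v3.1 base metric weights, per the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required depends on Scope: (unchanged, changed).
PR = {"N": (0.85, 0.85), "L": (0.62, 0.68), "H": (0.27, 0.50)}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score from a vector string such as
    'CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'.

    Nothing about the target environment is an input.
    """
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(p.split(":") for p in parts[1:])
    changed = m["S"] == "C"

    # Impact sub-score from confidentiality, integrity, availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score from the attack-path metrics.
    pr = PR[m["PR"]][1 if changed else 0]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # → 9.8
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N"))  # → 5.3
```

These two common vectors yield the 9.8 and 5.3 scores used in the failure modes below. Two findings with the same vector receive the same score whether or not either one is reachable in your system.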

The two failure modes

Treating the score as a verdict produces predictable mistakes in both directions:

The phantom critical. A CVSS 9.8 finding with no reachable path from the attacker’s perspective produces nothing. Teams burn cycles “fixing” something that was never exploitable.
The underrated chain link. A CVSS 5.3 finding that chains with two other findings into tenant compromise produces an incident. The score told you it could wait.

Evidence is the only signal that discriminates between those cases — and the second case, the composed chain, is exactly what attack-chain escalation is built to surface.
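
The two failure modes fall out of a toy comparison. Everything in this sketch is hypothetical: the `Finding` record, its `reproduced` flag (true when a replayed bundle demonstrated impact, alone or as one link in a chain), the 7.0 cut-off (the lower bound of the CVSS "High" band), and both finding names are illustrative, not a real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding record; field names are illustrative only.
    name: str
    cvss: float
    reproduced: bool  # True if a replayed bundle showed impact (alone or in a chain)


def score_led(findings: list[Finding], threshold: float = 7.0) -> list[str]:
    """Triage by severity alone: anything at or above the threshold gets worked."""
    return [f.name for f in findings if f.cvss >= threshold]


def evidence_led(findings: list[Finding]) -> list[str]:
    """Triage by reproduced impact; CVSS only orders what is already proven."""
    proven = [f for f in findings if f.reproduced]
    return [f.name for f in sorted(proven, key=lambda f: f.cvss, reverse=True)]


findings = [
    Finding("unreachable-deserialization", 9.8, reproduced=False),  # phantom critical
    Finding("verbose-error-leak", 5.3, reproduced=True),  # underrated chain link
]

print(score_led(findings))  # → ['unreachable-deserialization']
print(evidence_led(findings))  # → ['verbose-error-leak']
```

Score-led triage works the phantom critical and skips the chain link; evidence-led triage does the reverse, and still uses CVSS to order what it has proven.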

What changes when evidence leads

When a finding arrives with a replayable bundle, the first question is no longer "is this real"; the bundle answers that. The first question becomes "what breaks when we fix it". Engineers can replay the chain against a patched branch locally and confirm the fix closes the exploit before merging. That shift shortens cycle times more than any severity-based prioritisation ever will, and it is the foundation of the CI-gated pentest runbook.

This is the core of deterministic proof over probabilistic CVSS: the report is built around reproduced impact, captured under sandbox guardrails, rather than a number that estimates it.
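
The replay-before-merge loop can be sketched as a CI gate. This is a hypothetical toy, not Pentrova's bundle format: a cross-tenant invoice read stands in for a real chain, the "bundle" is a single request plus an impact assertion, and the two handlers stand in for the vulnerable and patched branches.

```python
from typing import Callable, Optional

# Hypothetical toy data: invoice id -> (owning tenant, body).
INVOICES = {101: ("tenant-a", "A's invoice"), 202: ("tenant-b", "B's invoice")}

Handler = Callable[[str, int], Optional[str]]


def vulnerable_get_invoice(caller_tenant: str, invoice_id: int) -> Optional[str]:
    # Bug: caller_tenant is never checked, so any tenant can read any invoice.
    return INVOICES[invoice_id][1]


def patched_get_invoice(caller_tenant: str, invoice_id: int) -> Optional[str]:
    # Fix: only the owning tenant gets the body back.
    owner, body = INVOICES[invoice_id]
    return body if owner == caller_tenant else None


def replay(handler: Handler) -> bool:
    """Replay a one-step 'bundle': tenant-a requests tenant-b's invoice.

    Returns True if the impact assertion (cross-tenant read) still holds.
    """
    response = handler("tenant-a", 202)
    return response is not None and "B's invoice" in response


def gate(handler: Handler) -> int:
    """CI-style exit code: 0 if the exploit no longer reproduces, 1 if it does."""
    return 1 if replay(handler) else 0


print(gate(vulnerable_get_invoice))  # → 1 (merge blocked)
print(gate(patched_get_invoice))  # → 0 (fix confirmed)
```

In a pipeline, the job would pass the gate's return value to `sys.exit` for the build from the patched branch, so the merge stays blocked while the exploit still reproduces.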

CVSS still has a job

None of this means throwing CVSS away. It stays in the report as a secondary field, and Pentrova consumes both CVSS and EPSS as prioritisation hints that shape which chains an agent attempts first when the surface is large. They simply never decide whether a finding gets reported. Evidence leads; the score supports.
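
A sketch of "evidence leads; the score supports", under loudly stated assumptions: the `Candidate` record, the `hint` weighting (EPSS probability scaled by CVSS/10), the attempt budget, the stub `attempt` callback, and every number below are hypothetical and are not how Pentrova weights its hints. The scores only decide the order in which chains are tried; whether a chain is reported depends solely on whether the attempt reproduced it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate attack chain; field names are illustrative.
    chain_id: str
    cvss: float  # CVSS base score of the chain's most severe finding
    epss: float  # EPSS probability (0..1) of its most likely-exploited CVE


def hint(c: Candidate) -> float:
    # Assumed weighting for illustration: exploitation likelihood scaled by severity.
    return c.epss * (c.cvss / 10)


def run(candidates: list[Candidate], attempt: Callable[[Candidate], bool], budget: int) -> list[str]:
    """Try chains in hint order, up to `budget` attempts; report only reproduced ones."""
    reported = []
    for c in sorted(candidates, key=hint, reverse=True)[:budget]:
        # The hint chose the order; the attempt's evidence decides reporting.
        if attempt(c):
            reported.append(c.chain_id)
    return reported


# Made-up values, not real CVE data.
candidates = [
    Candidate("chain-a", cvss=9.8, epss=0.02),
    Candidate("chain-b", cvss=5.3, epss=0.40),
    Candidate("chain-c", cvss=7.5, epss=0.10),
]


def attempt(c: Candidate) -> bool:
    # Stub: every chain reproduces except chain-c.
    return c.chain_id != "chain-c"


print(run(candidates, attempt, budget=2))  # → ['chain-b']
print(run(candidates, attempt, budget=3))  # → ['chain-b', 'chain-a']
```

chain-c is attempted but never reported because it does not reproduce, and chain-a is reported once the budget reaches it, despite being tried last. The hint changes when a chain is tried, never whether a reproduced chain is reported.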

Key takeaways

CVSS measures intrinsic severity, not environmental impact — it was never a triage oracle.
Score-led triage produces phantom criticals and underrated chain links, failing in both directions.
Evidence-led triage skips “is this real” and moves straight to “does the fix close it”.
CVSS and EPSS remain useful as prioritisation hints, not as the decision to report.

FAQ

Should I stop using CVSS entirely? No. CVSS is a good shared vocabulary for severity. The mistake is using the base score as the sole driver of what gets fixed; pair it with evidence of reachability and impact in your environment.

What is the difference between CVSS and EPSS? CVSS scores intrinsic severity; EPSS estimates the probability that a published CVE will be exploited in the wild within the next 30 days. Both are population-level signals, and neither confirms exploitability in your specific deployment, which is what an evidence bundle does.

How does evidence-led triage speed up fixes? It removes the verification debate. The bundle already proves impact, so the team moves directly to remediation and can replay the exploit against the fix to confirm it before merge.

See how evidence is captured in the platform pipeline, or start a free engagement.
