Deterministic proof beats probabilistic CVSS: why replayable exploits change triage

Replayable exploit bundles change triage economics more than any severity score. Here is why deterministic proof beats probabilistic CVSS.

Pentrova Research
7 min read

Security teams have spent a decade drowning in findings that look confident on a dashboard and evaporate the moment an engineer tries to reproduce them. CVSS scores, EPSS percentiles, and “critical” labels all estimate how severe a vulnerability is or how likely it is to be exploited; none of them shows that an exploit works against your system. Pentrova takes a different position: if you cannot replay the exploit against the running system, you do not have a finding, you have a hypothesis.

This post explains why deterministic proof changes the economics of vulnerability triage, what a replayable evidence bundle actually contains, and where probability scores still earn their place.

The hidden cost of probabilistic scanning

Probabilistic scanners over-report by design. A pattern match on a response header, a CVE match on a dependency manifest, a suspicious-looking parameter — each becomes a ticket that a human has to triage and then manually verify for exploitability in context.

That verification step is where security debt accumulates:

  • Engineers stop trusting the queue. When four out of five “criticals” turn out to be unreachable, the fifth one gets the same skeptical shrug.
  • The backlog grows faster than it drains. Re-verifying the same false positive every quarter is pure waste.
  • Real issues rot inside the noise. The exploitable bug and the theoretical one look identical on the dashboard.

The CVSS specification itself is explicit that the base score measures severity, not risk in your environment. Severity in the abstract is not demonstrated impact on your system, and demonstrated impact is what the engineer needs in order to act.

What deterministic proof looks like

A deterministic proof of concept is not a screenshot and a sentence. It is a self-contained evidence bundle:

{
  "finding_id": "████",
  "target": "https://staging.api.example.com",
  "request_chain": ["login", "upload", "trigger"],
  "artifacts": ["command_output.txt", "screenshot.png"],
  "bundle_hash": "sha256:████"
}

Every step is replayable. Pentrova’s verifier re-executes the chain in an isolated sandbox, captures the response byte-for-byte, and only then marks the finding confirmed. The bundle hash lets a downstream auditor recompute the SHA-256 over the bundle contents and confirm it has not been altered since the engagement closed.
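As a rough illustration of that audit step, here is a minimal Python sketch. It assumes the hash covers a canonical JSON encoding of every bundle field except `bundle_hash`, together with a SHA-256 digest of each listed artifact. That layout, and the names `compute_bundle_hash` and `verify_bundle`, are hypothetical; the post does not specify how Pentrova canonicalises a bundle.

```python
import hashlib
import json


def compute_bundle_hash(bundle: dict, artifacts: dict[str, bytes]) -> str:
    """Return 'sha256:<hex>' over the bundle metadata and its artifacts.

    Assumed layout (hypothetical, not a documented Pentrova format):
    every field except 'bundle_hash', plus one digest per listed artifact,
    serialised as canonical JSON.
    """
    metadata = {k: v for k, v in bundle.items() if k != "bundle_hash"}
    payload = {
        "metadata": metadata,
        # Raises KeyError if a listed artifact is missing from the pack.
        "artifact_digests": {
            name: hashlib.sha256(artifacts[name]).hexdigest()
            for name in bundle.get("artifacts", [])
        },
    }
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_bundle(bundle: dict, artifacts: dict[str, bytes]) -> bool:
    """True when the recomputed hash matches the one recorded in the bundle."""
    return compute_bundle_hash(bundle, artifacts) == bundle["bundle_hash"]
```

Hashing per-artifact digests inside one canonical document, rather than concatenating raw bytes, keeps the result unambiguous when artifact names or sizes change.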

Critically, the bundle replays without Pentrova’s control plane. An engineer can re-run the captured request — for an API finding, that is a plain reproducible request such as a curl command — against a patched branch and watch the exploit fail. That is the whole point.
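To make the "plain reproducible request" concrete, the sketch below renders one captured request as a curl command pointed at whichever base URL the engineer chooses, such as a locally running fix branch. The request shape (`method`, `path`, `headers`, `body`) and the function name `to_curl` are assumptions for illustration, not the bundle's documented schema.

```python
import shlex


def to_curl(request: dict, base_url: str) -> str:
    """Render one captured request as a shell-safe curl command.

    `request` is a hypothetical captured-request record with the keys
    'method', 'path', and optionally 'headers' and 'body'.
    """
    url = base_url.rstrip("/") + request["path"]
    parts = ["curl", "-sS", "-X", request["method"], url]
    for name, value in request.get("headers", {}).items():
        parts += ["-H", f"{name}: {value}"]
    if request.get("body") is not None:
        # --data-raw sends the body as-is, without interpreting a leading '@'.
        parts += ["--data-raw", request["body"]]
    # shlex.join quotes each argument so the command can be pasted into a shell.
    return shlex.join(parts)
```

Because the base URL is a parameter, the same captured request can be pointed at staging to reproduce the exploit and at a patched build to confirm it fails.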

Triage becomes a decision, not a debate

When triage starts from a bundle that already demonstrates impact, the conversation skips the “is this real” loop entirely:

  1. Before: Is this exploitable? Can anyone reproduce it? Is the scanner wrong again?
  2. After: The bundle proves it. What do we change, and does the change close the exploit?

Engineers replay the chain against the fix branch locally and confirm the patch works before merging. That shift compresses cycle time more than any severity-based prioritisation ever will, and it is the same loop our CI-gated pentest runbook builds a release gate around.
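A pre-merge check along these lines could look like the following Python sketch. It is an assumption-laden outline, not Pentrova's runbook: the `send` callable stands in for whatever actually issues each step's request against the fix branch, and "success markers" are taken to be byte strings that appear in the final response only when the exploit works.

```python
from typing import Callable, Sequence

# Hypothetical transport: takes a step name, returns that step's raw response body.
Sender = Callable[[str], bytes]


def exploit_reproduces(
    steps: Sequence[str], send: Sender, success_markers: Sequence[bytes]
) -> bool:
    """Replay every step in order and check the final response for all markers."""
    if not success_markers:
        # Without markers there is nothing to prove, so refuse to guess.
        raise ValueError("at least one success marker is required")
    last_response = b""
    for step in steps:
        last_response = send(step)
    return all(marker in last_response for marker in success_markers)


def gate(steps: Sequence[str], send: Sender, success_markers: Sequence[bytes]) -> int:
    """Exit-code style result: 0 when the fix holds, 1 when the exploit still works."""
    return 1 if exploit_reproduces(steps, send, success_markers) else 0
```

Returning an exit code means the same function can back a local script or a CI job that blocks the merge while the exploit still reproduces.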

Where probability still helps

Deterministic proof does not mean throwing away every score. Pentrova still consumes CVSS and EPSS — as prioritisation hints, not as evidence. They shape which chains an agent attempts first when the attack surface is large. They never decide whether a finding gets reported. The boundary is the rule we hold the platform to: AI decides what to test next; evidence decides what becomes a finding.
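That boundary can be sketched in a few lines of Python. The `Candidate` type and the choice to order by EPSS first and CVSS second are illustrative assumptions; the post does not describe how Pentrova actually weighs the two scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cvss: float  # base score, 0.0 to 10.0
    epss: float  # exploitation probability, 0.0 to 1.0
    confirmed: bool = False  # set only after a successful replay


def attempt_order(candidates: list[Candidate]) -> list[Candidate]:
    """Scores decide only the order in which chains are attempted."""
    # Assumed ordering: highest EPSS first, CVSS as the tie-breaker.
    return sorted(candidates, key=lambda c: (c.epss, c.cvss), reverse=True)


def reportable(candidates: list[Candidate]) -> list[Candidate]:
    """Evidence decides what is reported; scores play no part here."""
    return [c for c in candidates if c.confirmed]
```

Note that `reportable` never reads `cvss` or `epss`: a high-scoring candidate that fails to replay is simply not returned.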

Key takeaways

  • Probabilistic scanners estimate risk; they cannot confirm it, so they over-report and erode trust in the queue.
  • A deterministic proof of concept is a replayable, hash-verified evidence bundle that reproduces the exploit against the live target.
  • Replayable proof lets engineers verify a fix before merge, which is where the real cycle-time win lives.
  • CVSS and EPSS stay useful as prioritisation signals — never as the decision to report.

FAQ

Is CVSS useless? No. CVSS is a sound way to describe the theoretical severity of a vulnerability class. It stops being useful the moment a team treats it as a triage oracle for their environment, where reachability and impact — not severity — decide what matters.

How is a finding “confirmed” deterministically? Pentrova re-executes the captured exploit chain in an isolated sandbox. If the markers of success reproduce, the finding is published with its evidence bundle; if they do not, it is dropped before it reaches your queue.

Can my engineers reproduce a finding without Pentrova? Yes. Each bundle ships the captured request/response exchange and a reproducible command so the exploit replays from the audit pack alone. See the platform pipeline for how evidence is captured.

Probabilistic scans estimate risk. Pentrova demonstrates it.

Pentrova Research writes about deterministic offensive-security proof, LLM-driven pentest chains, and how to ship exploit-grade evidence into engineering pipelines.
