Forge-CRS — Autonomous Cyber Reasoning System
A working AIxCC-style cyber reasoning system that autonomously discovers, exploits, patches, and re-verifies real-world software vulnerability classes — coverage-guided fuzzing, crash triage, semantic patching, and patch validation in one unattended loop.
At a glance
The autonomous loop
1 · Discover
Coverage-guided fuzzing on real V8 block coverage, with dictionary + structure-aware mutation. New-coverage inputs are kept and evolved.
2 · Exploit
A multi-signal crash oracle confirms a real bug; the crash is minimized to a tight PoV and classified to a CWE independently of ground truth.
3 · Patch
A semantic source rewrite for the bug class is synthesized and applied to a throwaway copy — never trusted until proven.
4 · Verify
The patch is accepted only if the PoV is neutralized AND every functional regression case still passes.
Benchmark targets
config-merge CWE-1321
Prototype Pollution — discovered in 3 execs, PoV minimized to 31B, patch verified (2/2 regression).
path-store CWE-22
Path Traversal — discovered in 189 execs, PoV minimized to 2B, patch verified (2/2 regression).
task-runner CWE-78
OS Command Injection — discovered in 14 execs, PoV minimized to 1B, patch verified (1/1 regression).
regex-validate CWE-1333
ReDoS (catastrophic backtracking) — discovered in 40 execs, PoV minimized to 23B, patch verified (4/4 regression).
binary-reader CWE-125
Out-of-bounds Read — discovered in 5 execs, PoV minimized to 2B, patch verified (2/2 regression).
Deliverables
Honest scope
DARPA's AIxCC exists because autonomous find-and-fix on real-world OSS at scale is unsolved. Forge-CRS is a faithful, fully-working microcosm of that loop — not a finalist-grade system. The executed pipeline runs on the JavaScript/Node language adapter; the C/C++/Java adapters (native sanitizers, libFuzzer/AFL++, Jazzer), whole-repo scale, and free-form program repair are disclosed as architectural seams, not live. Full disclosure in the certification report.
Delivery metrics
Tokens, elapsed time, and cost for producing this outcome. Basis: Estimated — reproducible model over this outcome’s published artifacts. metrics.json
Cost and tokens are estimates derived deterministically from published artifacts and representative list pricing; actual billing may differ. Model basis: claude-sonnet-class (representative).