Forge-CRS — Autonomous Cyber Reasoning System

Overview

← Back to outcome

Forge-CRS — Autonomous Cyber Reasoning System

Forge-CRS is a working Cyber Reasoning System (CRS): software that, with no human in the loop, discovers a vulnerability in a piece of code, exploits it (produces a minimal proof-of-vulnerability), patches the source, and proves the patch both closes the hole and preserves behaviour.

This is the same loop DARPA's AI Cyber Challenge (AIxCC) was created to push on: *find-then-fix at machine speed*. Forge-CRS implements that loop end-to-end and runs it as a single command against a benchmark of real-world vulnerability classes.

DISCOVER ──▶ EXPLOIT ──▶ PATCH ──▶ VERIFY
 (fuzz)     (triage/PoV)  (rewrite)  (PoV neutralized + regression held)

What it actually does (live, executed)

Running node bin/crs.mjs run autonomously processes a benchmark of five seeded-but-real vulnerability classes and, for each one:

StageTechnique
DiscoverCoverage-guided fuzzing using real V8 block coverage as feedback, with dictionary + structure-aware mutation. New-coverage inputs are kept in the corpus (AFL/libFuzzer principle).
DetectA multi-signal crash oracle per bug class: prototype-pollution canary, path-containment, shell-metacharacter, wall-clock hang (ReDoS, run in a sandboxed worker), and unhandled out-of-bounds.
ExploitCrash minimization (delta-debugging / binary search) to a tight PoV, then independent CWE classification.
PatchSemantic patch synthesis — a per-class source rewrite (key-guard, base-dir containment, argv-not-shell, linear regex, bounds clamp).
VerifyThe patch is accepted only if the PoV is neutralized *and* every functional regression case still passes.

The benchmark (CWE-classed, all from the real-world OSS bug catalogue):

TargetCWEClass
config-mergeCWE-1321Prototype Pollution
path-storeCWE-22Path Traversal
task-runnerCWE-78OS Command Injection
regex-validateCWE-1333ReDoS / catastrophic backtracking
binary-readerCWE-125Out-of-bounds Read

Latest verified run: 5/5 discovered, 5/5 classified, 5/5 remediated, deterministic across identical seeds, in a few seconds of wall-clock. See verification-report.md.

What it does NOT do (honest scope)

Forge-CRS is a faithful, fully-working microcosm of an AIxCC-style CRS — not a competition-grade system. Read certification-report.md for the full disclosure. In short:

  • The executed pipeline operates on the JavaScript/Node language adapter.

The engine (registry/adapter interface) is language-agnostic, but the C/C++/Java adapters that AIxCC actually scores — native sanitizers (ASan/UBSan), libFuzzer/AFL++ harnessing — are architectural seams, not live in this package.

  • Patch synthesis uses bug-class repair strategies, not free-form

program repair of novel/unknown bugs.

  • It runs at single-file benchmark scale, not whole-repo OSS scale.

Quick start

cd app
node bin/crs.mjs run            # run the autonomous campaign + print the table
node bin/crs.mjs run --report   # also write .work/campaign-report.json
node bin/crs.mjs run --target path-store   # one target
node ../verify.mjs              # full MUST_PASS verification (writes the report)

Requires Node.js ≥ 20 (uses node:inspector precise coverage and worker_threads). Zero external dependencies.

See user-guide.md for how to read the output and add your own target, and run-deploy-instructions.md to wire it into CI.