Forge Engagement · Office of the CFO · Working-Capital Intelligence

Cash Recovery Engine

A decision-intelligence system that directs scarce collection capacity to the receivables where human effort actually converts to cash — pairing calibrated payment-propensity, causal uplift modeling, and constrained optimization.

+127% cash vs. strongest baseline 2.9× FIFO at equal hours AUC 0.8173 Verification 13 / 13
Executive Summary

More cash from the same collection team — by acting where effort changes the outcome

Mid-market and enterprise finance teams carry large overdue balances and a fixed number of collector-hours. The prevailing playbook — work the oldest invoice or the biggest balance — spends that scarce time on accounts that would have paid anyway, and on accounts no outreach will move. The Cash Recovery Engine reallocates the same hours to the moveable middle, the invoices where a call measurably raises the odds of payment.

+127%
More cash recovered
vs. the strongest simple worklist, equal hours
2.9×
Vs. FIFO
cash recovered at the same budget
0.8173
Propensity AUC
held-out; ECE 0.018
49%
Of movable ceiling
captured within budget
The Business Problem

Collection capacity is finite. Most of it is spent in the wrong place.

Every dollar trapped in overdue receivables is working capital that cannot fund payroll, inventory, or growth. Days Sales Outstanding (DSO) is the headline metric finance is measured on — yet the levers to move it are blunt.

Collectors triage by intuition or by simple rules. Oldest-first (FIFO) chases aging that may be uncollectible. Largest-first burns hours on a handful of complex, dispute-laden accounts. Neither asks the only question that matters for ROI: where does an hour of effort actually convert to cash?

The result is predictable — cash that could arrive this quarter arrives next quarter, or never; collector time is consumed by accounts that would have self-cured; and leadership has no defensible way to size the opportunity being left on the table.

1

Effort is mispriced

Large balances cost disproportionately more to collect (negotiation, approvals, disputes) — but conventional worklists ignore effort entirely.

2

Wasted touches

A large share of invoices self-cure. Working them adds cost, not cash.

3

No causal view

Teams rank by "who is late," not by "whom can we actually move" — the question uplift answers.

4

Unquantified upside

Without a model, the recoverable opportunity and the cost of the status quo are invisible to the CFO.

The Solution

Four reasoning steps, one ranked worklist

This is decision intelligence, not a CRUD dashboard. The ranking cannot be reproduced by sorting any single column — it is the joint output of two learned models and a constrained optimizer.

01

Who pays anyway?

A calibrated logistic model estimates each invoice's probability of paying within the horizon with no action — the self-cure rate (ECE 0.018).

02

Whom can we move?

A second model estimates pay-if-worked. The uplift between the two isolates the moveable middle — where a collector hour converts to cash.

03

Where do hours go?

A capacity-constrained optimizer (0/1 knapsack) fills the collector-hours budget by expected cash per hour, pricing in the effort big balances truly cost.

04

What's the play?

Every invoice gets a recovery segment — Persuadable, Self-Cure, At-Risk, Critical — and a recommended action, in a downloadable worklist.

Business Impact

Same hours, materially more cash

Measured on a held-out benchmark ledger under an identical, binding 200-hour budget. Every strategy faces the same invoices, the same effort costs, and the same budget; only the prioritization differs.

$1.67M
Engine — cash recovered
within the budget
+127%
Over strongest baseline
vs. $734K
2.9×
Vs. FIFO
$1.67M vs $566K
49%
Of movable ceiling
$3.42M theoretical max
Prioritization strategyCash recovered (equal hours)Relative to Engine
Cash Recovery Engine (uplift × $/hour)$1,665,4461.00×
Largest balance first$733,6030.44×
FIFO (oldest first)$565,7420.34×
Random$454,2690.27×

Illustrative single-portfolio projection: on a $19.01M open ledger (900 invoices), an 80-hour week directs effort to 61 invoices and projects $524K of cash acceleration (model estimate, not realized cash — see Proof Layer).

Architecture

A deterministic pipeline behind a non-bypassable proof gate

Dependency-free and seeded end to end, so every figure is reproducible and auditable.

DATA

Ingest

Resolved history + open ledger (CSV / ERP export). Observable fields only.

FEATURES

Engineer

Aging, balance, dispute, customer payment history, credit health.

MODELS

Propensity + Uplift

T-learner: self-cure vs. pay-if-worked → per-invoice uplift.

OPTIMIZE

Allocate

Constrained knapsack on expected cash per collector-hour.

OUTPUT

Worklist + Reports

Ranked CSV, JSON, executive summary, live console.

Proof Layer (IRS_AUDITOR) — a mandatory, non-bypassable gate re-runs verification, derives claim lineage, checksums every file, and assigns the certification state, Evidence Grade, and Trust Score. No output ships unless the gate permits it.
Screenshots

The decision console

dashboard.png
Live decision console. The capacity dial re-optimizes the worklist, KPI header, recovery curve and segment mix in real time. The engine (green) sits above FIFO and largest-balance at every collector-hours budget — the visible gap is recoverable cash conventional worklists forfeit.
overview.png
KPI header. Projected cash accelerated, lift versus FIFO at equal hours, invoices on the worklist, and total open receivables analysed.
Interactive Demo

Move the capacity dial — watch the engine re-optimize

A live, in-page instance running on a verified synthetic ledger. Drag the collector-hours dial and the worklist, KPIs, recovery curve, and segment mix recompute instantly with the same deterministic optimizer that ships in the engine.

cash-recovery-engine · live worklist

Open the demo full-screen ↗

Metrics

Model quality and decision quality

MetricValueWhat it tells you
Payment-propensity AUC (held-out)0.8173Ranks payers above non-payers well above chance
Calibration — ECE0.018Predicted probabilities match observed frequencies
Calibration — Brier0.1588Mean squared error of probabilistic predictions
Skill vs. strongest baseline+127%Extra cash from uplift-aware allocation
Lift vs. FIFO (equal hours)2.9×Improvement over the most common worklist
Capture of movable-cash ceiling49%Budget efficiency vs. theoretical maximum
Training data / held-out test4000 / 1500Trained and evaluated on disjoint ledgers
Train time134 msFast, deterministic, dependency-free
Verification Results

13 / 13 automated checks — all passing

Run with node verify.mjs. Structural checks validate the learner, metrics, causal world, and optimizer feasibility; benchmark checks validate ranking, calibration, and head-to-head cash recovery on held-out data.

ResultCheckDetail
PASSLogistic learner recovers a separable signal (acc > 0.95)train acc=1
PASSAUC helper returns 1.0 for a perfectly ranked setauc=1
PASSPotential outcomes monotone: y1 >= y0 for every invoice0 violations / 2000
PASSCapacity-constrained worklist never exceeds the hours budgetused 59.92h <= cap 60h, 59 invoices
PASS[synthetic] Self-cure propensity ranks better than chance (AUC > 0.70)AUC=0.8173
PASS[synthetic] Self-cure probabilities are calibrated (ECE < 0.05)ECE=0.018, Brier=0.1588
PASS[synthetic] Engine beats best simple baseline by >40% cash recoveredengine=$1,665,446 vs best baseline=$733,603 (skill 127.0%)
PASS[synthetic] Engine recovers >=2x the cash of FIFO at equal hours2.94x FIFO
PASS[synthetic] Engine captures >35% of the movable-cash ceiling within budget48.7% of $3,417,896 using 200h
PASSReproducible: same seeds -> identical AUC0.8173 == 0.8173
PASSTrains the uplift model end-to-end in < 8 s134 ms on 4000 historical invoices
PASSEngine emits a complete, budget-feasible worklist schemarows=58 fields✓=true hours=80<=80 cash>=0=true
PASSOfficial dataset evaluation (drop data/official/{history,open}.csv to enable)official data not present — synthetic benchmark only
Proof Layer

Independently verifiable, deliberately honest

The Forge Proof Layer (IRS_AUDITOR standard) has final authority over the posture below. It is not hand-set, and it cannot be overridden by any other subsystem.

Certification statePROOF_INCOMPLETE
Evidence gradeB
Auditor verdictPASS_WITH_DISCLOSED_SEAMS
VerificationPASS · 13 / 13
Unsupported claims0
Checksum self-test34 files OK
Trust score80 / 100

Disclosed seams (6)

Why this is PROOF_INCOMPLETE and not CERTIFIED: no real customer ledger is present, so a company-specific recovery figure is a disclosed seam that contradicts a certified claim — by design.

  • No real customer AR ledger is present in this workspace; all reported numbers are measured on a synthetic, behaviourally-motivated benchmark (src/synth.mjs), not on any company's receivables.
  • Individual treatment uplift is measurable here only because the synthetic world exposes BOTH potential outcomes (y0 and y1). On real data you can never observe both for the same invoice, so production
  • Simulated — Collector effort hours and days-to-pay are modelled parameters, not timed observations.
  • Projection — "Cash accelerated" and "collection-days reduction" are model projections over this ledger, not realized, audited cash movements.
  • Official evaluation path present but inactive (no data/official/ inputs this run).
  • Customer outcome: No real customer AR ledger is present; company-specific cash recovery is not produced (synthetic benchmark only).

Proof report Auditor challenge

Artifacts Produced

A complete, machine-checkable evidence trail

Generated by the Proof Layer and shipped inside the delivery package. Raw run logs are preserved under proof/evidence/.

EXECUTIVE_EVIDENCE.mdAnswers to the 10 IRS_AUDITOR questions
CLAIM_EVIDENCE.jsonEvery claim → source · method · artifact · status
EXECUTION_TRACE.jsonEach command → exit code + stdout sha256
ARTIFACT_MANIFEST.jsonEvery shipped file → bytes + role + sha256
CHECKSUMS.json34 checksums (self-verifying)
TRUST_SCORE.jsonScored 0–100 with full breakdown
EVIDENCE_GRADE.mdGrade with objective basis
AUDITOR_CHALLENGE.mdGenerated hostile interrogation
AUDITOR_OBJECTIONS.mdPre-written objections + responses
LIMITATIONS.mdSeams · simulations · projections
REPRODUCE.mdExact commands for every number
PROOF_DECISION.jsonAuthoritative gate decision
Technical Stack

Lean by design

Runtime & engineering

Deterministic, dependency-free, runs anywhere with Node — no network, no native modules, no build step.

Node.js 18+Zero runtime depsSeeded / reproducibleES modules

Modeling & optimization

Calibrated logistic propensity, a T-learner uplift model, and a greedy-ratio 0/1 knapsack allocator.

Logistic regressionUplift (T-learner)Constrained optimization

Evaluation

Rank-based AUC, Brier and ECE calibration, and head-to-head realized-cash comparison against baselines.

Mann-Whitney AUCBrier / ECEHoldout split

Interface & proof

A framework-free Canvas console mirroring the engine, plus the IRS_AUDITOR proof toolchain.

Vanilla JS / CanvasStatic / CDN-readyProof Layer
Deployment Instructions

Run it in minutes; bring your own data when ready

Local run

cd delivery-package/cash-recovery-engine
node run.mjs      # train + score; writes reports/ + tool data
node verify.mjs   # 13/13 checks; writes verification-report.*
# open public/tool.html in a browser

The tool is static — serve public/ from any host or CDN. No server runtime required.

Bring your own ledger

Drop two CSVs into data/official/ and re-run verification — the identical checks evaluate your data.

data/official/history.csv  # resolved invoices
data/official/open.csv     # live open invoices
node verify.mjs            # official block runs

Full schema and ERP field mapping in the Run & Deploy guide.

Lessons Learned

What the build surfaced

Effort is the hidden variable

Ranking by cash alone loses to ranking by cash-per-hour: large balances cost disproportionately more to collect, so pricing in effort is where most of the lift comes from.

Uplift beats propensity

Chasing likely payers wastes hours. Targeting the moveable middle — high treatment effect, not high payment probability — is what converts effort to cash.

Calibration earns trust

A decision tool needs probabilities that mean what they say. An ECE of 0.018 is what lets a controller act on a number rather than argue with it.

Causal honesty from day one

Individual uplift is never observable on real data. Designing for holdout / A-B validation up front is the only credible path to attributing recovered cash.

Future Production Roadmap

From verified prototype to closed-loop treasury system

Phase 04–6 weeks · Pilot

Real-ledger validation

  • Ingest a real AR export; score in shadow mode alongside the current process.
  • Run an A-B / holdout: engine queue vs. business-as-usual; measure realized cash and DSO.
  • Re-run the Proof Layer on real evidence to retire the data seam.
Phase 1Integration

System-of-record connectors

  • Connectors for NetSuite, SAP, Oracle, QuickBooks; nightly scoring.
  • Role-based worklists pushed into the collections workflow.
  • Identity, access control, and tenant isolation.
Phase 2Action layer

Orchestrated outreach

  • Automated dunning sequences and payment-plan offers, human-in-the-loop.
  • Dispute detection and routing; credit-hold recommendations.
Phase 3Closed loop

Continuous learning

  • Outcome capture, scheduled retraining, drift and calibration monitoring.
  • Uplift re-estimation from ongoing randomized holdouts.
Phase 4Treasury

Working-capital optimization

  • Cash-flow forecasting and portfolio-level capital optimization.
  • Dynamic credit limits and customer-level risk policy.

Honest scope. No real customer receivables ledger is present in this environment, so no company-specific recovery figure is claimed. Every metric on this page is measured on a synthetic AR benchmark with a known causal ground truth (AUC 0.8173, ECE 0.018, +127% vs. the strongest baseline, 2.9× FIFO). Projected cash figures are model estimates, not realized cash. See the proof report and limitations.

Delivery metrics

Tokens, elapsed time, and cost for producing this outcome. Basis: Estimated — reproducible model over this outcome’s published artifacts. metrics.json

~766k
Tokens used
~4h 15m
Elapsed time
~$3.45
Cost (USD)

Cost and tokens are estimates derived deterministically from published artifacts and representative list pricing; actual billing may differ. Model basis: claude-sonnet-class (representative).