Verification Report — Cash Recovery Engine
Strictness: IRS_AUDITOR | Proof status: PROOF_INCOMPLETE (synthetic benchmark only; real AR ledger is a disclosed seam)
Checks: PASS 13 / 13 (100%) | Evaluated on: synthetic | Generated: 2026-06-25T22:57:32.728Z
Disclosed seams & limitations
- DISCLOSED_SEAM: No real customer AR ledger is present in this workspace; all reported numbers are measured on a synthetic, behaviourally-motivated benchmark (src/synth.mjs), not on any company's receivables.
- DISCLOSED_SEAM: Individual treatment uplift is measurable here only because the synthetic world exposes BOTH potential outcomes (y0 and y1). On real data you can never observe both for the same invoice, so production uplift is an estimate validated by holdout/A-B test, not a measured per-invoice truth.
- SIMULATED: Collector effort hours and days-to-pay are modelled parameters, not timed observations.
- PROJECTION: "Cash accelerated" and "collection-days reduction" are model projections over this ledger, not realized, audited cash movements.
- Official evaluation path is present but inactive (no data/official/ inputs in this run).
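The second seam can be made concrete with a minimal Python sketch. It assumes each synthetic invoice carries its open balance plus both binary potential outcomes: y0 (pays within the horizon if left alone) and y1 (pays if worked). The `Invoice` class and the helper names are hypothetical and are not taken from src/synth.mjs. The per-invoice uplift, amount × (y1 - y0), needs both outcomes, and a real ledger never records both for the same invoice.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Invoice:
    amount: float  # open balance in dollars
    y0: int        # 1 if it pays within the horizon when NOT worked (self-cure)
    y1: int        # 1 if it pays within the horizon when worked

def true_uplift(inv: Invoice) -> float:
    # Incremental cash from working this invoice. It needs BOTH potential
    # outcomes, so it can only be computed in a synthetic world.
    return inv.amount * (inv.y1 - inv.y0)

def realized_incremental_cash(ledger: List[Invoice], worklist: Iterable[int]) -> float:
    # Assumed definition: sum of true uplift over the invoices the worklist works.
    return sum(true_uplift(ledger[i]) for i in worklist)

def monotone_violations(ledger: Iterable[Invoice]) -> int:
    # Invoices where working them would make payment less likely (y1 < y0).
    return sum(1 for inv in ledger if inv.y1 < inv.y0)

ledger = [Invoice(1000.0, 0, 1), Invoice(500.0, 1, 1), Invoice(2000.0, 0, 0)]
print(realized_incremental_cash(ledger, [0, 1]))  # 1000.0: invoice 1 would have self-cured
print(monotone_violations(ledger))                # 0
```

Working invoice 1 earns no credit because it would have paid anyway. On real data, only one of y0 and y1 is observed per invoice, so this quantity has to be estimated with a holdout or A/B test.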
What is verified
The engine learns self-cure vs. pay-if-worked propensities from resolved history (an uplift T-learner), ranks open invoices by expected incremental cash per collector-hour, and packs a capacity-constrained worklist. On a held-out synthetic ledger we measure (a) propensity ranking + calibration and (b) the realized incremental cash the worklist captures under a binding hours budget, against the strategies teams use today (FIFO, largest-balance, random).
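The pipeline described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the engine's implementation. The T-learner fits two separate logistic models: one on not-worked history (self-cure) and one on worked history (pay-if-worked). It scores uplift as the difference of their probabilities, and a greedy pass then packs invoices by expected incremental cash per collector-hour. The function names, the gradient-descent fitter, and the greedy packer are assumptions. The engine may use a different optimizer or an exact knapsack solver.

```python
import math
from typing import List, Sequence, Tuple

Weights = List[float]  # weights[0] is the bias term

def predict(w: Weights, x: Sequence[float]) -> float:
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-35.0, min(35.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Weights:
    # Batch gradient descent on the mean log-loss.
    if not X:
        raise ValueError("empty training arm")
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def fit_t_learner(X, worked, paid) -> Tuple[Weights, Weights]:
    # T-learner: one outcome model per treatment arm.
    arm0 = [(x, y) for x, t, y in zip(X, worked, paid) if t == 0]
    arm1 = [(x, y) for x, t, y in zip(X, worked, paid) if t == 1]
    m0 = fit_logistic([x for x, _ in arm0], [y for _, y in arm0])  # self-cure
    m1 = fit_logistic([x for x, _ in arm1], [y for _, y in arm1])  # pay-if-worked
    return m0, m1

def pack_worklist(amounts, hours, p0, p1, budget) -> Tuple[List[int], float]:
    # Greedy packing by expected incremental cash per collector-hour.
    scored = []
    for i, (a, h, q0, q1) in enumerate(zip(amounts, hours, p0, p1)):
        gain = a * (q1 - q0)  # expected incremental cash if worked
        if gain > 0 and h > 0:
            scored.append((gain / h, i))
    scored.sort(reverse=True)
    chosen, used = [], 0.0
    for _, i in scored:
        if used + hours[i] <= budget:  # never exceed the hours budget
            chosen.append(i)
            used += hours[i]
    return chosen, used

# Toy usage: a budget of 3 collector-hours over four open invoices.
chosen, used = pack_worklist(
    amounts=[1000, 5000, 2000, 800], hours=[1, 4, 1, 2],
    p0=[0.2, 0.5, 0.9, 0.1], p1=[0.8, 0.9, 0.9, 0.6], budget=3)
print(chosen, used)  # [0, 3] 3.0
```

The third invoice is never scheduled because its self-cure and pay-if-worked probabilities are equal, so working it buys no incremental cash. Greedy-by-density is a common heuristic for this 0/1 knapsack but is not guaranteed optimal.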
Synthetic benchmark
| Metric | Value |
|---|---|
| History / test invoices | 4000 / 1500 |
| Self-cure AUC | 0.8173 |
| Self-cure calibration (ECE / Brier) | 0.018 / 0.1588 |
| Eval capacity (binding) | 200 collector-hours |
| Engine cash recovered | $1,665,446 |
| FIFO / Largest / Random | $565,742 / $733,603 / $454,269 |
| Skill vs best baseline | 127.0% |
| Lift vs FIFO | 2.9438x |
| Capture of movable-cash ceiling | 48.7% |
| Train time | 134 ms |
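The derived rows can be reproduced from the raw cash figures. The short Python check below assumes three definitions, inferred from the numbers and consistent with them: skill is the engine's relative gain over the best baseline (engine / best - 1), lift is engine / FIFO, and capture is engine / movable-cash ceiling. The $3,417,896 ceiling is the value reported in the corresponding check.

```python
engine = 1_665_446
baselines = {"FIFO": 565_742, "Largest": 733_603, "Random": 454_269}
ceiling = 3_417_896  # movable-cash ceiling reported in the checks

best = max(baselines.values())          # Largest-balance: 733,603
skill = engine / best - 1               # relative gain over the best baseline
lift_fifo = engine / baselines["FIFO"]  # multiple of FIFO cash at equal hours
capture = engine / ceiling              # share of movable cash captured

print(f"skill={skill:.1%} lift={lift_fifo:.4f}x capture={capture:.1%}")
# skill=127.0% lift=2.9438x capture=48.7%
```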
Official benchmark
_Not present in this run._ Drop data/official/history.csv (resolved invoices with worked and paid_within_horizon columns) and data/official/open.csv (live open invoices) into the workspace to evaluate on real data with the same metrics and checks as the synthetic run. Schema: see run-deploy-instructions.md.
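The sketch below shows how history.csv could feed the two arms of the uplift model. It assumes worked and paid_within_horizon are encoded as 0/1. Any other columns (balance, features) are assumed to follow the schema in run-deploy-instructions.md, which this sketch does not check. The function name and the sample columns other than worked and paid_within_horizon are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

REQUIRED_COLUMNS = {"worked", "paid_within_horizon"}

def split_history(f: TextIO) -> Dict[int, List[Tuple[dict, int]]]:
    """Group resolved invoices by treatment arm: 0 = not worked, 1 = worked."""
    reader = csv.DictReader(f)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"history.csv is missing columns: {sorted(missing)}")
    arms: Dict[int, List[Tuple[dict, int]]] = {0: [], 1: []}
    for row in reader:
        treated = int(row["worked"])
        if treated not in arms:
            raise ValueError(f"worked must be 0 or 1, got {row['worked']!r}")
        # The label for both arms is whether the invoice paid within the horizon.
        arms[treated].append((row, int(row["paid_within_horizon"])))
    return arms

sample = io.StringIO(
    "invoice_id,amount,worked,paid_within_horizon\n"
    "INV-1,1200.00,1,1\n"
    "INV-2,350.00,0,1\n"
    "INV-3,980.00,0,0\n"
)
arms = split_history(sample)
print(len(arms[0]), len(arms[1]))  # 2 1
```

Against the real file, the same function would be called on `open("data/official/history.csv", newline="")`.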
Checks
| Check | Detail | Result |
|---|---|---|
| Logistic learner recovers a separable signal (acc > 0.95) | train acc=1 | PASS |
| AUC helper returns 1.0 for a perfectly ranked set | auc=1 | PASS |
| Potential outcomes monotone: y1 >= y0 for every invoice | 0 violations / 2000 | PASS |
| Capacity-constrained worklist never exceeds the hours budget | used 59.92h <= cap 60h, 59 invoices | PASS |
| [synthetic] Self-cure propensity ranks better than chance (AUC > 0.70) | AUC=0.8173 | PASS |
| [synthetic] Self-cure probabilities are calibrated (ECE < 0.05) | ECE=0.018, Brier=0.1588 | PASS |
| [synthetic] Engine beats best simple baseline by >40% cash recovered | engine=$1,665,446 vs best baseline=$733,603 (skill 127.0%) | PASS |
| [synthetic] Engine recovers >=2x the cash of FIFO at equal hours | 2.94x FIFO | PASS |
| [synthetic] Engine captures >35% of the movable-cash ceiling within budget | 48.7% of $3,417,896 using 200h | PASS |
| Reproducible: same seeds -> identical AUC | 0.8173 == 0.8173 | PASS |
| Trains the uplift model end-to-end in < 8 s | 134 ms on 4000 historical invoices | PASS |
| Engine emits a complete, budget-feasible worklist schema | rows=58 fields✓=true hours=80<=80 cash>=0=true | PASS |
| Official dataset evaluation (drop data/official/{history,open}.csv to enable) | official data not present — synthetic benchmark only | PASS |
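The AUC and calibration checks rely on standard metric definitions. This minimal Python sketch assumes three things: pairwise AUC with ties counted as one half, Brier as the mean squared error of predicted probabilities, and ECE over 10 equal-width bins. The engine's bin count is not stated in this report, so 10 is an assumption.

```python
from typing import Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def brier(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def ece(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Bin-size-weighted |mean predicted prob - observed rate| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    err = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            err += len(b) / len(probs) * abs(mean_p - rate)
    return err

print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(brier([0.5, 0.5], [1, 0]))                 # 0.25
print(ece([0.25] * 4, [1, 0, 0, 0]))             # 0.0
```

The pairwise AUC loop is quadratic, which is fine for 1,500 test invoices. A rank-based (Mann-Whitney) formulation gives the same value in O(n log n).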