VERIFY — Cash Recovery Engine
How verification runs and what each check asserts. Source: `verify.mjs`. The same suite of checks runs against up to two benchmarks: the synthetic benchmark always runs, and the official benchmark runs only when data is auto-detected in `data/official/`.
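The auto-detection step can be pictured as a small check that runs before the suite starts. The sketch below uses a hypothetical CommonJS helper, `detectBenchmarks`. It assumes that "official data present" means `data/official/` exists and contains at least one regular file. The actual rule in `verify.mjs` is not documented here and may differ.

```javascript
// Hypothetical sketch of official-benchmark auto-detection (CommonJS).
// Assumption: "official data present" means data/official/ exists and
// contains at least one regular file. verify.mjs may use a different rule.
const fs = require("node:fs");
const path = require("node:path");

function detectBenchmarks(root = ".") {
  const officialDir = path.join(root, "data", "official");
  const benchmarks = ["synthetic"]; // the synthetic benchmark always runs

  let files = [];
  if (fs.existsSync(officialDir) && fs.statSync(officialDir).isDirectory()) {
    files = fs
      .readdirSync(officialDir)
      .filter((name) => fs.statSync(path.join(officialDir, name)).isFile());
  }

  if (files.length === 0) {
    // Informative skip instead of a silent pass or a hard failure.
    return { benchmarks, skipReason: `official benchmark skipped: no files in ${officialDir}` };
  }
  benchmarks.push("official");
  return { benchmarks, skipReason: null };
}
```

Until files land in `data/official/`, calling `detectBenchmarks()` from the repository root returns only `["synthetic"]` together with a human-readable skip reason.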
How it runs
`node verify.mjs`:
- Runs structural checks on the logistic learner, the AUC helper, synthetic outcome monotonicity, and optimizer feasibility.
- Trains the uplift T-learner on a 4,000-invoice synthetic history (seed 7).
- Scores a held-out 1,500-invoice test ledger (seed 101) that the model never saw during training.
- Measures propensity ranking, calibration, and head-to-head cash recovery under a binding 200-hour budget.
- Writes `verification-report.json` and `verification-report.md`, plus an evidence copy.
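The following minimal T-learner sketch makes the training and scoring steps concrete. One logistic model is fit on contacted invoices and another on uncontacted invoices. Uplift is the difference between their predicted payment probabilities.

Only the seeds and ledger sizes come from the list above. The rest is illustrative and not taken from `verify.mjs`:

- the toy ledger generator `makeLedger` and its single feature;
- the mulberry32 PRNG;
- the learning rate and epoch count;
- treating the no-contact model's output as the self-cure propensity.

```javascript
// Hypothetical uplift T-learner sketch. The toy ledger, single feature,
// PRNG choice, learning rate, and epoch count are illustrative assumptions.

// mulberry32: a small seeded PRNG, so the same seed always yields the same ledger.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// w holds one weight per feature followed by a bias term.
function predictLogistic(w, x) {
  let z = w[x.length];
  for (let j = 0; j < x.length; j++) z += w[j] * x[j];
  return sigmoid(z);
}

// Full-batch gradient descent on the mean log-loss.
function fitLogistic(X, y, lr = 0.5, epochs = 500) {
  const d = X[0].length;
  const w = new Array(d + 1).fill(0);
  for (let e = 0; e < epochs; e++) {
    const grad = new Array(d + 1).fill(0);
    for (let i = 0; i < X.length; i++) {
      const err = predictLogistic(w, X[i]) - y[i];
      for (let j = 0; j < d; j++) grad[j] += err * X[i][j];
      grad[d] += err;
    }
    for (let j = 0; j <= d; j++) w[j] -= (lr * grad[j]) / X.length;
  }
  return w;
}

// Toy ledger: x in [0, 1) raises the chance of paying, and contact
// raises the payment probability by a fixed 0.25.
function makeLedger(n, seed) {
  const rand = mulberry32(seed);
  const rows = [];
  for (let i = 0; i < n; i++) {
    const x = [rand()];
    const contacted = rand() < 0.5;
    const pPay = 0.1 + 0.5 * x[0] + (contacted ? 0.25 : 0);
    rows.push({ x, contacted, paid: rand() < pPay ? 1 : 0 });
  }
  return rows;
}

// T-learner: fit one model per arm; uplift is the difference in predictions.
function fitTLearner(rows) {
  const treated = rows.filter((r) => r.contacted);
  const control = rows.filter((r) => !r.contacted);
  const m1 = fitLogistic(treated.map((r) => r.x), treated.map((r) => r.paid));
  const m0 = fitLogistic(control.map((r) => r.x), control.map((r) => r.paid));
  return (x) => {
    const p1 = predictLogistic(m1, x); // P(pay | contacted)
    const p0 = predictLogistic(m0, x); // P(pay | not contacted): assumed self-cure propensity
    return { selfCure: p0, uplift: p1 - p0 };
  };
}

// Train on the seed-7 history, then score the unseen seed-101 ledger.
const history = makeLedger(4000, 7);
const heldOut = makeLedger(1500, 101);
const score = fitTLearner(history);
const scored = heldOut.map((r) => ({ ...r, ...score(r.x) }));
```

Both the seeded PRNG and full-batch gradient descent are deterministic, so retraining with the same seeds reproduces identical scores. That is the kind of property check 10 asserts.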
What each check asserts
| # | Check | Asserts |
|---|---|---|
| 1 | Logistic learner recovers a separable signal | The gradient-descent learner is correct (training accuracy > 0.95 on a known-separable set) |
| 2 | AUC helper returns 1.0 for perfect ranking | The evaluation metric itself is correct |
| 3 | Potential outcomes monotone (y1 ≥ y0) | The synthetic causal world is well-formed; uplift is non-negative |
| 4 | Worklist never exceeds the hours budget | The 0/1-knapsack optimizer is feasible |
| 5 | Self-cure AUC > 0.70 | The propensity model ranks payers above non-payers well above chance |
| 6 | Calibration ECE < 0.05 | Predicted probabilities match observed frequencies (expected calibration error), so downstream decisions can trust them |
| 7 | Engine beats strongest baseline by > 40% cash | Uplift-ranked allocation materially out-recovers the largest-balance, FIFO, and random worklists |
| 8 | Engine ≥ 2× FIFO cash at equal hours | The lift over the most common real-world worklist is large |
| 9 | Captures > 35% of the movable-cash ceiling | The budget is spent efficiently against the theoretical maximum |
| 10 | Reproducible: same seeds → identical AUC | Determinism (bit-for-bit) |
| 11 | Trains end-to-end in < 8 s | The method is fast enough to be practical |
| 12 | Engine emits a complete, budget-feasible worklist | The end-to-end orchestration produces the deliverable schema |
| 13 | Official path (skip or run) | Real-data evaluation is wired in; when no official data is present, the check skips with an informative message |
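Checks 2, 5, and 6 rest on two evaluation metrics. Below is a minimal sketch of both using the common definitions:

- AUC as the Mann–Whitney rank statistic, with ties counting one half;
- ECE over 10 equal-width probability bins.

The function names and the bin count are assumptions, not taken from `verify.mjs`.

```javascript
// Hypothetical sketches of the metrics behind checks 2, 5, and 6.

// AUC as the Mann–Whitney statistic: the probability that a random positive
// scores above a random negative, with ties counted as one half.
function auc(scores, labels) {
  const idx = scores.map((_, i) => i).sort((a, b) => scores[a] - scores[b]);
  // Assign average 1-based ranks to runs of tied scores.
  const ranks = new Array(scores.length);
  for (let i = 0; i < idx.length; ) {
    let j = i;
    while (j + 1 < idx.length && scores[idx[j + 1]] === scores[idx[i]]) j++;
    const avgRank = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) ranks[idx[k]] = avgRank;
    i = j + 1;
  }
  let nPos = 0;
  let rankSumPos = 0;
  labels.forEach((y, i) => {
    if (y === 1) {
      nPos++;
      rankSumPos += ranks[i];
    }
  });
  const nNeg = labels.length - nPos;
  if (nPos === 0 || nNeg === 0) throw new Error("AUC needs both classes");
  return (rankSumPos - (nPos * (nPos + 1)) / 2) / (nPos * nNeg);
}

// Expected calibration error: bin-weighted mean of |mean prediction - observed rate|.
function ece(probs, labels, nBins = 10) {
  const count = new Array(nBins).fill(0);
  const sumP = new Array(nBins).fill(0);
  const sumY = new Array(nBins).fill(0);
  probs.forEach((p, i) => {
    const b = Math.min(nBins - 1, Math.floor(p * nBins)); // p = 1 falls in the last bin
    count[b]++;
    sumP[b] += p;
    sumY[b] += labels[i];
  });
  let total = 0;
  for (let b = 0; b < nBins; b++) {
    if (count[b] === 0) continue;
    const gap = Math.abs(sumP[b] / count[b] - sumY[b] / count[b]);
    total += (count[b] / probs.length) * gap;
  }
  return total;
}
```

With this AUC definition, a perfect ranking scores exactly 1.0, which is what check 2 asserts. A random ranking scores about 0.5, the chance level that check 5 is measured against.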
Why these checks are the right ones
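The business claim is "recover more cash per collector-hour." Each group of checks supports one part of that claim:

- Checks 1–2 establish that the learner and the evaluation metric are themselves correct.
- Checks 3–4 establish that the comparison is fair: the causal world is well-formed and the budget is feasible.
- Checks 5–6 establish that the model *understands* who pays.
- Checks 7–9 establish that the optimizer *acts* on that understanding better than the conventional worklists.
- Checks 10–12 establish that the result is reproducible, fast, and emits a usable deliverable.
- Check 13 confirms the same suite is wired to run on real data when it is available.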
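Check 4 and the cash comparisons in checks 7–9 both depend on the worklist optimizer. Below is a minimal 0/1-knapsack sketch solved by dynamic programming. Each invoice has an integer effort cost and a value of uplift × balance, and the solver maximizes total value without exceeding the hours budget.

Two things here are assumptions, and the real optimizer may define cost and value differently:

- effort is expressed in whole units (for example quarter-hours);
- each invoice is valued as uplift × balance.

The function name `planWorklist` is hypothetical.

```javascript
// Hypothetical 0/1-knapsack worklist sketch.
// Assumptions: effort in whole units (e.g. quarter-hours), value = uplift * balance.
function planWorklist(invoices, budgetUnits) {
  const n = invoices.length;
  // best[i][c] = max expected cash using the first i invoices within c units.
  const best = Array.from({ length: n + 1 }, () => new Array(budgetUnits + 1).fill(0));
  for (let i = 1; i <= n; i++) {
    const { units, uplift, balance } = invoices[i - 1];
    const value = uplift * balance;
    for (let c = 0; c <= budgetUnits; c++) {
      best[i][c] = best[i - 1][c]; // option 1: skip this invoice
      if (units <= c && best[i - 1][c - units] + value > best[i][c]) {
        best[i][c] = best[i - 1][c - units] + value; // option 2: work it
      }
    }
  }
  // Walk back through the table to recover which invoices were taken.
  const chosen = [];
  let c = budgetUnits;
  for (let i = n; i >= 1; i--) {
    if (best[i][c] !== best[i - 1][c]) {
      chosen.push(invoices[i - 1]);
      c -= invoices[i - 1].units;
    }
  }
  const usedUnits = chosen.reduce((s, inv) => s + inv.units, 0);
  return { chosen: chosen.reverse(), usedUnits, expectedCash: best[n][budgetUnits] };
}
```

The walk-back only takes an invoice when its units fit in the remaining capacity. As a result, `usedUnits` can never exceed `budgetUnits`, which is the invariant check 4 tests. Unlike a greedy "largest first" rule, the DP will pass over a single big invoice when two smaller ones are worth more together within the same hours.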
Thresholds
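Each threshold leaves a margin on the safe side of the observed value, so the suite is stable rather than tuned to barely pass. For AUC, skill, lift, and capture, where higher is better, the threshold sits below the observed value. For ECE, where lower is better, it sits above.

| Check | Metric | Observed | Threshold |
|---|---|---|---|
| 5 | Self-cure AUC | 0.817 | > 0.70 |
| 6 | Calibration ECE | 0.018 | < 0.05 |
| 7 | Cash skill vs. strongest baseline | +127% | > 40% |
| 8 | Cash lift vs. FIFO | 2.94× | ≥ 2× |
| 9 | Movable-cash capture | 48.7% | > 35% |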
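The thresholds can be read as a single pass/fail gate over the run's metrics. The sketch below assumes that skill, lift, and capture are simple ratios of recovered cash. Those formulas are inferred from the check names and are not confirmed by the source. The `gate` function and its input field names are hypothetical.

```javascript
// Hypothetical threshold gate. Field names are illustrative, and the ratio
// formulas are inferred from the check names rather than taken from verify.mjs:
//   skill   = engineCash / strongestBaselineCash - 1   (check 7)
//   lift    = engineCash / fifoCash                    (check 8)
//   capture = engineCash / movableCeiling              (check 9)
function gate(m) {
  const strongestBaseline = Math.max(m.largestCash, m.fifoCash, m.randomCash);
  const derived = {
    skill: m.engineCash / strongestBaseline - 1,
    lift: m.engineCash / m.fifoCash,
    capture: m.engineCash / m.movableCeiling,
  };
  // Each entry: [check label, passed?], using the thresholds stated in this section.
  const checks = [
    ["5: self-cure AUC > 0.70", m.auc > 0.7],
    ["6: calibration ECE < 0.05", m.ece < 0.05],
    ["7: beats strongest baseline by > 40% cash", derived.skill > 0.4],
    ["8: at least 2x FIFO cash", derived.lift >= 2],
    ["9: captures > 35% of movable-cash ceiling", derived.capture > 0.35],
  ];
  return { derived, checks, pass: checks.every(([, ok]) => ok) };
}
```

Under these assumed definitions, the observed skill of +127% means the engine recovered roughly 2.27× the strongest baseline's cash. The 2.94× lift puts FIFO at about 34% of the engine's total.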