Auditor Objections — Cash Recovery Engine

Pre-written hostile objections and the evidence-based response to each. The generated, structured interrogation is in proof/AUDITOR_CHALLENGE.md.

O1. "These numbers are fabricated / cherry-picked."

Every number is produced by node verify.mjs from src/synth.mjs with fixed seeds, captured in proof/evidence/verify.log, and checksummed in proof/CHECKSUMS.json. Re-run it and you will get identical figures. Pass thresholds are set with a wide margin from the observed values (proof/VERIFY.md), so the suite is not tuned to barely pass.

O2. "You tested on the data you trained on."

No. The model trains on a 4,000-invoice history (seed 7) and is evaluated on a separate 1,500-invoice ledger (seed 101) that it never saw. See the lines in verify.mjs that build history and test from different seeds.
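As a minimal sketch of what a seed-separated split can look like: the generator, the PRNG (mulberry32), the invoice fields, and the names `makeLedger`, `historyLedger`, and `testLedger` below are all illustrative assumptions, not the contents of src/synth.mjs or verify.mjs. Only the seeds (7 and 101) and the sizes (4,000 and 1,500) come from the text above.

```javascript
// Small deterministic PRNG (mulberry32). An assumption for illustration;
// the real generator in src/synth.mjs may differ.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical ledger generator: the fields are placeholders.
function makeLedger(seed, count, idPrefix) {
  const rand = mulberry32(seed);
  const invoices = [];
  for (let i = 0; i < count; i++) {
    invoices.push({
      id: `${idPrefix}-${i}`,
      balance: Math.round(100 + rand() * 9900),
      daysOverdue: Math.floor(rand() * 120),
    });
  }
  return invoices;
}

// Training history and evaluation ledger come from different seeds,
// so no evaluation invoice is a copy of a training invoice.
const historyLedger = makeLedger(7, 4000, "H");
const testLedger = makeLedger(101, 1500, "T");

console.log(historyLedger.length, testLedger.length);
```

Because each ledger is a pure function of its seed, re-running the script rebuilds the same two ledgers, which is what makes the figures reproducible.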

O3. "Of course it beats FIFO — you designed the world to make it win."

The world is behaviourally motivated, not rigged for the engine: uplift peaks in the moveable middle (a standard, documented assumption) and larger balances cost more effort (true of real collections). The baselines fail because they ignore *effort* and *uplift*, which is exactly the real-world inefficiency the engine targets. The comparison is apples-to-apples: all strategies face the identical ledger, the identical effort costs, and the identical hours budget.

O4. "Uplift can't be measured on real data, so this is fiction."

It is correct that individual uplift is not directly observable on real data, and we disclose this prominently (proof/LIMITATIONS.md §2). That is *why* the synthetic benchmark exists: it is the only setting where both potential outcomes are known, so uplift-ranking quality can be measured honestly. On real data the engine produces estimates to be validated by a holdout or A/B test; we do not claim a measured per-invoice truth.
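To illustrate why known potential outcomes matter, here is a hedged sketch of one possible ranking-quality measure: the share of total true uplift captured by the top k invoices under a given ranking. The metric, the function name `upliftCapturedAtK`, and the fields `pIfContacted`, `pIfLeftAlone`, and `estUplift` are assumptions made for this example; the benchmark's actual metric may be defined differently.

```javascript
// Share of total true uplift captured by the top-k invoices under a ranking.
// Only computable on synthetic data, where both potential outcomes are known.
function upliftCapturedAtK(invoices, scoreFn, k) {
  const trueUplift = (inv) => inv.pIfContacted - inv.pIfLeftAlone;
  const total = invoices.reduce((sum, inv) => sum + trueUplift(inv), 0);
  const topK = [...invoices].sort((a, b) => scoreFn(b) - scoreFn(a)).slice(0, k);
  const captured = topK.reduce((sum, inv) => sum + trueUplift(inv), 0);
  return total === 0 ? 0 : captured / total;
}

// Toy ledger (hypothetical values). Invoice D is large but self-cures,
// so contacting it changes nothing.
const toyLedger = [
  { id: "A", balance: 100, pIfContacted: 0.75, pIfLeftAlone: 0.25, estUplift: 0.4 },
  { id: "B", balance: 200, pIfContacted: 0.5, pIfLeftAlone: 0.25, estUplift: 0.3 },
  { id: "C", balance: 300, pIfContacted: 0.5, pIfLeftAlone: 0.25, estUplift: 0.2 },
  { id: "D", balance: 9000, pIfContacted: 1.0, pIfLeftAlone: 1.0, estUplift: 0.05 },
];

const byUplift = upliftCapturedAtK(toyLedger, (inv) => inv.estUplift, 2);
const byBalance = upliftCapturedAtK(toyLedger, (inv) => inv.balance, 2);
console.log({ byUplift, byBalance }); // { byUplift: 0.75, byBalance: 0.25 }
```

On a real ledger only one of the two outcomes is ever observed per invoice, so `trueUplift` cannot be computed and a holdout test has to stand in for it.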

O5. "The headline 'cash accelerated' isn't real money."

The headline figure is disclosed as a PROJECTION (proof/LIMITATIONS.md §4): it is an expected value from the model, not realized cash. The verified head-to-head claim is the relative one (+127% vs. the strongest baseline, 2.94× FIFO), measured on known outcomes.

O6. "Calibration sounds nice but is it actually calibrated?"

Yes. On held-out data the expected calibration error (ECE) is 0.018 over 10 bins and the Brier score is 0.159 (verification-report.json#/syntheticBenchmark). A decision engine needs calibrated probabilities, so this is checked explicitly (check 6).
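For readers who want to see what these two metrics compute, the following is a minimal sketch using the textbook definitions with equal-width bins. The function names and the binning choice are assumptions; verify.mjs may bin or weight differently.

```javascript
// Brier score: mean squared difference between predicted probability and 0/1 outcome.
function brierScore(probs, outcomes) {
  let sum = 0;
  for (let i = 0; i < probs.length; i++) {
    sum += (probs[i] - outcomes[i]) ** 2;
  }
  return sum / probs.length;
}

// Expected calibration error with equal-width bins (an assumed binning scheme):
// the weighted average of |observed rate - mean predicted probability| per bin.
function expectedCalibrationError(probs, outcomes, numBins = 10) {
  const bins = Array.from({ length: numBins }, () => ({ n: 0, sumP: 0, sumY: 0 }));
  for (let i = 0; i < probs.length; i++) {
    const idx = Math.min(numBins - 1, Math.floor(probs[i] * numBins));
    bins[idx].n += 1;
    bins[idx].sumP += probs[i];
    bins[idx].sumY += outcomes[i];
  }
  let ece = 0;
  for (const b of bins) {
    if (b.n === 0) continue; // empty bins contribute nothing
    ece += (b.n / probs.length) * Math.abs(b.sumY / b.n - b.sumP / b.n);
  }
  return ece;
}

// Toy data: four invoices predicted at 0.25, exactly one of which paid.
const toyProbs = [0.25, 0.25, 0.25, 0.25];
const toyPaid = [1, 0, 0, 0];
console.log(expectedCalibrationError(toyProbs, toyPaid)); // 0 (perfectly calibrated bin)
console.log(brierScore(toyProbs, toyPaid)); // 0.1875
```

The toy case shows why both numbers are reported: a model can be perfectly calibrated (ECE of 0) while its Brier score stays above zero, because Brier also penalizes the inherent uncertainty of each prediction.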

O7. "Greedy knapsack isn't optimal."

Greedy-by-ratio is exact for the fractional relaxation and near-optimal for 0/1 when item sizes are small relative to the budget (true here). It is deterministic and fast. The check confirms feasibility (never exceeds budget); optimality gap on this instance is negligible and the engine still beats every baseline by a wide margin.
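A minimal sketch of greedy-by-ratio selection under an hours budget follows. The item shape (`expectedCash`, `hours`), the function name `greedyByRatio`, and the choice to skip an item that does not fit and keep scanning are assumptions for illustration, not the engine's actual code.

```javascript
// Pick invoices in descending order of expected cash per hour of effort,
// never exceeding the hours budget.
function greedyByRatio(items, budgetHours) {
  const ranked = items
    .filter((it) => it.hours > 0 && it.expectedCash > 0)
    .sort((a, b) => b.expectedCash / b.hours - a.expectedCash / a.hours);

  const chosen = [];
  let hoursUsed = 0;
  let cash = 0;
  for (const it of ranked) {
    // Skip an item that does not fit; a smaller one later may still fit.
    if (hoursUsed + it.hours > budgetHours) continue;
    chosen.push(it.id);
    hoursUsed += it.hours;
    cash += it.expectedCash;
  }
  return { chosen, hoursUsed, cash };
}

// Toy work queue (hypothetical values).
const queue = [
  { id: "A", expectedCash: 300, hours: 1 }, // 300 per hour
  { id: "B", expectedCash: 500, hours: 2 }, // 250 per hour
  { id: "C", expectedCash: 900, hours: 5 }, // 180 per hour
  { id: "D", expectedCash: 100, hours: 1 }, // 100 per hour
];

const plan = greedyByRatio(queue, 4);
console.log(plan); // { chosen: [ 'A', 'B', 'D' ], hoursUsed: 4, cash: 900 }
```

Feasibility holds by construction, since an item is added only when it fits. The 0/1 optimality gap can only arise from the items skipped at the budget boundary, which is why it shrinks when each item is small relative to the budget.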

O8. "Is this just CRUD with a chart?"

No. The reasoning is (a) a causal uplift model separating self-cure from treatment effect and (b) a constrained optimization over expected cash per hour. The output ranking cannot be produced by sorting any single column — it depends on the learned uplift and the effort budget jointly.
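To make the two-part reasoning concrete, here is a deliberately tiny T-learner sketch: one response model fitted on treated invoices, one on untreated invoices, and uplift taken as the difference. The per-segment payment rate used as the "model", and the names `fitRates`, `fitTLearner`, `segment`, `treated`, and `paid`, are assumptions for illustration; the engine's actual models are not shown in this document.

```javascript
// Toy response model: the observed payment rate per segment.
// A stand-in for whatever learner the real engine fits.
function fitRates(rows) {
  const stats = new Map();
  for (const r of rows) {
    const s = stats.get(r.segment) ?? { n: 0, paid: 0 };
    s.n += 1;
    s.paid += r.paid;
    stats.set(r.segment, s);
  }
  return (segment) => {
    const s = stats.get(segment);
    return s ? s.paid / s.n : 0;
  };
}

// T-learner: separate models for treated and untreated history.
// selfCure = P(pay | not contacted); uplift = P(pay | contacted) - selfCure.
function fitTLearner(history) {
  const treatedModel = fitRates(history.filter((r) => r.treated));
  const controlModel = fitRates(history.filter((r) => !r.treated));
  return (invoice) => {
    const selfCure = controlModel(invoice.segment);
    const uplift = treatedModel(invoice.segment) - selfCure;
    return { selfCure, uplift };
  };
}

// Hypothetical history: "mid" invoices respond to contact, "early" ones self-cure.
const toyHistory = [
  ...[1, 1, 1, 0].map((paid) => ({ segment: "mid", treated: true, paid })),
  ...[1, 0, 0, 0].map((paid) => ({ segment: "mid", treated: false, paid })),
  ...[1, 1].map((paid) => ({ segment: "early", treated: true, paid })),
  ...[1, 1].map((paid) => ({ segment: "early", treated: false, paid })),
];

const predict = fitTLearner(toyHistory);
console.log(predict({ segment: "mid" })); // { selfCure: 0.25, uplift: 0.5 }
console.log(predict({ segment: "early" })); // { selfCure: 1, uplift: 0 }
```

The "early" segment pays reliably yet has zero uplift, so sorting by payment probability or by balance alone would spend effort there. Multiplying the uplift by the balance and dividing by the effort hours gives a per-hour value that an hours-constrained selection can rank on.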

O9. "Would this survive expert review?"

The method (T-learner uplift + value-of-effort knapsack) is standard and defensible. The open items are the disclosed seams: real data, uplift identifiability, and projections. A real ledger and a holdout test would close them.