Executive Evidence — Cash Recovery Engine
Standard: PROOF_STANDARD = IRS_AUDITOR
Certification: PROOF_INCOMPLETE (machine copy: proof/PROOF_DECISION.json).
Why not CERTIFIED: the customer outcome is a cash-recovery worklist for a real receivables ledger; no real ledger is present in this workspace, so that outcome is a DISCLOSED_SEAM. All implemented checks pass and are reproducible.
This document answers the ten required IRS_AUDITOR questions. Every number has lineage in proof/CLAIM_EVIDENCE.json and verification-report.json.
1. What exactly is being claimed?
- A dependency-free receivables-prioritization engine: an uplift T-learner that
separates self-cure from pay-if-worked invoices, a capacity-constrained optimizer that maximizes expected cash per collector-hour, a four-way recovery segmentation, downloadable deliverables, and a live web interface.
- On a held-out synthetic AR benchmark (seed-fixed): 13/13 checks pass;
self-cure AUC 0.8173; calibration ECE 0.018 (Brier 0.1588); the engine recovers $1,665,446 vs. $733,603 for the strongest simple baseline (+127.0% skill) and 2.94× FIFO at equal hours; it captures 48.7% of the $3,417,896 movable-cash ceiling within a 200-hour budget.
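The two headline percentages can be cross-checked from the dollar figures alone. A minimal sketch, assuming "skill" means relative lift over the baseline and "captures" means share of the ceiling; the variable names are illustrative and are not taken from verify.mjs:

```javascript
// Dollar figures quoted in the claim above.
const engineCash = 1665446;     // engine recovery within the 200-hour budget
const baselineCash = 733603;    // strongest simple baseline at equal hours
const movableCeiling = 3417896; // movable-cash ceiling

// Assumed definitions (illustrative; not read from verify.mjs).
const skillPct = (engineCash / baselineCash - 1) * 100;
const capturePct = (engineCash / movableCeiling) * 100;

console.log(skillPct.toFixed(1));   // 127.0
console.log(capturePct.toFixed(1)); // 48.7
```

Both values agree with the reported +127.0% skill and 48.7% capture, so the quoted figures are internally consistent under these definitions.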
2. What evidence supports each claim?
- verification-report.json (+.md) and the raw run copy
proof/evidence/verify.log / proof/evidence/verification-report.json.
- proof/CLAIM_EVIDENCE.json maps every claim → source, method, command,
dataset, timestamp, artifact, status.
- proof/EXECUTION_TRACE.json records each command, exit code, and stdout
sha256; proof/ARTIFACT_MANIFEST.json + proof/CHECKSUMS.json pin every file.
3. Can an independent engineer reproduce this claim?
Yes. proof/REPRODUCE.md gives exact commands; everything is seeded and dependency-free (Node 18+). node verify.mjs reproduces the table; node tools/forge-proof-verify.mjs --outcome delivery-package/cash-recovery-engine re-checks every checksum.
4. What assumptions were made?
- Payment is a probabilistic outcome driven by observable invoice/customer
features; a collector touch adds an uplift that is largest in the "moveable middle" (self-cure probability near 0.5) — this assumption is built into the synthetic world and is the thing the model must recover.
- Larger balances cost disproportionately more collector effort (negotiation,
approvals, disputes, legal); the simple baselines ignore effort.
- Potential outcomes are monotone (working an invoice never lowers its chance of
paying) — encoded via a shared threshold so individual uplift is non-negative.
- These are modelling assumptions, not measurements from a real ledger.
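The monotonicity assumption can be illustrated with a single threshold draw shared by both potential outcomes. This is a minimal sketch, not the code in src/synth.mjs: the uplift shape `touchUplift`, its `maxUplift` parameter, and the seed are hypothetical choices that merely peak at a self-cure probability of 0.5.

```javascript
// Small seeded PRNG (mulberry32) so the sketch is deterministic.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical uplift shape: zero at p0 = 0 or 1, largest at p0 = 0.5.
function touchUplift(p0, maxUplift = 0.3) {
  return maxUplift * 4 * p0 * (1 - p0);
}

// Both potential outcomes compare the SAME draw u against their own
// probability. Because p1 >= p0, "pays if left alone" implies "pays if worked".
function potentialOutcomes(p0, rand) {
  const p1 = Math.min(1, p0 + touchUplift(p0));
  const u = rand(); // one threshold draw shared by both worlds
  return { ifLeftAlone: u < p0, ifWorked: u < p1 };
}

const rand = mulberry32(42);
let violations = 0; // invoices that pay if left alone but not if worked
let movable = 0;    // invoices that pay only if worked
for (let i = 0; i < 10000; i++) {
  const p0 = rand();
  const { ifLeftAlone, ifWorked } = potentialOutcomes(p0, rand);
  if (ifLeftAlone && !ifWorked) violations++;
  if (!ifLeftAlone && ifWorked) movable++;
}
console.log(violations); // 0 by construction
```

Drawing the two outcomes independently would instead allow some invoices to pay only when left alone, which is exactly what the assumption rules out.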
5. What limitations exist?
See proof/LIMITATIONS.md (authoritative). Headline: all metrics are synthetic; no real ledger; effort/timing are modelled; "cash accelerated" is a projection.
6. What seams exist? (DISCLOSED_SEAM)
- No real AR ledger present → no company-specific recovery figure produced.
- Individual treatment uplift is identifiable here only because the synthetic
world exposes both potential outcomes; on real data it is an estimate.
- Collector effort-hours and days-to-pay are modelled parameters.
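The second seam reflects how a T-learner works: each invoice is observed either worked or not worked, so uplift is estimated as the difference between two separately fitted outcome models. The sketch below is a toy illustration under stated assumptions, not the engine's model: the per-bucket mean "model", the field names `bucket`, `worked`, and `paid`, and the data are all hypothetical.

```javascript
// A deliberately simple outcome "model": mean payment rate per feature bucket.
function fitBucketMeans(rows) {
  const sums = new Map();
  for (const { bucket, paid } of rows) {
    const s = sums.get(bucket) ?? { n: 0, paid: 0 };
    s.n += 1;
    s.paid += paid;
    sums.set(bucket, s);
  }
  return (bucket) => {
    const s = sums.get(bucket);
    return s ? s.paid / s.n : NaN; // NaN for buckets never seen in training
  };
}

// T-learner: one model fitted on control rows, one on treated rows;
// estimated uplift is the difference of their predictions.
function fitTLearner(rows) {
  const control = fitBucketMeans(rows.filter((r) => !r.worked));
  const treated = fitBucketMeans(rows.filter((r) => r.worked));
  return (bucket) => ({
    selfCure: control(bucket),
    uplift: treated(bucket) - control(bucket),
  });
}

// Toy observed data: each invoice carries only ONE observed outcome.
const observed = [
  { bucket: "0-30", worked: false, paid: 1 },
  { bucket: "0-30", worked: false, paid: 1 },
  { bucket: "0-30", worked: true, paid: 1 },
  { bucket: "0-30", worked: true, paid: 1 },
  { bucket: "31-60", worked: false, paid: 0 },
  { bucket: "31-60", worked: false, paid: 1 },
  { bucket: "31-60", worked: true, paid: 1 },
  { bucket: "31-60", worked: true, paid: 1 },
];

const predict = fitTLearner(observed);
console.log(predict("0-30"));  // { selfCure: 1, uplift: 0 }
console.log(predict("31-60")); // { selfCure: 0.5, uplift: 0.5 }
```

On real data the estimate can only be checked in aggregate (for example against a randomized holdout), whereas the synthetic world can score it invoice by invoice.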
7. What was actually executed?
- node verify.mjs → 13 structural + benchmark checks on synthetic data
(deterministic). Raw output: proof/evidence/verify.log.
- node run.mjs → trained the engine and emitted the worklist CSV, JSON, and
executive summary, plus the web-tool data snapshot.
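For orientation, a capacity-constrained worklist of the kind run.mjs emits can be sketched as a greedy selection by expected cash per collector-hour. This is a common heuristic under stated assumptions and is not necessarily the engine's optimizer; `buildWorklist` and the fields `balance`, `uplift`, and `hours` are hypothetical names.

```javascript
// Greedy sketch: rank invoices by expected incremental cash per hour of
// effort, then fill the hour budget from the top of the ranking.
function buildWorklist(invoices, hourBudget) {
  const ranked = invoices
    .map((inv) => ({ ...inv, expectedCash: inv.balance * inv.uplift }))
    .filter((inv) => inv.expectedCash > 0 && inv.hours > 0)
    .sort((a, b) => b.expectedCash / b.hours - a.expectedCash / a.hours);

  const worklist = [];
  let hoursUsed = 0;
  for (const inv of ranked) {
    // Skip anything that no longer fits; smaller invoices may still fit later.
    if (hoursUsed + inv.hours <= hourBudget) {
      worklist.push(inv);
      hoursUsed += inv.hours;
    }
  }
  return { worklist, hoursUsed };
}

// Toy data: uplift is the estimated gain in payment probability if worked.
const invoices = [
  { id: "A", balance: 10000, uplift: 0.3, hours: 5 },  // about 600 per hour
  { id: "B", balance: 2000, uplift: 0.4, hours: 1 },   // about 800 per hour
  { id: "C", balance: 50000, uplift: 0.02, hours: 8 }, // about 125 per hour
  { id: "D", balance: 4000, uplift: 0.25, hours: 2 },  // 500 per hour
];

const { worklist, hoursUsed } = buildWorklist(invoices, 8);
console.log(worklist.map((inv) => inv.id), hoursUsed); // selects B, A, D using 8 hours
```

Invoice C has the largest balance but is left out: its low uplift and high effort give it the worst return per hour, which is the effect a balance-ranked baseline misses.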
8. What was inferred (not directly executed)?
- Real-world recovery was not measured and remains unknown. The synthetic
skill bounds the engine's ranking/optimization correctness on the modelled problem; it does not transfer numerically to a specific company's books.
- Real-data uplift quality is inferred from the control-AUC check path, not from
a live holdout/A-B test.
9. What remains unverified?
- Any recovery metric on a real ledger (no dataset present).
- Calibration and uplift estimates against real payment behaviour.
- Deployment, security, integration, monitoring, and operational behaviour (not
run, not claimed).
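Checking calibration against real payment behaviour would mean recomputing the Brier score and ECE on observed outcomes. A minimal sketch of both metrics, assuming ten equal-width probability bins for ECE; the binning used by verify.mjs is not stated here, and the function names are illustrative.

```javascript
// Brier score: mean squared error between predicted probability and outcome.
function brierScore(probs, outcomes) {
  let sum = 0;
  for (let i = 0; i < probs.length; i++) {
    sum += (probs[i] - outcomes[i]) ** 2;
  }
  return sum / probs.length;
}

// Expected calibration error with equal-width bins (an assumption):
// weighted average of |mean predicted - observed rate| across bins.
function expectedCalibrationError(probs, outcomes, numBins = 10) {
  const bins = Array.from({ length: numBins }, () => ({ n: 0, p: 0, y: 0 }));
  for (let i = 0; i < probs.length; i++) {
    const idx = Math.min(numBins - 1, Math.floor(probs[i] * numBins));
    bins[idx].n += 1;
    bins[idx].p += probs[i];
    bins[idx].y += outcomes[i];
  }
  let ece = 0;
  for (const b of bins) {
    if (b.n === 0) continue; // empty bins contribute nothing
    ece += (b.n / probs.length) * Math.abs(b.p / b.n - b.y / b.n);
  }
  return ece;
}

// Toy example: predicted payment probabilities and what actually happened.
const probs = [0.1, 0.1, 0.9, 0.9];
const outcomes = [0, 0, 1, 1];
console.log(brierScore(probs, outcomes).toFixed(4));               // 0.0100
console.log(expectedCalibrationError(probs, outcomes).toFixed(4)); // 0.1000
```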
10. What evidence would invalidate the claim?
- A node verify.mjs run that does not yield 13/13, or that yields different
numbers (drift or environment difference).
- Editing src/synth.mjs or any seed (synthetic numbers are conditional on them).
- Treating synthetic skill as a real-company recovery figure (explicitly not
claimed).
- On real data: a holdout/A-B test in which the engine's queue does not beat the
business-as-usual queue on realized cash.
Pre-written hostile objections and responses: proof/AUDITOR_OBJECTIONS.md. The generated hostile interrogation: proof/AUDITOR_CHALLENGE.md.