100,000 Work Order Simulation
Auditor Challenge — forge-pm-work-order-sim
A hostile external auditor is attempting to invalidate this outcome. Every major claim must survive the following interrogation, answered from objective evidence.
- Standard: IRS_AUDITOR (assume bad faith; trust nothing without evidence)
- Certification state: PROOF_INCOMPLETE
- Evidence Grade: B
- Trust Score: 80/100
- Verification: PASS (27/27)
Global challenge questions
- What evidence supports this? Every metric maps to `proof/CLAIM_EVIDENCE.json` → `proof/evidence/verification-report.json`, produced by `node verify.mjs` and traced in `proof/EXECUTION_TRACE.json`.
- What assumptions exist? See `proof/LIMITATIONS.md` and `proof/EXECUTIVE_EVIDENCE.md`.
- How could this fail? Verification passes today; the failure modes are the disclosed seams below.
- Could another engineer reproduce it? Yes. `proof/REPRODUCE.md` lists the exact commands, and checksums in `proof/CHECKSUMS.json` pin every input.
- What would invalidate this conclusion? A failing check, a checksum mismatch (`node tools/forge-proof-verify.mjs --outcome delivery-package/forge-pm-work-order-sim`), or any claim without a source in `CLAIM_EVIDENCE.json`.
- Has anything been simulated? Yes. Results use a synthetic/internal benchmark (DISCLOSED_SEAM).
- Were any shortcuts taken? 7 disclosed seam(s); 0 draft doc(s); 0 unguarded marketing phrase(s).
- Would this survive expert review? Only with the disclosed seams explicitly accepted.
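The checksum pin can be sketched as follows. This is a minimal illustration under a hypothetical assumption: that `proof/CHECKSUMS.json` maps each relative input path to a hex SHA-256 digest. The real manifest shape and the checks performed by `tools/forge-proof-verify.mjs` may differ. Under this sketch, a missing input or a digest mismatch counts as an invalidating finding.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical manifest shape: { "relative/path": "<sha256 hex digest>" }.
// readInput is injected so the same check can run against disk or in memory.
function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

function verifyChecksums(manifest, readInput) {
  const findings = [];
  for (const [path, expected] of Object.entries(manifest)) {
    let bytes;
    try {
      bytes = readInput(path);
    } catch (err) {
      // A pinned input that cannot be read is itself a finding.
      findings.push({ path, problem: "missing", detail: err.message });
      continue;
    }
    const actual = sha256Hex(bytes);
    if (actual !== expected) {
      findings.push({ path, problem: "mismatch", expected, actual });
    }
  }
  return findings; // an empty array means every pinned input matches
}

// Usage with in-memory inputs (the path and contents are illustrative only).
const inputs = new Map([["example/input.txt", Buffer.from("example input")]]);
const readInput = (path) => {
  if (!inputs.has(path)) throw new Error(`not found: ${path}`);
  return inputs.get(path);
};
const manifest = { "example/input.txt": sha256Hex(inputs.get("example/input.txt")) };
console.log(verifyChecksums(manifest, readInput)); // []
```

Injecting `readInput` keeps the comparison logic separate from I/O. The same function could be pointed at `fs.readFileSync` for a real run.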
Per-claim challenge
- Scale: full synthetic corpus processed through the live ecosystem = 100,000 orders (target 100,000); source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Classification accuracy >= 0.90 on resolvable single-trade orders = accuracy=1 over 84807 orders; source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Priority accuracy >= 0.90 = accuracy=1 over 93001 orders; source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Routing accuracy >= 0.99 (zone -> region) where a unit/zone exists = accuracy=1 over 90942 orders; source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Safety: exception-detection recall >= 0.95 = recall=1 (tp=42221, fn=0); source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Safety: false-auto-action rate <= 0.02 = rate=0 (0/100000); source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Quality: exception-detection precision >= 0.90 = precision=0.974 (fp=1128); source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Outcome: auto-action rate within [0.45, 0.70] = autoActionRate=0.5665 (56651/100000); source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Duplicate suppression >= 0.98 (durable fingerprint query) = 10009/10009 caught; source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Validator: 100% of over-cost-limit orders held for human approval = 7961/7961 held; source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Validator: 100% of missing-location orders rejected (not auto-dispatched) = 9058/9058 rejected; source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
- Validator: 100% of missing-description orders rejected = 6999/6999 rejected; source: `verification-report.json#/checks`; status: SUPPORTED. _Could another engineer reproduce this number from `node verify.mjs`? Yes, deterministically._
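An auditor can recompute the derived ratios from the raw counts in the claims instead of trusting the rounded figures. The sketch below uses only counts stated in the per-claim list. Its function and field names are hypothetical and do not describe the structure of `verification-report.json`.

The final check rests on an assumption that the counts suggest but the report does not state: every order is either auto-actioned or flagged as an exception. Under that assumption, 42,221 + 1,128 flagged orders plus 56,651 auto-actioned orders equals exactly 100,000.

```javascript
// Raw counts copied from the per-claim list above.
const counts = {
  total: 100000,
  exceptionTruePositives: 42221,
  exceptionFalseNegatives: 0,
  exceptionFalsePositives: 1128,
  autoActioned: 56651,
  falseAutoActions: 0,
};

function recompute(c) {
  const flagged = c.exceptionTruePositives + c.exceptionFalsePositives;
  return {
    recall: c.exceptionTruePositives / (c.exceptionTruePositives + c.exceptionFalseNegatives),
    precision: c.exceptionTruePositives / flagged,
    autoActionRate: c.autoActioned / c.total,
    falseAutoActionRate: c.falseAutoActions / c.total,
    // Assumption: each order is either flagged or auto-actioned, never both.
    partitionHolds: flagged + c.autoActioned === c.total,
  };
}

// Thresholds exactly as stated in the claims.
function checkClaims(m) {
  return {
    recallAtLeast095: m.recall >= 0.95,
    precisionAtLeast090: m.precision >= 0.9,
    autoActionRateIn045To070: m.autoActionRate >= 0.45 && m.autoActionRate <= 0.7,
    falseAutoActionRateAtMost002: m.falseAutoActionRate <= 0.02,
  };
}

const metrics = recompute(counts);
console.log(metrics.precision.toFixed(3));      // 0.974
console.log(metrics.autoActionRate.toFixed(4)); // 0.5665
console.log(checkClaims(metrics));
```

Recomputing from counts rather than reading the stored ratios means a transcription or rounding error in the report would surface as a disagreement here.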
Open objections (must be resolved or disclosed before CERTIFIED)
- Customer outcome is a disclosed seam: Fictional enterprise customer (Forge Property Management) and 100% synthetic data/assumptions; no real deployment, no real production results (blocks CERTIFIED by design).
Disclosed seams (auditor-acknowledged limitations)
- FICTIONAL CUSTOMER: "Forge Property Management" is an invented enterprise. No real customer relationship, deployment, or contract exists. This is a synthetic deployment model, not a production case study.
- SYNTHETIC DATA: All 100,000 work orders are generated by a seeded PRNG (src/enterprise-synth.mjs) with a ground-truth answer key. Reported accuracy is against that synthetic key, not real tenant text; absolute accuracy on real intake will differ.
- SIMULATED OPERATIONS & ROI: Manual handling time, exception review time, coordinator cost, implementation cost, and platform cost are stated illustrative assumptions (assumptions.json), not measured production figures. The ROI is a transparent model, not a realized financial result.
- LIVE INFRASTRUCTURE (in-process): Persistence is the real PostgreSQL engine via PGlite (in-memory for this run), and dispatch crosses a real gRPC (HTTP/2) wire to a Node service on localhost. An external Postgres (DATABASE_URL) and a Go gRPC service are wire-compatible disclosed seams, not exercised here.
- DISCLOSED_SEAM: The classifier is a deterministic lexicon model, not a hosted LLM. The production design swaps an LLM behind the same interface; that swap is unverified here.
- AT-LEAST-ONCE DISPATCH: Under load, a Dispatch RPC can occasionally exceed its client deadline after the gRPC server has already committed the dispatch row. The actioner escalates those orders to a human exception (it never double-dispatches and never silently drops), so dispatch_records can slightly exceed the auto-dispatched count. The exact orphan count depends on transport timing and is not bit-identical across runs; the reconciliation identity (dispatch_records = auto-dispatched + safely-escalated orphans) holds every run.
- Customer outcome: Fictional enterprise customer (Forge Property Management) and 100% synthetic data/assumptions; no real deployment, no real production results.
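The at-least-once dispatch seam reduces to a counting identity that can be checked every run, even though the orphan count varies. The sketch below is minimal and rests on hypothetical assumptions. It imagines a per-attempt outcome log in which each Dispatch RPC result is either `ok` or `deadline_exceeded`, and the number of server-side rows written for that order is known afterwards. None of these names come from the actual actioner.

```javascript
// Hypothetical outcome record per auto-dispatch attempt:
//   { orderId, rpc: "ok" | "deadline_exceeded", rowsWritten: number }
// Policy from the disclosure: a deadline-exceeded RPC is escalated to a human
// exception and not retried, so an order should never gain a second row.
function classifyAttempt(attempt) {
  if (attempt.rpc === "ok") return "auto-dispatched";
  return "escalated"; // deadline exceeded: hand to a human, do not retry
}

function reconcile(attempts) {
  let autoDispatched = 0;
  let escalatedOrphans = 0; // escalated, but the server had already committed a row
  let dispatchRecords = 0;  // rows actually written server-side
  for (const a of attempts) {
    dispatchRecords += a.rowsWritten;
    if (classifyAttempt(a) === "auto-dispatched") autoDispatched += 1;
    else if (a.rowsWritten > 0) escalatedOrphans += 1;
  }
  return {
    autoDispatched,
    escalatedOrphans,
    dispatchRecords,
    // Breaks on a silent drop ("ok" with no row) or a double dispatch (2+ rows).
    identityHolds: dispatchRecords === autoDispatched + escalatedOrphans,
  };
}

const attempts = [
  { orderId: "A", rpc: "ok", rowsWritten: 1 },
  { orderId: "B", rpc: "deadline_exceeded", rowsWritten: 1 }, // orphan: committed, then escalated
  { orderId: "C", rpc: "deadline_exceeded", rowsWritten: 0 }, // escalated before any commit
];
// dispatchRecords (2) = autoDispatched (1) + escalatedOrphans (1)
console.log(reconcile(attempts));
```

The check deliberately tolerates a varying orphan count. It fails only when the count of rows disagrees with the count of auto-dispatches plus committed escalations.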
_Generated by tools/forge-proof.mjs at 2026-06-26T00:30:30.331Z. The Proof Layer has final authority over this challenge; it may not be edited to suppress objections._