100,000 Work Order Simulation

Auditor Challenge — forge-pm-work-order-sim

A hostile external auditor is attempting to invalidate this outcome. Every major claim must survive the following interrogation, answered from objective evidence.

Standard: IRS_AUDITOR (assume bad faith; trust nothing without evidence)
Certification state: PROOF_INCOMPLETE
Evidence Grade: B
Trust Score: 80/100
Verification: PASS (27/27)

Global challenge questions

What evidence supports this? Every metric maps to proof/CLAIM_EVIDENCE.json → proof/evidence/verification-report.json, produced by node verify.mjs and traced in proof/EXECUTION_TRACE.json.
What assumptions exist? See proof/LIMITATIONS.md and proof/EXECUTIVE_EVIDENCE.md.
How could this fail? Verification passes today; the known failure modes are the limitations listed under Disclosed seams below.
Could another engineer reproduce it? Yes — proof/REPRODUCE.md lists exact commands; checksums in proof/CHECKSUMS.json pin every input.
What would invalidate this conclusion? A failing check, a checksum mismatch (node tools/forge-proof-verify.mjs --outcome delivery-package/forge-pm-work-order-sim), or any claim without a source in CLAIM_EVIDENCE.json.
Has anything been simulated? Yes — results use a synthetic/internal benchmark (DISCLOSED_SEAM).
Were any shortcuts taken? 7 disclosed seams; 0 draft docs; 0 unguarded marketing phrases.
Would this survive expert review? Only with the disclosed seams explicitly accepted.
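
The checksum-mismatch condition named above can be illustrated with a minimal sketch. It assumes a flat manifest of the form `{ "relative/path": "<sha256 hex>" }`; the actual layout of proof/CHECKSUMS.json and the internals of tools/forge-proof-verify.mjs are not shown in this report, so `sha256Hex` and `findChecksumMismatches` are hypothetical names, not the tool's API.

```javascript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 of a Buffer or string.
function sha256Hex(data) {
  return createHash("sha256").update(data).digest("hex");
}

// ASSUMPTION: manifest is a flat { path: sha256Hex } map.
// The real proof/CHECKSUMS.json layout may differ.
// readFile is injectable so the check can run against in-memory data.
function findChecksumMismatches(manifest, readFile = readFileSync) {
  const mismatches = [];
  for (const [path, expected] of Object.entries(manifest)) {
    let actual;
    try {
      actual = sha256Hex(readFile(path));
    } catch {
      actual = null; // a missing or unreadable input counts as a mismatch
    }
    if (actual !== String(expected).toLowerCase()) {
      mismatches.push({ path, expected, actual });
    }
  }
  return mismatches;
}
```

Under these assumptions, an empty array would mean every pinned input matched its recorded digest; any entry in the result is the kind of evidence that would invalidate the conclusion.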

Per-claim challenge

Scale: full synthetic corpus processed through the live ecosystem = 100,000 orders (target 100,000) — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Classification accuracy >= 0.90 on resolvable single-trade orders = accuracy=1 over 84807 orders — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Priority accuracy >= 0.90 = accuracy=1 over 93001 orders — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Routing accuracy >= 0.99 (zone -> region) where a unit/zone exists = accuracy=1 over 90942 orders — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Safety: exception-detection recall >= 0.95 = recall=1 (tp=42221, fn=0) — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Safety: false-auto-action rate <= 0.02 = rate=0 (0/100000) — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Quality: exception-detection precision >= 0.90 = precision=0.974 (fp=1128) — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Outcome: auto-action rate within [0.45, 0.70] = autoActionRate=0.5665 (56651/100000) — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Duplicate suppression >= 0.98 (durable fingerprint query) = 10009/10009 caught — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Validator: 100% of over-cost-limit orders held for human approval = 7961/7961 held — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Validator: 100% of missing-location orders rejected (not auto-dispatched) = 9058/9058 rejected — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
Validator: 100% of missing-description orders rejected = 6999/6999 rejected — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
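
As a cross-check on the safety and outcome claims, the reported ratios can be recomputed from the counts quoted in the claim lines (tp, fn, fp, and the auto-action count). This is a sketch of the arithmetic only; the variable names are illustrative, and verify.mjs may compute these values differently.

```javascript
// Counts copied from the per-claim lines above.
const tp = 42221; // exceptions correctly detected
const fn = 0;     // exceptions missed
const fp = 1128;  // orders flagged as exceptions that were not
const autoActioned = 56651;
const total = 100000;

const recall = tp / (tp + fn);          // 42221 / 42221
const precision = tp / (tp + fp);       // 42221 / 43349
const autoActionRate = autoActioned / total;

// Thresholds as stated in the claims.
const checks = [
  { name: "recall >= 0.95", value: recall, pass: recall >= 0.95 },
  { name: "precision >= 0.90", value: precision, pass: precision >= 0.90 },
  {
    name: "auto-action rate within [0.45, 0.70]",
    value: autoActionRate,
    pass: autoActionRate >= 0.45 && autoActionRate <= 0.70,
  },
];

for (const c of checks) {
  console.log(`${c.pass ? "PASS" : "FAIL"}  ${c.name}  (${c.value.toFixed(4)})`);
}
```

Rounded to three decimals, the precision is 0.974, which agrees with the figure reported in the claim line.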

Open objections (must be resolved or disclosed before CERTIFIED)

Customer outcome is a disclosed seam (blocks CERTIFIED by design): fictional enterprise customer (Forge Property Management) and 100% synthetic data and assumptions; no real deployment, no real production results.

Disclosed seams (auditor-acknowledged limitations)

FICTIONAL CUSTOMER: "Forge Property Management" is an invented enterprise. No real customer relationship, deployment, or contract exists. This is a synthetic deployment model, not a production case study.
SYNTHETIC DATA: All 100,000 work orders are generated by a seeded PRNG (src/enterprise-synth.mjs) with a ground-truth answer key. Reported accuracy is against that synthetic key, not real tenant text; absolute accuracy on real intake will differ.
SIMULATED OPERATIONS & ROI: Manual handling time, exception review time, coordinator cost, implementation cost, and platform cost are stated illustrative assumptions (assumptions.json), not measured production figures. The ROI is a transparent model, not a realized financial result.
LIVE INFRASTRUCTURE (in-process): Persistence is the real PostgreSQL engine via PGlite (in-memory for this run), and dispatch crosses a real gRPC-over-HTTP/2 wire to a Node service on localhost. An external Postgres (DATABASE_URL) and a Go gRPC service are wire-compatible disclosed seams, not exercised here.
DISCLOSED_SEAM: The classifier is a deterministic lexicon model, not a hosted LLM. The production design swaps an LLM behind the same interface; that swap is unverified here.
AT-LEAST-ONCE DISPATCH: Under load, a Dispatch RPC can occasionally exceed its client deadline after the gRPC server has already committed the dispatch row. The actioner escalates those orders to a human exception (it never double-dispatches and never silently drops), so dispatch_records can slightly exceed the auto-dispatched count. The exact orphan count depends on transport timing and is not bit-identical across runs; the reconciliation identity (dispatch_records = auto-dispatched + safely-escalated orphans) holds on every run.
Customer outcome: Fictional enterprise customer (Forge Property Management) and 100% synthetic data/assumptions; no real deployment, no real production results.
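
The reconciliation identity stated in the AT-LEAST-ONCE DISPATCH seam can be expressed as a small check. This is a minimal sketch: `reconcileDispatch` and its field names are hypothetical, and the numbers in the usage lines are invented for illustration, not taken from a run.

```javascript
// ASSUMPTION: hypothetical helper and field names; the report does not
// show how the verifier queries dispatch_records or counts orphans.
function reconcileDispatch({ dispatchRecords, autoDispatched, escalatedOrphans }) {
  const expected = autoDispatched + escalatedOrphans;
  return {
    // Identity from the seam: dispatch_records = auto-dispatched + orphans.
    holds: dispatchRecords === expected,
    expected,
    // Rows committed by the server beyond the auto-dispatched count.
    surplus: dispatchRecords - autoDispatched,
  };
}

// Illustrative numbers only: 3 RPCs timed out after the server committed,
// and all 3 orders were escalated to a human exception.
const ok = reconcileDispatch({ dispatchRecords: 1003, autoDispatched: 1000, escalatedOrphans: 3 });
console.log(ok); // { holds: true, expected: 1003, surplus: 3 }

// A committed row with no matching escalation would break the identity.
const broken = reconcileDispatch({ dispatchRecords: 1004, autoDispatched: 1000, escalatedOrphans: 3 });
console.log(broken.holds); // false
```

Checking the identity rather than a fixed orphan count is what lets the result hold on every run even though the orphan count itself varies with transport timing.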

_Generated by tools/forge-proof.mjs at 2026-06-26T00:30:30.331Z. The Proof Layer has final authority over this challenge; it may not be edited to suppress objections._