FICTIONAL / SYNTHETIC DEPLOYMENT MODEL. "Forge Property Management" is an invented customer. No real customer data was used and no real production results are claimed. All figures come from a deterministic synthetic corpus and stated illustrative assumptions.
100,000 Work Order Simulation



Executive Evidence — forge-pm-work-order-sim

FICTIONAL / SYNTHETIC DEPLOYMENT MODEL. "Forge Property Management" is an
invented customer. No real customer data was used; no real production result is
claimed. The Proof Layer (IRS_AUDITOR) assigns the authoritative certification
state independently and may not be overridden here.

This document answers the ten IRS_AUDITOR questions from objective evidence.

1. What exactly is being claimed?

The claim has two parts. First, the existing Work Order Agent Ecosystem, run unmodified over a deterministic synthetic corpus of 100,000 property-management work orders, achieves the operational metrics recorded in verification-report.json: auto-action rate; classification, priority, and routing accuracy; exception recall and precision; false-auto-action rate; duplicate suppression; SLA routing; audit completeness; and idempotency. Second, a transparent ROI model over stated assumptions yields the financial figures in evidence/roi.json. Nothing about a *real* deployment is claimed.
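As a rough illustration of how exception metrics of this kind can be derived from an answer key, here is a minimal sketch. The record shape (`expectedException`, `flaggedException`) and the exact definition of false-auto-action rate used here (wrongly auto-actioned orders divided by all auto-actioned orders) are assumptions made for this example; the authoritative definitions are whatever verify.mjs implements.

```javascript
// Hypothetical record shape (not taken from the real corpus):
//   expectedException: the synthetic answer key says this order needs human review
//   flaggedException:  the agents actually routed it to human review
function exceptionMetrics(records) {
  let tp = 0, fp = 0, fn = 0, tn = 0;
  for (const r of records) {
    if (r.expectedException && r.flaggedException) tp++;        // correctly escalated
    else if (!r.expectedException && r.flaggedException) fp++;  // escalated unnecessarily
    else if (r.expectedException && !r.flaggedException) fn++;  // auto-actioned but should not have been
    else tn++;                                                  // correctly auto-actioned
  }
  // Return null instead of NaN when a denominator is zero.
  const ratio = (num, den) => (den === 0 ? null : num / den);
  return {
    recall: ratio(tp, tp + fn),
    precision: ratio(tp, tp + fp),
    // Assumed definition: share of auto-actioned orders that were really exceptions.
    falseAutoActionRate: ratio(fn, fn + tn),
    autoActionRate: ratio(fn + tn, records.length),
  };
}

// Five made-up records, purely to exercise the function.
const sample = [
  { expectedException: true, flaggedException: true },
  { expectedException: true, flaggedException: false },
  { expectedException: false, flaggedException: false },
  { expectedException: false, flaggedException: false },
  { expectedException: false, flaggedException: true },
];
const m = exceptionMetrics(sample);
console.log(m);
```

Keeping the four confusion-matrix counts explicit makes it easy to audit each reported ratio against the raw counts.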

2. What evidence supports each claim?

The evidence is verification-report.json, together with evidence/verification-results.json, evidence/simulation-results.json, evidence/roi.json, and evidence/audit-trace-sample.json. It is produced by node verify.mjs and captured in proof/EXECUTION_TRACE.json and proof/evidence/verify.log. Every shipped file is checksummed in proof/CHECKSUMS.json.

3. Can an independent engineer reproduce this claim?

Yes. The corpus is generated from a fixed seed (20260625), and proof/REPRODUCE.md gives the exact commands. A corpus-fingerprint check asserts that the inputs are byte-identical across runs.
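The idea behind a seeded corpus and a fingerprint check can be sketched as follows. Only the seed value comes from this document; everything else is an assumption made for illustration. The generator (mulberry32), the order fields, and the FNV-1a style hash are stand-ins, and a real fingerprint would more plausibly use a cryptographic hash such as SHA-256.

```javascript
// mulberry32: a small seeded PRNG, used here as a stand-in for the real generator.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical categories and fields; the real corpus schema is not shown here.
const CATEGORIES = ["plumbing", "electrical", "hvac", "appliance"];

function generateCorpus(seed, count) {
  const rand = mulberry32(seed);
  const orders = [];
  for (let i = 0; i < count; i++) {
    orders.push({
      id: `WO-${String(i + 1).padStart(6, "0")}`,
      category: CATEGORIES[Math.floor(rand() * CATEGORIES.length)],
      urgent: rand() < 0.1,
    });
  }
  return orders;
}

// FNV-1a style 32-bit hash over the serialized corpus (illustrative, not cryptographic).
function fingerprint(orders) {
  const text = JSON.stringify(orders);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

const SEED = 20260625; // the seed stated in this document
const firstRun = fingerprint(generateCorpus(SEED, 1000));
const secondRun = fingerprint(generateCorpus(SEED, 1000));
console.log(firstRun === secondRun); // true: same seed, same serialized corpus
```

Because every random draw flows from the one seeded generator, two runs serialize to the same text, so any fingerprint mismatch signals that the inputs changed.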

4. What assumptions were made?

The customer, portfolio, and all data are invented. Manual handling time (8 min), exception review time (5 min), coordinator cost ($38/h), implementation cost, and platform cost are illustrative assumptions stated in assumptions.json, not measured figures.

5. What limitations exist?

Synthetic text is cleaner than real intake; in-process infrastructure (PGlite and loopback gRPC) stands in for an external Postgres and a Go dispatch service; the classifier is a lexicon engine, not an LLM. See proof/LIMITATIONS.md.

6. What seams exist?

The fictional customer, the synthetic data, the simulated operational/financial assumptions, the in-process infrastructure, and the lexicon classifier are all disclosed seams (listed in verification-report.json#/disclosedSeams and proof/LIMITATIONS.md).

7. What was actually executed?

node verify.mjs generated 100,000 orders, booted a real PostgreSQL engine (PGlite, running in-process) and a real gRPC dispatch service (over loopback), and ran all four agents end-to-end over every order, performing real SQL writes and real gRPC round trips. Exit code and stdout hash are in proof/EXECUTION_TRACE.json.

8. What was inferred?

The ROI is *inferred* from the measured auto-action rate plus stated assumptions; it is a model, not a realized financial result. Real-world accuracy is also an inference: it is expected to differ from the accuracy measured against the synthetic answer key.
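To make the inference step concrete, the following is a minimal sketch of a labor-cost model built from the stated assumptions (8 min manual handling, 5 min exception review, $38/h coordinator cost). The formula itself is an assumption, in particular that auto-actioned orders need no coordinator time and that each exception costs one review. The `autoActionRate` of 0.75 is a placeholder, not the measured figure, and implementation and platform costs are left out because their values are not given here.

```javascript
// These three values mirror the illustrative assumptions stated in this document.
const assumptions = {
  manualMinutesPerOrder: 8,
  exceptionReviewMinutes: 5,
  coordinatorCostPerHour: 38,
};

// Assumed model: every order is handled manually in the baseline; with the agents,
// only non-auto-actioned orders (exceptions) consume coordinator time.
function laborModel({ orders, autoActionRate }, a) {
  const baselineHours = (orders * a.manualMinutesPerOrder) / 60;
  const exceptions = orders * (1 - autoActionRate);
  const reviewHours = (exceptions * a.exceptionReviewMinutes) / 60;
  return {
    baselineCost: baselineHours * a.coordinatorCostPerHour,
    reviewCost: reviewHours * a.coordinatorCostPerHour,
    grossLaborSaving: (baselineHours - reviewHours) * a.coordinatorCostPerHour,
  };
}

// PLACEHOLDER rate for illustration only; the measured value lives in verification-report.json.
const roiSketch = laborModel({ orders: 100000, autoActionRate: 0.75 }, assumptions);
console.log(roiSketch);
```

A net figure would subtract the implementation and platform costs from assumptions.json; the point of the sketch is only that every output is a function of one measured rate and a handful of declared inputs.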

9. What remains unverified?

Anything about a real deployment: real-text accuracy, external Postgres / Go gRPC behaviour, LLM-classifier accuracy, enterprise non-functional controls, and any realized savings. None are claimed.

10. What evidence would invalidate this claim?

Any one of the following: a failing MUST_PASS check; a checksum mismatch (checked with node tools/forge-proof-verify.mjs --outcome delivery-package/forge-pm-work-order-sim); a corpus-fingerprint mismatch; or any metric without a source in proof/CLAIM_EVIDENCE.json.
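The last condition, a metric with no recorded source, could be checked along these lines. The object shapes below are hypothetical, since the real layout of verification-report.json and proof/CLAIM_EVIDENCE.json is not shown in this document, and the metric values are placeholders.

```javascript
// Returns the names of metrics that have no non-empty `source` entry.
// Assumed shape: claimEvidence maps metric name -> { source: "<path>" }.
function unsourcedMetrics(metrics, claimEvidence) {
  return Object.keys(metrics).filter((name) => {
    const entry = claimEvidence[name];
    return !entry || typeof entry.source !== "string" || entry.source.length === 0;
  });
}

// Placeholder data for illustration only.
const reportedMetrics = { autoActionRate: 0.5, auditCompleteness: 1 };
const claimEvidence = {
  autoActionRate: { source: "evidence/simulation-results.json" },
};

const missing = unsourcedMetrics(reportedMetrics, claimEvidence);
console.log(missing); // [ 'auditCompleteness' ]
```

An empty result means every metric traces to a source; any name in the list would, by the criterion above, invalidate the claim.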