# Forge Property Management: Work Order Automation Impact Study

## ⚠ FICTIONAL / SYNTHETIC DEPLOYMENT MODEL
"Forge Property Management" is an invented enterprise customer. This study is
a synthetic deployment model, not a real customer deployment, and it makes
no claim of real production results. Every operational number comes from a
deterministic synthetic corpus of 100,000 work orders; every financial number
comes from stated illustrative assumptions (see assumptions.json). The
methodology is fully transparent and reproducible.

_Generated from verification-report.json on 2026-06-26T00:25:09.291Z. Simulation-tier label: FICTIONAL_DEPLOYMENT_MODEL_CERTIFIED._

## 1. Executive summary

A fictional enterprise property manager, Forge Property Management (42,000 units across Florida, Georgia, Texas, and North Carolina), currently routes **100,000 work orders a year** through a 14-person coordination team that reviews **every** order by hand (≈8 minutes each). This study simulates replacing that fully manual triage with the Work Order Agent Ecosystem and measures the result against a known answer key.

Across the full 100,000-order synthetic corpus, run through the real ecosystem (real PostgreSQL + real gRPC dispatch):

- 56.6% of work orders were actioned automatically (classified, routed, validated, and dispatched with no human touch).
- 27.3% were routed to humans as exceptions, and 16.1% were rejected back for missing information.
- The safety-critical false-auto-action rate was 0.00% on the synthetic key, with 100.0% exception recall.
- 100.0% of duplicate resubmissions were suppressed, and 100.0% of orders carried a complete, append-only audit trail.
- Under the stated assumptions, the model shows $369,392 in annual labor savings, an 8.69-month payback, and a 110.28% three-year ROI.

These are simulated figures intended to size the opportunity honestly — not a guarantee of real-world performance.

## 2. Baseline process (the fictional "before")

| Attribute | Value |
| --- | --- |
| Annual work orders | 100,000 |
| Review model | 100% manual (every order reviewed by a coordinator) |
| Average handling time | 8 minutes/order |
| Coordination team | 14 people |
| Fully-loaded coordinator cost | $38/hour |
| Annual manual labor | 13,333.33 hours = $506,667 |

Stated pain points: slow routing, duplicate tickets, inconsistent vendor assignment, SLA misses, and poor auditability — exactly the failure modes the ecosystem's validator and audit layer target.

## 3. Agent architecture

The simulation runs the unmodified Work Order Agent Ecosystem (delivery-package/work-order-agents): four agents in sequence, over real infrastructure.

1. **Classifier**: trade category, priority, and field extraction with calibrated confidence (deterministic lexicon engine; an LLM is a disclosed seam).
2. **Router**: table-driven routing to queue, crew, vendor tier, region, and SLA.
3. **Validator**: the safety boundary, covering required fields, confidence floors, the cost cap, and durable duplicate detection. Anything uncertain becomes a human exception.
4. **Actioner**: auto-dispatch vs. human exception vs. reject, with idempotent gRPC dispatch and an append-only audit entry for every order.

Infrastructure for this run: database engine pglite-memory, gRPC dispatch at 127.0.0.1:55092.
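
The four-stage flow described above can be sketched in a few lines. This is a minimal illustration only, not the ecosystem's actual code: the lexicon, the thresholds (`CONFIDENCE_FLOOR`, `COST_CAP_USD`, `MIN_DESCRIPTION_CHARS`), the `WorkOrder` fields, and every function name are hypothetical, and an in-memory set stands in for durable duplicate detection.

```python
from dataclasses import dataclass

# Hypothetical thresholds and lexicon, chosen only for illustration.
CONFIDENCE_FLOOR = 0.80
COST_CAP_USD = 500.0
MIN_DESCRIPTION_CHARS = 10
LEXICON = {"leak": "plumbing", "pipe": "plumbing",
           "outlet": "electrical", "breaker": "electrical"}


@dataclass(frozen=True)
class WorkOrder:
    order_id: str
    unit: str
    description: str
    estimated_cost: float


def dedupe_key(order):
    """Content key used to spot resubmissions (an assumed definition)."""
    return (order.unit, order.description.strip().lower())


def classify(order):
    """Classifier: return (trade, confidence) from keyword hits."""
    text = order.description.lower()
    trades = sorted({trade for word, trade in LEXICON.items() if word in text})
    if not trades:
        return None, 0.0
    # A single matching trade is a strong signal; competing trades are weak.
    return trades[0], (0.95 if len(trades) == 1 else 0.50)


def route(trade):
    """Router: table-driven lookup, reduced here to a queue name."""
    return f"{trade}-queue" if trade else None


def validate(order, confidence, seen_keys):
    """Validator: the safety boundary. Returns (action, reason)."""
    if not order.unit or len(order.description) < MIN_DESCRIPTION_CHARS:
        return "reject", "missing required information"
    if dedupe_key(order) in seen_keys:
        return "human_exception", "duplicate resubmission"
    if confidence < CONFIDENCE_FLOOR:
        return "human_exception", "confidence below floor"
    if order.estimated_cost > COST_CAP_USD:
        return "human_exception", "over auto-approval cost cap"
    return "auto_dispatch", "all checks passed"


def process(order, seen_keys, audit_log):
    """Actioner: run all four stages and record exactly one audit entry."""
    trade, confidence = classify(order)
    queue = route(trade)
    action, reason = validate(order, confidence, seen_keys)
    seen_keys.add(dedupe_key(order))
    audit_log.append({"order_id": order.order_id, "action": action,
                      "reason": reason, "queue": queue})
    return action
```

The design point the sketch tries to capture is that the validator only ever downgrades: an order reaches `auto_dispatch` solely by passing every check, so any uncertainty falls through to a human or a rejection.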

## 4. Simulation methodology

- **Inputs**: 100,000 synthetic work orders generated by a seeded PRNG (seed 20260625) with a ground-truth answer key. Corpus fingerprint: 6cbb9cc1149dd85f1c4324db….
- **Execution**: each order flows through classify → route → validate → action over a real PostgreSQL engine and a real gRPC dispatch service.
- **Scoring**: accuracy, recall, precision, and the false-auto-action rate are measured against the answer key; persistence and idempotency are checked against the live database and the gRPC wire.
- **Determinism**: the same seed reproduces the same corpus and therefore the same metrics (asserted by a fingerprint check).
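
The determinism claim rests on two ideas: a privately seeded generator and a content hash over the generated corpus. The sketch below illustrates both in Python. It is an assumption-laden stand-in for the real generator (src/enterprise-synth.mjs): the row fields, `generate_corpus`, and `fingerprint` are hypothetical, and the hash it produces will not match the fingerprint quoted above.

```python
import hashlib
import json
import random

PATTERNS = ["clean", "emergency", "ambiguous", "highCost",
            "duplicate", "missingLoc", "missingField"]


def generate_corpus(seed, n):
    """Generate n toy work orders from a private, seeded PRNG."""
    rng = random.Random(seed)  # own generator, so global random state is untouched
    return [
        {"id": f"WO-{i:06d}",
         "pattern": rng.choice(PATTERNS),
         "estimated_cost": rng.randint(50, 2000)}
        for i in range(n)
    ]


def fingerprint(corpus):
    """SHA-256 over a canonical JSON line per row (sorted keys)."""
    digest = hashlib.sha256()
    for row in corpus:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def assert_reproducible(seed, n):
    """Fail loudly if two runs with the same seed diverge."""
    first = fingerprint(generate_corpus(seed, n))
    second = fingerprint(generate_corpus(seed, n))
    if first != second:
        raise RuntimeError("corpus is not deterministic for this seed")
    return first
```

Serializing with sorted keys matters here: the fingerprint should depend on the data, not on incidental key ordering.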

The corpus mix (operational realities modeled):

| Pattern | Count | Models |
| --- | --- | --- |
| clean | 49,865 | clear single-trade, valid → auto-dispatch |
| emergency | 7,914 | P1 safety case → escalated auto-dispatch |
| ambiguous | 8,194 | two trades, weak signal → human exception |
| highCost | 7,961 | over the auto-approval cap → human exception |
| duplicate | 10,009 | resubmission → suppressed, human exception |
| missingLoc | 9,058 | no resolvable unit/zone → rejected |
| missingField | 6,999 | description too short → rejected |
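
As a quick sanity check on the corpus mix, the counts can be totalled by the disposition each pattern is meant to produce. The sketch assumes the "Models" column is the disposition the answer key expects; the `CORPUS_MIX` structure and the disposition labels are illustrative names, not taken from the codebase.

```python
from collections import Counter

# (count, disposition the answer key is assumed to expect) per pattern.
CORPUS_MIX = {
    "clean": (49_865, "auto_dispatch"),
    "emergency": (7_914, "auto_dispatch"),
    "ambiguous": (8_194, "human_exception"),
    "highCost": (7_961, "human_exception"),
    "duplicate": (10_009, "human_exception"),
    "missingLoc": (9_058, "reject"),
    "missingField": (6_999, "reject"),
}


def expected_dispositions(mix):
    """Total the corpus counts by expected disposition."""
    totals = Counter()
    for count, disposition in mix.values():
        totals[disposition] += count
    return totals


totals = expected_dispositions(CORPUS_MIX)
corpus_size = sum(totals.values())
```

Under that reading the mix sums to 100,000, with 57,779 orders expected to auto-dispatch, 26,164 expected as human exceptions, and 16,057 expected as rejections.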

## 5. Synthetic data disclosure

All data is machine-generated. No real customer, property, tenant, vendor, cost, or work order is represented. The full corpus is written to data/enterprise-work-orders.jsonl (90.4 MB, sha256 57504af7ffbbc0a9a22af2a2…); a 1,000-row sample is shipped in datasets/sample-1000.jsonl with a schema in datasets/dataset-card.md. Reported accuracy is against the synthetic answer key — real tenant text is messier, so absolute accuracy in a real deployment would differ.

## 6. Operational results

| Metric | Result |
| --- | --- |
| Work orders processed | 100,000 |
| Classification accuracy | 100.0% |
| Priority accuracy | 100.0% |
| Routing accuracy (region) | 100.0% |
| Auto-action rate | 56.6% |
| Human exception rate | 27.3% |
| Rejection rate | 16.1% |
| Needs-review rate | 27.3% |
| False-auto-action rate | 0.00% |
| Exception precision / recall / F1 | 0.974 / 1 / 0.9868 |
| Duplicate suppression | 100.0% |
| SLA routing performance | 100.0% |
| Emergency escalation | 100.0% |
| Audit completeness | 100.0% |
| Avg processing time | 3.7161 ms/order (269/s) |

Dispositions: 56,651 auto-dispatched, 27,292 human exceptions, 16,057 rejected. Persistence (real DB rows): work_orders 100,000, audit_log 100,000, dispatch_records 56,651.
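
The exception precision, recall, and F1 figures can be reproduced from the published counts under one assumption that this report does not state explicitly: that "exception" in the scoring means any order that was not auto-actioned (human exception or rejection). On that reading the answer key contains 42,221 such orders (the five non-auto patterns in the corpus mix), and the run produced 27,292 + 16,057 = 43,349. The helper name and variable names below are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


KEY_NON_AUTO = 8_194 + 7_961 + 10_009 + 9_058 + 6_999  # answer-key exceptions
PREDICTED_NON_AUTO = 27_292 + 16_057                   # human exceptions + rejects
FALSE_AUTO = 0                                         # reported false-auto-action count

true_pos = KEY_NON_AUTO - FALSE_AUTO       # key exceptions the run also held back
false_pos = PREDICTED_NON_AUTO - true_pos  # auto-eligible orders held back anyway
false_neg = FALSE_AUTO                     # key exceptions that were auto-actioned

precision, recall, f1 = precision_recall_f1(true_pos, false_pos, false_neg)

# Throughput implied by the average processing time.
orders_per_second = 1000 / 3.7161
```

With these inputs the values round to 0.974, 1, and 0.9868, matching the table, and the throughput rounds to 269 orders per second. The 1,128 false positives are orders the key would have allowed to auto-dispatch but the run sent to a human, which is the conservative direction of error.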

## 7. Financial impact (illustrative ROI model)

All inputs are stated assumptions (assumptions.json); the only measured input is the auto-action rate above. Every line shows its arithmetic in evidence/roi.json.

| Line | Value |
| --- | --- |
| Manual baseline labor | 13,333.33 h → $506,667/yr |
| Exceptions still needing a human | 43,350 orders × 5 min |
| Agent-assisted exception labor | 3,612.5 h → $137,275/yr |
| Annual labor savings | $369,392 |
| Coordinator hours recovered | 9,720.83 h (4.67 FTE) |
| Coordinator capacity recovered | 12.26 of 14 FTE |
| Implementation (one-time) | $185,000 |
| Platform (annual) | $114,000 |
| First-year net savings | $70,392 |
| Payback period | 8.69 months |
| 3-year gross savings | $1,108,175 |
| 3-year total cost | $527,000 |
| 3-year net savings | $581,175 |
| 3-year ROI | 110.28% |
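
The arithmetic behind the table can be re-derived as follows. The authoritative formulas live in evidence/roi.json; the ones below are inferred from the published figures and happen to reproduce them. Two items are assumptions rather than stated inputs: the 2,080-hour FTE year, and the payback definition (implementation cost divided by monthly savings net of platform cost). The 43,350 exception-order input is taken from the table as-is; the dispositions in section 6 sum to 43,349 non-auto orders, so it appears to derive from a rounded rate.

```python
HOURS_PER_FTE_YEAR = 2080  # assumption: 40 h x 52 weeks, inferred from 4.67 FTE


def roi_model(orders=100_000, manual_minutes=8, exception_orders=43_350,
              exception_minutes=5, hourly_cost=38.0, team_size=14,
              implementation=185_000.0, platform_annual=114_000.0):
    """Re-derive the illustrative ROI lines from the stated assumptions."""
    baseline_hours = orders * manual_minutes / 60
    baseline_cost = baseline_hours * hourly_cost
    exception_hours = exception_orders * exception_minutes / 60
    exception_cost = exception_hours * hourly_cost
    savings = baseline_cost - exception_cost
    hours_recovered = baseline_hours - exception_hours
    three_year_cost = implementation + 3 * platform_annual
    three_year_net = 3 * savings - three_year_cost
    return {
        "baseline_cost": baseline_cost,
        "exception_cost": exception_cost,
        "annual_savings": savings,
        "hours_recovered": hours_recovered,
        "fte_recovered": hours_recovered / HOURS_PER_FTE_YEAR,
        # Team size minus the FTEs still needed for exception handling.
        "capacity_recovered_fte": team_size - exception_hours / HOURS_PER_FTE_YEAR,
        "first_year_net": savings - implementation - platform_annual,
        "payback_months": implementation / ((savings - platform_annual) / 12),
        "three_year_gross": 3 * savings,
        "three_year_cost": three_year_cost,
        "three_year_net": three_year_net,
        "three_year_roi_pct": 100 * three_year_net / three_year_cost,
    }


result = roi_model()
```

Because every input is a keyword argument, the same function can be rerun with a customer's own handling times and rates to see how sensitive the payback period is to each assumption.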

## 8. Risk controls

- **Conservative validator.** Low-confidence, over-cost, duplicate, and missing-field orders are never auto-actioned; they go to a human. The simulated false-auto-action rate is 0.00%.
- **Human-in-the-loop for exceptions.** 43.4% of volume (27.3% exceptions plus 16.1% rejections) is retained for human judgment by design.
- **Idempotent dispatch.** Retries never double-dispatch (verified on the gRPC wire).
- **Malformed-payload rejection.** The dispatch service rejects contract violations.
- **Determinism.** Identical inputs always produce identical outputs.
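
Idempotent dispatch and malformed-payload rejection can be illustrated with a small in-memory stand-in for the dispatch service. This is a sketch under stated assumptions, not the real gRPC service or its contract in proto/dispatch.proto: the `DispatchService` class, the required fields (`order_id`, `queue`), and the `DSP-` reference format are all hypothetical.

```python
import hashlib


class DispatchService:
    """In-memory stand-in for a dispatch service keyed by idempotency key."""

    REQUIRED_FIELDS = ("order_id", "queue")  # hypothetical contract

    def __init__(self):
        self._records = {}

    def dispatch(self, idempotency_key, payload):
        # Reject contract violations before touching any state.
        if not isinstance(payload, dict) or any(
                not payload.get(name) for name in self.REQUIRED_FIELDS):
            raise ValueError("malformed dispatch payload")
        existing = self._records.get(idempotency_key)
        if existing is not None:
            return existing  # a retry returns the original record unchanged
        # Deterministic reference derived from the key, so reruns agree.
        digest = hashlib.sha256(idempotency_key.encode("utf-8")).hexdigest()
        record = {"reference": f"DSP-{digest[:12]}", **payload}
        self._records[idempotency_key] = record
        return record

    def record_count(self):
        return len(self._records)
```

For example, calling `dispatch("WO-1", {...})` twice with the same key yields one stored record and the same reference both times, which is the property a retrying client relies on.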

## 9. Auditability

Every one of the 100,000 orders produces an append-only audit_log row (100.0% completeness) capturing the action, reason, and decision detail, plus a durable work_orders record and, for auto-dispatches, a dispatch_records row with a deterministic reference. A sample audit trail (auto-dispatched + exception) is exported to evidence/audit-trace-sample.json.
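
One way to make a table append-only is to block updates and deletes at the database level and then measure completeness as the share of orders with at least one audit row. The sketch below uses SQLite triggers from the Python standard library purely as a stand-in; the study itself runs on PostgreSQL via PGlite, and this report does not say how append-only behavior is enforced there. The column set follows the description above (action, reason, decision detail); the function names are illustrative.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE audit_log (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id TEXT NOT NULL,
    action   TEXT NOT NULL,
    reason   TEXT NOT NULL,
    detail   TEXT NOT NULL
);
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_db():
    """Create an in-memory database with an insert-only audit_log table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def append_audit(conn, order_id, action, reason, detail):
    """Insert one audit row; detail is stored as canonical JSON."""
    conn.execute(
        "INSERT INTO audit_log (order_id, action, reason, detail) "
        "VALUES (?, ?, ?, ?)",
        (order_id, action, reason, json.dumps(detail, sort_keys=True)),
    )


def audit_completeness(conn, order_ids):
    """Fraction of the given orders that have at least one audit row."""
    audited = {row[0] for row in
               conn.execute("SELECT DISTINCT order_id FROM audit_log")}
    wanted = set(order_ids)
    return len(wanted & audited) / len(wanted) if wanted else 1.0
```

With the triggers in place, any `UPDATE` or `DELETE` against `audit_log` raises a `sqlite3` error, so history can only grow.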

## 10. Limitations

This is a synthetic model. Key disclosed seams:

- FICTIONAL CUSTOMER: "Forge Property Management" is an invented enterprise. No real customer relationship, deployment, or contract exists. This is a synthetic deployment model, not a production case study.
- SYNTHETIC DATA: All 100,000 work orders are generated by a seeded PRNG (src/enterprise-synth.mjs) with a ground-truth answer key. Reported accuracy is against that synthetic key, not real tenant text; absolute accuracy on real intake will differ.
- SIMULATED OPERATIONS & ROI: Manual handling time, exception review time, coordinator cost, implementation cost, and platform cost are stated illustrative assumptions (assumptions.json), not measured production figures. The ROI is a transparent model, not a realized financial result.
- LIVE INFRASTRUCTURE (in-process): Persistence is the real PostgreSQL engine via PGlite (in-memory for this run) and dispatch crosses a real gRPC/HTTP2 wire to a Node service on localhost. An external Postgres (DATABASE_URL) and a Go gRPC service are wire-compatible disclosed seams, not exercised here.
- DISCLOSED_SEAM: The classifier is a deterministic lexicon model, not a hosted LLM. The production design swaps an LLM behind the same interface; that swap is unverified here.
- AT-LEAST-ONCE DISPATCH: Under load a Dispatch RPC can occasionally exceed its client deadline after the gRPC server has already committed the dispatch row. The actioner escalates those orders to a human exception (never double-dispatches, never silently drops), so dispatch_records can slightly exceed the auto-dispatched count. The exact orphan count is transport-timing dependent and not bit-identical across runs; the reconciliation identity (dispatch_records = auto-dispatched + safely-escalated orphans) holds every run.

See proof/LIMITATIONS.md for the full list.
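
The at-least-once behavior and its reconciliation identity can be modelled in a few lines. In this toy model (all names hypothetical) the server always commits the dispatch row, and the client either sees the response or hits its deadline; a timed-out order is escalated to a human rather than retried blindly.

```python
def run_dispatches(order_ids, timed_out_ids):
    """Toy model: the server commits every row; some responses arrive too late."""
    dispatch_records = set()  # rows the server committed
    auto_dispatched = []      # client received the response in time
    escalated_orphans = []    # committed, but the client deadline was exceeded
    for order_id in order_ids:
        dispatch_records.add(order_id)  # server-side commit happens first
        if order_id in timed_out_ids:
            escalated_orphans.append(order_id)  # hand to a human, never drop
        else:
            auto_dispatched.append(order_id)
    return dispatch_records, auto_dispatched, escalated_orphans


def reconciles(dispatch_record_count, auto_dispatched_count, orphan_count):
    """The identity that should hold on every run."""
    return dispatch_record_count == auto_dispatched_count + orphan_count


records, dispatched, orphans = run_dispatches(
    [f"WO-{i}" for i in range(10)], timed_out_ids={"WO-3", "WO-7"})
identity_holds = reconciles(len(records), len(dispatched), len(orphans))
```

Applied to the counts reported in section 6 (dispatch_records 56,651 and 56,651 auto-dispatched), the identity implies zero orphans for this particular run.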

## 11. Production-readiness roadmap

1. Integrate real work-order intake (portal/email/phone/IoT) in place of the synthetic generator.
2. Move to an external managed PostgreSQL (DATABASE_URL) and a deployed gRPC dispatch service (Go implementation of proto/dispatch.proto).
3. Connect real vendor/dispatch systems and the customer's SLA policy.
4. Optionally swap the lexicon classifier for an LLM behind the same interface and re-verify accuracy on the customer's real text.
5. Add enterprise non-functional controls (identity/SSO, RBAC, tenant isolation, audit retention, security/compliance review).
6. Run a shadow period against real volume, then a limited live pilot, before any claim of real production performance.

## 12. Recommended rollout plan

| Phase | Duration (illustrative) | Scope | Exit criteria |
| --- | --- | --- | --- |
| 0 · Shadow | 4–6 weeks | Agents score real orders; humans still action everything | Accuracy + false-auto-action measured on real text |
| 1 · Assist | 4–8 weeks | Agents recommend routing; humans approve | Coordinator time/order drops; exception quality holds |
| 2 · Auto (low-risk) | 8–12 weeks | Auto-dispatch only high-confidence, in-cost, single-trade orders | False-auto-action stays within target on real data |
| 3 · Scale | ongoing | Expand auto-action coverage; humans focus on exceptions | Stable safety + audit metrics; realized savings tracked |

Each phase is gated on real measured safety metrics — not on this simulation.

_This document is generated from the verified run. Re-run node verify.mjs then node build-deliverables.mjs to reproduce it. FICTIONAL / SYNTHETIC DEPLOYMENT MODEL — no real customer data was used._