FICTIONAL / SYNTHETIC DEPLOYMENT MODEL. "Forge Property Management" is an invented customer. No real customer data was used and no real production results are claimed. All figures come from a deterministic synthetic corpus and stated illustrative assumptions.
100,000 Work Order Simulation

Impact Study

Forge Property Management — Work Order Automation Impact Study

## ⚠ FICTIONAL / SYNTHETIC DEPLOYMENT MODEL
"Forge Property Management" is an invented enterprise customer. This study is
a synthetic deployment model, not a real customer deployment, and it makes
no claim of real production results. Every operational number comes from a
deterministic synthetic corpus of 100,000 work orders; every financial number
comes from stated illustrative assumptions (see assumptions.json). The
methodology is fully transparent and reproducible.

_Generated from verification-report.json on 2026-06-26T00:25:09.291Z. Simulation-tier label: FICTIONAL_DEPLOYMENT_MODEL_CERTIFIED._


1. Executive summary

A fictional enterprise property manager (Forge Property Management, 42,000 units across Florida, Georgia, Texas, and North Carolina) currently routes **100,000 work orders a year** through a **14-person coordination team** that reviews every order by hand (≈8 minutes each). This study simulates replacing that fully manual triage with the Work Order Agent Ecosystem and measures the result against a known answer key.

Across the full 100,000-order synthetic corpus, run through the real ecosystem (real PostgreSQL + real gRPC dispatch):

  • 56.6% of work orders were actioned automatically (classified, routed, validated, and dispatched with no human touch).
  • 27.3% were routed to humans as exceptions, and 16.1% were rejected back for missing information.
  • The safety-critical false-auto-action rate was 0.00% on the synthetic key, with 100.0% exception recall.
  • 100.0% of duplicate resubmissions were suppressed, and 100.0% of orders carried a complete, append-only audit trail.
  • Under the stated assumptions, the model shows $369,392 in annual labor savings, an 8.69-month payback, and a 110.28% three-year ROI.

These are simulated figures intended to size the opportunity honestly — not a guarantee of real-world performance.

2. Baseline process (the fictional "before")

| Attribute | Value |
|---|---|
| Annual work orders | 100,000 |
| Review model | 100% manual: every order reviewed by a coordinator |
| Average handling time | 8 minutes/order |
| Coordination team | 14 people |
| Fully-loaded coordinator cost | $38/hour |
| Annual manual labor | 13,333.33 hours = $506,667 |

Stated pain points: slow routing, duplicate tickets, inconsistent vendor assignment, SLA misses, and poor auditability — exactly the failure modes the ecosystem's validator and audit layer target.
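The annual manual labor figure in the baseline table follows directly from the stated volume, handling time, and hourly cost. A minimal sketch of that arithmetic, where `manualLabor` is a hypothetical helper name and not part of the delivered package:

```javascript
// Baseline inputs as stated in the table above.
const baseline = {
  annualOrders: 100_000,
  minutesPerOrder: 8,
  costPerHour: 38, // fully-loaded coordinator cost, USD
};

// Hypothetical helper: converts volume and handling time into hours and cost.
function manualLabor({ annualOrders, minutesPerOrder, costPerHour }) {
  const hours = (annualOrders * minutesPerOrder) / 60;
  return { hours, cost: hours * costPerHour };
}

const { hours, cost } = manualLabor(baseline);
console.log(hours.toFixed(2)); // 13333.33
console.log(Math.round(cost)); // 506667
```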

3. Agent architecture

The simulation runs the unmodified Work Order Agent Ecosystem (delivery-package/work-order-agents): four agents in sequence, over real infrastructure.

  1. Classifier: trade category + priority + field extraction with calibrated confidence (deterministic lexicon engine; an LLM is a disclosed seam).
  2. Router: table-driven routing to queue, crew, vendor tier, region, and SLA.
  3. Validator: the safety boundary, covering required fields, confidence floors, cost cap, and durable duplicate detection. Anything uncertain becomes a human exception.
  4. Actioner: auto-dispatch vs. human exception vs. reject, with idempotent gRPC dispatch and an append-only audit entry for every order.

Infrastructure for this run: database engine pglite-memory, gRPC dispatch at 127.0.0.1:55092.
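To make the four-stage sequence concrete, here is a heavily simplified in-memory sketch. Everything in it is an assumption made for illustration: the lexicon, the routing table, the `CONFIDENCE_FLOOR` and `COST_CAP` values, the duplicate key, and all function names. It does not reproduce the real agents, PostgreSQL persistence, or gRPC dispatch.

```javascript
// Hypothetical lexicon, routing table, and thresholds, for illustration only.
const LEXICON = {
  plumbing: ["leak", "pipe", "drain"],
  electrical: ["outlet", "breaker", "wiring"],
};
const ROUTES = {
  plumbing: { queue: "plumbing-q", slaHours: 24 },
  electrical: { queue: "electrical-q", slaHours: 24 },
};
const CONFIDENCE_FLOOR = 0.6; // assumed value
const COST_CAP = 500; // assumed auto-approval cap, USD

// 1. Classifier: count keyword hits per trade; confidence is the winner's share of hits.
function classify(order) {
  const text = order.description.toLowerCase();
  const scores = Object.entries(LEXICON).map(([trade, words]) => [
    trade,
    words.filter((word) => text.includes(word)).length,
  ]);
  const totalHits = scores.reduce((sum, [, hits]) => sum + hits, 0);
  const [trade, hits] = scores.sort((a, b) => b[1] - a[1])[0];
  return { ...order, trade, confidence: totalHits === 0 ? 0 : hits / totalHits };
}

// 2. Router: a pure table lookup keyed by trade.
function route(order) {
  return { ...order, ...ROUTES[order.trade] };
}

// 3. Validator: the safety boundary. Anything uncertain is not auto-actioned.
function validate(order, seenKeys) {
  const key = `${order.unit}|${order.description}`; // assumed duplicate key
  if (!order.unit || order.description.length < 10) {
    return { ...order, verdict: "reject", reason: "missing-info" };
  }
  if (seenKeys.has(key)) return { ...order, verdict: "exception", reason: "duplicate" };
  seenKeys.add(key);
  if (order.confidence < CONFIDENCE_FLOOR) {
    return { ...order, verdict: "exception", reason: "low-confidence" };
  }
  if (order.estimatedCost > COST_CAP) {
    return { ...order, verdict: "exception", reason: "over-cost" };
  }
  return { ...order, verdict: "ok", reason: "validated" };
}

// 4. Actioner: map the verdict to a disposition and append one audit entry per order.
function action(order, auditLog) {
  const disposition = { ok: "auto-dispatch", exception: "human-exception", reject: "reject" }[order.verdict];
  auditLog.push({ orderId: order.id, disposition, reason: order.reason });
  return disposition;
}

const seenKeys = new Set();
const auditLog = [];
const orders = [
  { id: "WO-1", unit: "A-101", description: "Kitchen pipe leak under sink", estimatedCost: 120 },
  { id: "WO-2", unit: "B-202", description: "Leak near the hallway outlet", estimatedCost: 80 },
  { id: "WO-3", unit: "", description: "Breaker keeps tripping", estimatedCost: 60 },
];
const dispositions = orders.map((o) => action(validate(route(classify(o)), seenKeys), auditLog));
console.log(dispositions.join(", ")); // auto-dispatch, human-exception, reject
```

The second order matches one plumbing word and one electrical word, so its confidence is 0.5 and it falls below the assumed floor. That is the "two trades, weak signal" case the validator is meant to hand to a human.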

4. Simulation methodology

  • Inputs: 100,000 synthetic work orders generated by a seeded PRNG (seed 20260625) with a ground-truth answer key. Corpus fingerprint: 6cbb9cc1149dd85f1c4324db….
  • Execution: each order flows through classify → route → validate → action over a real PostgreSQL engine and a real gRPC dispatch service.
  • Scoring: accuracy, recall, precision, and the false-auto-action rate are measured against the answer key; persistence and idempotency are checked against the live database and the gRPC wire.
  • Determinism: the same seed reproduces the same corpus and therefore the same metrics (asserted by a fingerprint check).
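The determinism claim rests on two ideas: a seeded generator and a fingerprint over its output. The sketch below illustrates both under loud assumptions. `mulberry32` is a common small PRNG chosen for this example (the study does not say which generator it uses), FNV-1a stands in for a cryptographic digest, and `generateCorpus` is a hypothetical toy generator, not the one in src/enterprise-synth.mjs.

```javascript
// mulberry32: a small, widely used seeded PRNG (an assumption for this sketch).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// FNV-1a (32-bit): a simple non-cryptographic hash, standing in for a real digest.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

const PATTERNS = ["clean", "emergency", "ambiguous", "highCost", "duplicate", "missingLoc", "missingField"];

// Hypothetical generator: each order's pattern is drawn from the seeded stream.
function generateCorpus(seed, count) {
  const next = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => ({
    id: `WO-${i + 1}`,
    pattern: PATTERNS[Math.floor(next() * PATTERNS.length)],
  }));
}

const fingerprint = (seed) => fnv1a(JSON.stringify(generateCorpus(seed, 1000)));

// Same seed, same corpus, same fingerprint.
console.log(fingerprint(20260625) === fingerprint(20260625)); // true
```

A fingerprint check of this kind only proves the corpus is reproducible. It says nothing about whether the corpus resembles real tenant text.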

The corpus mix (operational realities modeled):

| Pattern | Count | Models |
|---|---|---|
| clean | 49,865 | clear single-trade, valid → auto-dispatch |
| emergency | 7,914 | P1 safety case → escalated auto-dispatch |
| ambiguous | 8,194 | two trades, weak signal → human exception |
| highCost | 7,961 | over the auto-approval cap → human exception |
| duplicate | 10,009 | resubmission → suppressed, human exception |
| missingLoc | 9,058 | no resolvable unit/zone → rejected |
| missingField | 6,999 | description too short → rejected |

5. Synthetic data disclosure

All data is machine-generated. No real customer, property, tenant, vendor, cost, or work order is represented. The full corpus is written to data/enterprise-work-orders.jsonl (90.4 MB, sha256 57504af7ffbbc0a9a22af2a2…); a 1,000-row sample is shipped in datasets/sample-1000.jsonl with a schema in datasets/dataset-card.md. Reported accuracy is against the synthetic answer key — real tenant text is messier, so absolute accuracy in a real deployment would differ.

6. Operational results

| Metric | Result |
|---|---|
| Work orders processed | 100,000 |
| Classification accuracy | 100.0% |
| Priority accuracy | 100.0% |
| Routing accuracy (region) | 100.0% |
| Auto-action rate | 56.6% |
| Human exception rate | 27.3% |
| Rejection rate | 16.1% |
| Needs-review rate | 27.3% |
| False-auto-action rate | 0.00% |
| Exception precision / recall / F1 | 0.974 / 1 / 0.9868 |
| Duplicate suppression | 100.0% |
| SLA routing performance | 100.0% |
| Emergency escalation | 100.0% |
| Audit completeness | 100.0% |
| Avg processing time | 3.7161 ms/order (269/s) |

Dispositions: 56,651 auto-dispatched, 27,292 human exceptions, 16,057 rejected. Persistence (real DB rows): work_orders 100,000, audit_log 100,000, dispatch_records 56,651.
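The report does not spell out how exception precision, recall, and F1 are defined. One reading that reproduces the reported values, offered as an assumption and not as the scorer's actual definition, treats every order that was not auto-dispatched (human exceptions plus rejections) as "flagged" and every corpus pattern whose answer key is an exception or a rejection as "positive":

```javascript
// Counts from the corpus-mix table: patterns whose answer key is not auto-dispatch
// (ambiguous, highCost, duplicate, missingLoc, missingField).
const keyNeedsHuman = 8_194 + 7_961 + 10_009 + 9_058 + 6_999; // 42,221

// Counts from the dispositions line: human exceptions plus rejections.
const flaggedForHuman = 27_292 + 16_057; // 43,349

// Assumption: every key-positive order was flagged, as the reported recall of 1 implies.
const truePositives = keyNeedsHuman;

const precision = truePositives / flaggedForHuman;
const recall = truePositives / keyNeedsHuman;
const f1 = (2 * precision * recall) / (precision + recall);

console.log(precision.toFixed(3), recall, f1.toFixed(4)); // 0.974 1 0.9868
```

Under this reading, the gap between flagged and key-positive orders (1,128) consists of orders the key would have allowed to auto-dispatch but the pipeline sent to a human: the conservative direction for a safety boundary.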

7. Financial impact (illustrative ROI model)

All inputs are stated assumptions (assumptions.json); the only measured input is the auto-action rate above. Every line shows its arithmetic in evidence/roi.json.

| Line | Value |
|---|---|
| Manual baseline labor | 13,333.33 h → $506,667/yr |
| Exceptions still needing a human | 43,350 orders × 5 min |
| Agent-assisted exception labor | 3,612.5 h → $137,275/yr |
| Annual labor savings | $369,392 |
| Coordinator hours recovered | 9,720.83 h (4.67 FTE) |
| Coordinator capacity recovered | 12.26 of 14 FTE |
| Implementation (one-time) | $185,000 |
| Platform (annual) | $114,000 |
| First-year net savings | $70,392 |
| Payback period | 8.69 months |
| 3-year gross savings | $1,108,175 |
| 3-year total cost | $527,000 |
| 3-year net savings | $581,175 |
| 3-year ROI | 110.28% |
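The headline financial lines can be re-derived from the stated inputs. The sketch below is not the code behind evidence/roi.json; `roiModel` and its field names are hypothetical, and the payback formula (implementation cost divided by monthly savings net of the platform fee) is inferred because it reproduces the reported 8.69 months.

```javascript
// Inputs copied from the tables in this study; names are illustrative.
const assumptions = {
  baselineLaborCost: ((100_000 * 8) / 60) * 38, // 13,333.33 h at $38/h
  exceptionOrders: 43_350,
  exceptionMinutes: 5,
  costPerHour: 38,
  implementationCost: 185_000, // one-time
  platformCostPerYear: 114_000,
  years: 3,
};

function roiModel(a) {
  const exceptionCost = ((a.exceptionOrders * a.exceptionMinutes) / 60) * a.costPerHour;
  const annualSavings = a.baselineLaborCost - exceptionCost;
  const firstYearNet = annualSavings - a.implementationCost - a.platformCostPerYear;
  // Inferred formula: months of net-of-platform savings needed to repay implementation.
  const paybackMonths = a.implementationCost / ((annualSavings - a.platformCostPerYear) / 12);
  const grossSavings = annualSavings * a.years;
  const totalCost = a.implementationCost + a.platformCostPerYear * a.years;
  const roiPercent = ((grossSavings - totalCost) / totalCost) * 100;
  return { exceptionCost, annualSavings, firstYearNet, paybackMonths, grossSavings, totalCost, roiPercent };
}

const roi = roiModel(assumptions);
console.log(Math.round(roi.annualSavings)); // 369392
console.log(Math.round(roi.firstYearNet)); // 70392
console.log(roi.paybackMonths.toFixed(2)); // 8.69
console.log(roi.roiPercent.toFixed(2)); // 110.28
```

Because every input except the auto-action rate is an assumption, changing `exceptionMinutes` or `platformCostPerYear` is the quickest way to see how sensitive the payback period is.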

8. Risk controls

  • Conservative validator. Low-confidence, over-cost, duplicate, and missing-field orders are never auto-actioned; they go to a human. The simulated false-auto-action rate is 0.00%.
  • Human-in-the-loop for exceptions. 43.4% of volume is retained for human judgment by design.
  • Idempotent dispatch. Retries never double-dispatch (verified on the gRPC wire).
  • Malformed-payload rejection. The dispatch service rejects contract violations.
  • Determinism. Identical inputs always produce identical outputs.
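Idempotent dispatch and malformed-payload rejection can be illustrated with an in-memory stand-in. This is a sketch under assumptions, not the gRPC service: `DispatchService`, its request fields (`orderId`, `queue`), and the `DSP-` reference format are all hypothetical.

```javascript
// Hypothetical in-memory stand-in for the dispatch service and its dispatch_records table.
class DispatchService {
  constructor() {
    this.records = new Map(); // keyed by order ID
  }

  dispatch(request) {
    // Malformed-payload rejection: enforce a minimal assumed contract.
    if (typeof request?.orderId !== "string" || typeof request?.queue !== "string") {
      throw new TypeError("contract violation: orderId and queue must be strings");
    }
    // Idempotency: a retry for the same order returns the stored record unchanged.
    const existing = this.records.get(request.orderId);
    if (existing) return { ...existing, replayed: true };

    const record = {
      orderId: request.orderId,
      queue: request.queue,
      reference: `DSP-${request.orderId}`, // deterministic reference derived from the order ID
    };
    this.records.set(request.orderId, record);
    return { ...record, replayed: false };
  }
}

const service = new DispatchService();
const first = service.dispatch({ orderId: "WO-1", queue: "plumbing-q" });
const retry = service.dispatch({ orderId: "WO-1", queue: "plumbing-q" });
console.log(first.reference === retry.reference, retry.replayed, service.records.size); // true true 1
```

Keying the record on the order ID is what makes a retry safe: the second call finds the existing row and returns it without writing a new one.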

9. Auditability

Every one of the 100,000 orders produces an append-only audit_log row (100.0% completeness) capturing the action, reason, and decision detail, plus a durable work_orders record and, for auto-dispatches, a dispatch_records row with a deterministic reference. A sample audit trail (auto-dispatched + exception) is exported to evidence/audit-trace-sample.json.
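As an illustration of what "append-only" and "completeness" mean here, the sketch below models an audit log whose entries cannot be changed after they are written, plus a completeness check. `AuditLog` and its method names are hypothetical; the real audit_log is a database table, and how it enforces append-only behavior is not shown in this study.

```javascript
// Hypothetical in-memory model of an append-only audit log.
class AuditLog {
  #entries = [];

  // The only write path: entries are frozen on the way in and never updated or removed.
  append(orderId, action, reason, detail = {}) {
    const entry = Object.freeze({
      seq: this.#entries.length + 1,
      orderId,
      action,
      reason,
      detail: Object.freeze({ ...detail }),
    });
    this.#entries.push(entry);
    return entry;
  }

  // Readers get a copy of the list, so they cannot remove or reorder stored entries.
  entries() {
    return [...this.#entries];
  }

  // Completeness: the share of the given order IDs that have at least one audit entry.
  completeness(orderIds) {
    const audited = new Set(this.#entries.map((entry) => entry.orderId));
    return orderIds.filter((id) => audited.has(id)).length / orderIds.length;
  }
}

const log = new AuditLog();
const entry = log.append("WO-1", "auto-dispatch", "validated", { queue: "plumbing-q" });
log.append("WO-2", "human-exception", "over-cost");
console.log(log.completeness(["WO-1", "WO-2"])); // 1
console.log(Object.isFrozen(entry)); // true
```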

10. Limitations

This is a synthetic model. Key disclosed seams:

  • FICTIONAL CUSTOMER: "Forge Property Management" is an invented enterprise. No real customer relationship, deployment, or contract exists. This is a synthetic deployment model, not a production case study.
  • SYNTHETIC DATA: All 100,000 work orders are generated by a seeded PRNG (src/enterprise-synth.mjs) with a ground-truth answer key. Reported accuracy is against that synthetic key, not real tenant text; absolute accuracy on real intake will differ.
  • SIMULATED OPERATIONS & ROI: Manual handling time, exception review time, coordinator cost, implementation cost, and platform cost are stated illustrative assumptions (assumptions.json), not measured production figures. The ROI is a transparent model, not a realized financial result.
  • LIVE INFRASTRUCTURE (in-process): Persistence is the real PostgreSQL engine via PGlite (in-memory for this run) and dispatch crosses a real gRPC/HTTP2 wire to a Node service on localhost. An external Postgres (DATABASE_URL) and a Go gRPC service are wire-compatible disclosed seams, not exercised here.
  • DISCLOSED_SEAM: The classifier is a deterministic lexicon model, not a hosted LLM. The production design swaps an LLM behind the same interface; that swap is unverified here.
  • AT-LEAST-ONCE DISPATCH: Under load a Dispatch RPC can occasionally exceed its client deadline after the gRPC server has already committed the dispatch row. The actioner escalates those orders to a human exception (never double-dispatches, never silently drops), so dispatch_records can slightly exceed the auto-dispatched count. The exact orphan count is transport-timing dependent and not bit-identical across runs; the reconciliation identity (dispatch_records = auto-dispatched + safely-escalated orphans) holds every run.

See proof/LIMITATIONS.md for the full list.
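The reconciliation identity in the at-least-once bullet is simple enough to check mechanically. In this sketch `reconcile` is a hypothetical helper. The first call uses the counts reported for this run with the orphan count taken as 0, which is what equal dispatch_records and auto-dispatched counts imply if the identity holds. The second call uses made-up numbers to show a run with orphans.

```javascript
// Hypothetical check of: dispatch_records = auto-dispatched + safely-escalated orphans.
function reconcile({ dispatchRecords, autoDispatched, escalatedOrphans }) {
  const expected = autoDispatched + escalatedOrphans;
  return {
    holds: dispatchRecords === expected,
    unexplained: dispatchRecords - expected, // non-zero means rows no order accounts for
  };
}

// Counts reported for this run (orphans assumed to be 0, see lead-in).
const thisRun = reconcile({ dispatchRecords: 56_651, autoDispatched: 56_651, escalatedOrphans: 0 });
console.log(thisRun.holds); // true

// Made-up example: three RPCs committed server-side but timed out client-side.
const withOrphans = reconcile({ dispatchRecords: 56_654, autoDispatched: 56_651, escalatedOrphans: 3 });
console.log(withOrphans.holds, withOrphans.unexplained); // true 0
```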

11. Production-readiness roadmap

  1. Integrate real work-order intake (portal/email/phone/IoT) in place of the synthetic generator.
  2. Move to an external managed PostgreSQL (DATABASE_URL) and a deployed gRPC dispatch service (Go implementation of proto/dispatch.proto).
  3. Connect real vendor/dispatch systems and the customer's SLA policy.
  4. Optionally swap the lexicon classifier for an LLM behind the same interface and re-verify accuracy on the customer's real text.
  5. Add enterprise non-functional controls (identity/SSO, RBAC, tenant isolation, audit retention, security/compliance review).
  6. Run a shadow period against real volume, then a limited live pilot, before any claim of real production performance.

12. Recommended rollout plan

| Phase | Duration (illustrative) | Scope | Exit criteria |
|---|---|---|---|
| 0 · Shadow | 4–6 weeks | Agents score real orders; humans still action everything | Accuracy + false-auto-action measured on real text |
| 1 · Assist | 4–8 weeks | Agents recommend routing; humans approve | Coordinator time/order drops; exception quality holds |
| 2 · Auto (low-risk) | 8–12 weeks | Auto-dispatch only high-confidence, in-cost, single-trade orders | False-auto-action stays within target on real data |
| 3 · Scale | ongoing | Expand auto-action coverage; humans focus on exceptions | Stable safety + audit metrics; realized savings tracked |

Each phase is gated on real measured safety metrics — not on this simulation.


_This document is generated from the verified run. Re-run node verify.mjs then node build-deliverables.mjs to reproduce it. FICTIONAL / SYNTHETIC DEPLOYMENT MODEL — no real customer data was used._