Verification Report — Safeguard Work-Order Agent Ecosystem
Strictness: IRS_AUDITOR | Proof status: external live infrastructure verified (external PostgreSQL server + Go gRPC service, restart/reconnect); inbound corpus is synthetic
Checks: PASS 23 / 23 (100%) | Evaluated on: synthetic corpus over external PostgreSQL + external Go gRPC | Generated: 2026-06-25T23:11:13.269Z
Infrastructure mode: external-postgres+external-grpc (database engine postgres, gRPC 127.0.0.1:50051, restart/reconnect tested: true).
Disclosed seams & limitations
- SIMULATED INPUT: Inbound work orders are synthetic and seeded (src/synth.mjs) with ground-truth labels. Reported accuracy is against that synthetic answer key, not Safeguard production data; absolute accuracy on real text will differ. This is the blocking gap for PRODUCTION_VALIDATED (no official/real benchmark, no independent reproduction, no external validation).
- DISCLOSED_SEAM: Persistence targets PostgreSQL. The Oracle path (named in the brief) is not implemented; an Oracle adapter behind the same repository interface would be required for an Oracle deployment.
- DISCLOSED_SEAM: The classifier is a deterministic lexicon model, NOT a hosted LLM. The production design swaps an LLM behind the same interface (specs/agent-classifier.md); that swap is unverified here.
- DISCLOSED_SEAM: No identity/auth, RBAC, tenant isolation, TLS, or security/compliance testing was performed on the HTTP/gRPC surfaces.
- DISCLOSED_SEAM: The React console (public/console.html) reimplements the agent heuristics client-side for demonstration; the verified system of record is the Node pipeline under src/.
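
The classifier seam above (a deterministic lexicon model that an LLM would later replace behind the same interface) can be pictured roughly as follows. This is a minimal sketch, not the code under src/: the `Classifier` protocol, the `LexiconClassifier` class, the category names, and the keyword lists are all hypothetical, since the report does not include the real lexicon or interface.

```python
from typing import Protocol


class Classifier(Protocol):
    """Hypothetical interface shared by the lexicon model and a future LLM."""

    def classify(self, text: str) -> str: ...


# Illustrative lexicon only; the real categories and keywords are not in this report.
LEXICON = {
    "PLUMBING": ("leak", "pipe", "burst"),
    "ELECTRICAL": ("outlet", "breaker", "wiring"),
    "LOCKSMITH": ("lock", "rekey", "key"),
}


class LexiconClassifier:
    """Deterministic: the same text always yields the same category."""

    def classify(self, text: str) -> str:
        words = text.lower().split()
        # Count exact keyword hits per category; ties resolve by LEXICON order.
        scores = {
            category: sum(words.count(keyword) for keyword in keywords)
            for category, keywords in LEXICON.items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "UNRESOLVED"


def categorize(classifier: Classifier, text: str) -> str:
    """Callers depend only on the interface, so the model can be swapped."""
    return classifier.classify(text)


print(categorize(LexiconClassifier(), "Burst pipe in basement"))
```

Because callers go through `categorize`, an LLM-backed class with the same `classify` method could be substituted without touching them. Whether such a swap preserves the accuracy reported here is exactly what the seam above marks as unverified.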
What is verified
The four agents run end-to-end against an external PostgreSQL server (over TCP) and an external Go gRPC dispatch service (over the wire). Beyond accuracy and safety behaviour, the suite asserts the live HTTP API, live persistence, dispatch idempotency across the wire, malformed-request rejection by the Go server, and resilience: the client reconnects after the Go service restarts and data survives an external PostgreSQL server restart. Inbound volume is synthetic.
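
The idempotency property asserted above (replaying a dispatch returns the original reference and creates no new record) can be sketched with an in-memory stand-in. This is an illustration under stated assumptions, not the Go service: the `DispatchService` class, the `DSP-` reference format, and keying on the work-order id are all hypothetical.

```python
from __future__ import annotations

import uuid


class DispatchService:
    """In-memory stand-in for the external dispatch service (hypothetical)."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}  # work-order id -> dispatch ref

    def dispatch(self, work_order_id: str) -> tuple[str, bool]:
        """Return (dispatch_ref, replay). A replay never creates a new record."""
        if work_order_id in self._records:
            return self._records[work_order_id], True
        ref = f"DSP-{uuid.uuid4().hex[:8]}"  # assumed reference format
        self._records[work_order_id] = ref
        return ref, False

    def record_count(self) -> int:
        return len(self._records)


svc = DispatchService()
first_ref, first_replay = svc.dispatch("WO-100000")
second_ref, second_replay = svc.dispatch("WO-100000")  # a client retry
print(first_ref == second_ref, second_replay, svc.record_count())
```

In the verified system the record would have to live in durable storage rather than process memory for the property to hold across a service restart; the sketch shows only the replay logic.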
Benchmark (synthetic corpus)
| Metric | Value |
|---|---|
| Orders processed | 600 |
| Infrastructure mode | external-postgres+external-grpc |
| Classification accuracy | 98.9% (over 530 resolvable orders) |
| Priority accuracy | 96.4% |
| Region-routing accuracy | 100.0% |
| Exception precision / recall / F1 | 0.9736 / 1.0000 / 0.9866 |
| False-auto-action rate | 0.00% |
| Automatic-action rate | 62.2% |
| Human-in-the-loop rate | 30.8% |
| End-to-end processing | 54005 ms for 600 orders (about 90 ms per order) |
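
The exception precision, recall, and F1 figures can be re-derived from the confusion counts listed under Checks (tp=221, fn=0, fp=6). The short sketch below uses the standard definitions; the function name `exception_metrics` is illustrative and not taken from the pipeline.

```python
def exception_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts as reported in the Checks table of this report.
precision, recall, f1 = exception_metrics(tp=221, fp=6, fn=0)
print(f"{precision:.4f} / {recall:.4f} / {f1:.4f}")  # 0.9736 / 1.0000 / 0.9866
```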
Persistence (real DB row counts)
| Table | Rows |
|---|---|
| work_orders | 600 |
| audit_log | 600 |
| dispatch_records | 373 |
Disposition counts
| Disposition | Count |
|---|---|
| AUTO_DISPATCH | 373 |
| HUMAN_EXCEPTION | 185 |
| REJECTED | 42 |
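
As a cross-check, the disposition counts reconcile to the total volume and reproduce the automatic-action and human-in-the-loop rates in the benchmark table. A minimal sketch of that arithmetic, using only the counts above:

```python
# Disposition counts copied from the table above.
dispositions = {"AUTO_DISPATCH": 373, "HUMAN_EXCEPTION": 185, "REJECTED": 42}

total = sum(dispositions.values())
rates = {name: count / total for name, count in dispositions.items()}

print(total)  # 600
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

The non-automatic dispositions sum to 185 + 42 = 227, which equals tp + fp (221 + 6) from the exception-detection checks. Reading those as the same set of flagged orders is an inference from the arithmetic; the report does not state it.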
Checks
| Check | Detail | Result |
|---|---|---|
| HTTP API: POST /work-orders ingests, persists, and is readable via GET (+audit) | health.ok, action=AUTO_DISPATCH, audit=1 | PASS |
| Classifier: category accuracy >= 0.90 on resolvable orders | accuracy=0.9887 over 530 orders | PASS |
| Classifier: priority accuracy >= 0.75 | accuracy=0.9642 | PASS |
| Router: region resolved correctly >= 0.99 where a zone exists | accuracy=1 | PASS |
| Safety: exception-detection recall >= 0.95 | recall=1 (tp=221, fn=0) | PASS |
| Safety: false-auto-action rate <= 0.02 | rate=0 (0/600) | PASS |
| Quality: exception-detection precision >= 0.90 | precision=0.9736 (fp=6) | PASS |
| Outcome: automatic-action rate >= 0.55 | autoActionRate=0.6217 (373/600) | PASS |
| Validator: 100% of missing-location orders blocked from auto-dispatch | 42/42 blocked | PASS |
| Validator: duplicate resubmissions detected via durable fingerprint query | 61/61 caught | PASS |
| Validator: over-cost-limit orders held for human approval | 48/48 held | PASS |
| External Go gRPC: dispatch service reachable over the wire (Health RPC) | health.ok=true @ 127.0.0.1:50051 | PASS |
| Persistence: every order persisted to external PostgreSQL work_orders | work_orders=600 == total 600 | PASS |
| Persistence: append-only audit_log has one row per order | audit_log=600 == total 600 | PASS |
| Persistence: dispatch_records == auto-dispatched count, all with refs | dispatch_records=373, refs=373, auto=373 | PASS |
| External Go gRPC: dispatch idempotent over the wire (no double-dispatch) | replays=50/50; dispatch_records 373->373 | PASS |
| External Go gRPC: malformed DispatchRequest rejected by the server | ok=false error=INVALID_PAYLOAD: does not match DispatchRequest | PASS |
| Persistence: record + audit trail readable back from the DB for a sample order | id=WO-100000 status=AUTO_DISPATCH audit=1 | PASS |
| Orchestrator: dispositions reconcile to total volume | 373+185+42=600==600 | PASS |
| Performance: 600 orders over external gRPC + Postgres in < 120000 ms | 54005 ms for 600 orders | PASS |
| Resilience: client reconnects after the Go gRPC service restarts (idempotency holds) | healthy=true, replay=true, refMatch=true, records 373->373 | PASS |
| Resilience: data durable across an external PostgreSQL server restart + reconnect | after restart+reconnect work_orders=600, dispatch_records=373, audit_log=600 | PASS |
| Reproducible: same seed -> identical metrics (independent stack) | classification 0.9887==0.9887, auto 0.6217==0.6217 | PASS |
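
The duplicate-resubmission check relies on a fingerprint looked up in durable storage. One way such a check might work is sketched below. Everything here is an assumption made for illustration: the fields that feed the fingerprint, the normalisation, the table layout, and the `NEW`/`DUPLICATE` labels are hypothetical, and `sqlite3` stands in for PostgreSQL so the example runs without a server.

```python
import hashlib
import sqlite3


def fingerprint(customer: str, address: str, description: str) -> str:
    """SHA-256 over normalised fields; the field choice is an assumption."""
    parts = (customer, address, description)
    # Lower-case and collapse whitespace so trivial edits hash identically.
    normalised = "|".join(" ".join(part.lower().split()) for part in parts)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# In-memory SQLite is NOT durable; it only stands in for the real database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE work_orders (id TEXT PRIMARY KEY, fingerprint TEXT NOT NULL)")


def ingest(order_id: str, customer: str, address: str, description: str) -> str:
    """Store the order and report whether its fingerprint was already present."""
    fp = fingerprint(customer, address, description)
    # Query the store rather than process memory, so a restart loses nothing.
    seen = db.execute(
        "SELECT id FROM work_orders WHERE fingerprint = ?", (fp,)
    ).fetchone()
    db.execute("INSERT INTO work_orders (id, fingerprint) VALUES (?, ?)", (order_id, fp))
    return "DUPLICATE" if seen else "NEW"


print(ingest("WO-1", "Acme", "12 Elm St", "Replace lock"))
print(ingest("WO-2", "ACME", "12  elm st", "replace LOCK"))
```

Querying the database instead of an in-process cache is what would let detection survive a restart, which matches the "durable fingerprint query" wording in the check.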