VERIFY — Work-Order Agent Ecosystem
How verification runs and what each check asserts. Source: verify.mjs.
How it runs
In external mode (DATABASE_URL + DISPATCH_GRPC_URL set):
- Connects to an external PostgreSQL server (TCP) and an **external Go gRPC
dispatch service** (the wire); truncates tables for a clean run.
- Exercises the HTTP API (POST ingest → persist → GET read-back + audit).
- `generateCorpus({ n: 600, seed: 42 })` builds a deterministic labeled corpus. `processBatch` runs all four agents end-to-end: duplicate detection is a
durable DB query; dispatch crosses the gRPC wire to the Go service; all state persists in Postgres.
- Asserts accuracy/safety, live persistence, idempotency over the wire,
malformed rejection, and resilience (restart the Go service → reconnect; restart Postgres → durable + reconnect).
- Re-runs the pipeline on an independent in-process stack to prove determinism.
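
The determinism step relies on the corpus being a pure function of its seed. A minimal sketch of that idea follows. It is not the code in verify.mjs: `sketchCorpus`, the `mulberry32` generator, and the category and region values are all hypothetical stand-ins chosen for illustration.

```javascript
// mulberry32: a small, widely used seeded PRNG.
// ASSUMPTION: verify.mjs may use a different generator entirely.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Hypothetical label sets; the real corpus fields are not documented here.
const CATEGORIES = ["electrical", "plumbing", "hvac"];
const REGIONS = ["north", "south", "east", "west"];

// Builds n labeled orders using only the seeded generator, never Math.random
// or the clock, so the same { n, seed } always yields the same corpus.
function sketchCorpus({ n, seed }) {
  const rand = mulberry32(seed);
  const pick = (items) => items[Math.floor(rand() * items.length)];
  const orders = [];
  for (let i = 0; i < n; i++) {
    orders.push({ id: `WO-${i}`, category: pick(CATEGORIES), region: pick(REGIONS) });
  }
  return orders;
}

// Two independent runs with the same seed can be compared by serialized value.
function sameCorpus(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}
```

Comparing serialized output, as `sameCorpus` does, is one simple way to assert "same seed, identical result" across two independent stacks.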
In fallback mode (no env) the same logic runs against in-process PGlite + a Node gRPC server, plus an on-disk close/reopen durability check.
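
The mode switch described above can be sketched as a pure function of the environment. `selectMode` is a hypothetical name. What verify.mjs does when only one of the two variables is set is not stated here, so treating a partial configuration as fallback is an assumption.

```javascript
// Returns "external" only when both connection settings are present.
// ASSUMPTION: a partial configuration (one variable set) falls back to
// the in-process stack; the real script may instead reject it.
function selectMode(env) {
  const hasDatabase = Boolean(env.DATABASE_URL);
  const hasDispatch = Boolean(env.DISPATCH_GRPC_URL);
  return hasDatabase && hasDispatch ? "external" : "fallback";
}

// Typical call site in Node.js:
//   const mode = selectMode(process.env);
```

Taking `env` as a parameter rather than reading `process.env` directly keeps the decision testable without touching the real environment.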
What each check asserts
| Group | Check | Asserts |
|---|---|---|
| HTTP | POST/GET /work-orders | live API ingests, persists, serves + audit |
| Accuracy | category ≥ 0.90 / priority ≥ 0.75 / region ≥ 0.99 | agent correctness |
| Safety | exception recall ≥ 0.95 | orders needing a human are caught |
| Safety | false-auto-action ≤ 0.02 | never auto-dispatch one that needed a human |
| Quality | exception precision ≥ 0.90 | clean orders rarely over-escalated |
| Outcome | auto-action ≥ 0.55 | human-in-the-loop reduced |
| Validator | missing-location / duplicate / over-cost handled 100% | rules hold |
| Live gRPC (Go) | Health reachable over the wire | |
| Live DB | every order in work_orders; one audit_log row each | |
| Live DB | dispatch_records == auto-dispatch count, all with refs | |
| Live gRPC (Go) | dispatch idempotent over the wire | |
| Live gRPC (Go) | malformed DispatchRequest rejected server-side | |
| Resilience | client reconnects after the Go service restarts | |
| Resilience | data durable across an external Postgres restart + reconnect | |
| Orchestration | dispositions reconcile to total | |
| Performance | 600 orders over external gRPC + Postgres < 120 s | |
| Determinism | same seed → identical metrics (independent stack) | |
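
The numeric rows in the table amount to a set of threshold gates. The sketch below shows one way to encode and evaluate them. The limits come from the table; the metric key names, the `GATES` list, and `evaluateGates` are hypothetical, and how verify.mjs actually computes or names each metric is not shown here.

```javascript
// Thresholds copied from the table; key names are illustrative assumptions.
const GATES = [
  { metric: "categoryAccuracy", op: ">=", limit: 0.90 },
  { metric: "priorityAccuracy", op: ">=", limit: 0.75 },
  { metric: "regionAccuracy", op: ">=", limit: 0.99 },
  { metric: "exceptionRecall", op: ">=", limit: 0.95 },
  { metric: "falseAutoActionRate", op: "<=", limit: 0.02 },
  { metric: "exceptionPrecision", op: ">=", limit: 0.90 },
  { metric: "autoActionRate", op: ">=", limit: 0.55 },
];

// Evaluates every gate and reports each result, rather than stopping at the
// first failure. A missing or non-numeric metric counts as a failure.
function evaluateGates(metrics, gates = GATES) {
  return gates.map((gate) => {
    const value = metrics[gate.metric];
    const isNumber = typeof value === "number" && !Number.isNaN(value);
    const pass = isNumber && (gate.op === ">=" ? value >= gate.limit : value <= gate.limit);
    return { ...gate, value, pass };
  });
}

// The run passes only if every gate passes.
function allGatesPass(metrics, gates = GATES) {
  return evaluateGates(metrics, gates).every((result) => result.pass);
}
```

Note that the false-auto-action gate is an upper bound while the others are lower bounds, which is why each gate carries its own comparison operator.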
What a PASS means / does not mean
A PASS means the agent logic AND the external live infrastructure (external Postgres persistence + durability, external Go gRPC transport + idempotency + reconnect, HTTP API) behave correctly on synthetic inbound data. It does not mean accuracy on real Safeguard work orders, nor that an Oracle backend, LLM classifier, security controls, or HA were verified — see proof/LIMITATIONS.md. These gaps are why the state remains CERTIFIED and not PRODUCTION_VALIDATED.