Safeguard Work-Order Agent Ecosystem

Proof Report

Certification Report — Work-Order Agent Ecosystem

Strictness: IRS_AUDITOR

The certification state, Evidence Grade, and Trust Score are assigned by the
Forge Proof Layer, not by this document. The authoritative records are
proof/PROOF_DECISION.json, proof/PROOF_SCORECARD.json,
proof/EVIDENCE_GRADE.md, and proof/TRUST_SCORE.json. This report only
summarizes the basis for that decision.

Proof Layer decision (authoritative)

The Proof Layer assigned the certification state, Evidence Grade, and Trust Score recorded in proof/PROOF_DECISION.json, on the basis of: 23/23 external live verification checks passing (external PostgreSQL server + Go gRPC service + HTTP API + restart/reconnect), 0 unsupported claims, all 10 IRS_AUDITOR questions answered, a complete audit trail, and a synthetic benchmark with every claim traced to evidence.

This report distinguishes four things

1. Verified live infrastructure (real, external, exercised)

Four agents run end-to-end against a separate PostgreSQL 16 server (TCP) and a separate Go gRPC dispatch service (the wire): verify.mjs, 23/23.
Live HTTP API (ingest → persist → read-back + audit).
Live persistence: every order in work_orders, one audit_log row each, dispatch_records matching auto-dispatches.
gRPC idempotency over the wire; malformed requests rejected by the Go server.
Resilience: client reconnects after the Go service restarts; data durable across an external Postgres server restart + reconnect.
Safety: exception recall 1.0, false-auto-action rate 0.0%. Deterministic (seed 42).
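
The idempotency and malformed-request behavior listed above can be illustrated with a minimal in-memory sketch. This is not the Go service's implementation: `makeDispatcher`, the `idempotencyKey` and `orderId` fields, and the `replayed` flag are hypothetical names chosen for illustration, and the real service persists to Postgres rather than to a `Map`.

```javascript
// Hypothetical sketch: an in-memory idempotent dispatch handler.
// The real service is a Go gRPC server; all names and fields here are assumptions.
function makeDispatcher() {
  const seen = new Map(); // idempotencyKey -> stored dispatch record
  let nextId = 1;

  return function dispatch(request) {
    // Reject malformed requests before touching any state.
    if (
      request === null ||
      typeof request !== 'object' ||
      typeof request.idempotencyKey !== 'string' ||
      request.idempotencyKey.length === 0 ||
      typeof request.orderId !== 'string' ||
      request.orderId.length === 0
    ) {
      throw new TypeError('malformed dispatch request');
    }

    // A repeated key returns the original record instead of dispatching again.
    const existing = seen.get(request.idempotencyKey);
    if (existing !== undefined) {
      return { ...existing, replayed: true };
    }

    const record = { dispatchId: nextId++, orderId: request.orderId };
    seen.set(request.idempotencyKey, record);
    return { ...record, replayed: false };
  };
}
```

Calling the returned function twice with the same key yields the same `dispatchId`, which is the property an over-the-wire idempotency check would look for.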

2. Synthetic-data limitations

Agent accuracy figures (for example, 98.87% classification accuracy) are measured on a synthetic seeded corpus with a ground-truth answer key, not on real Safeguard data. No claim is made about accuracy on real work orders.
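
As a rough illustration of how figures such as classification accuracy, exception recall, and false-auto-action rate can be scored against an answer key, consider the sketch below. The record fields (`predicted`, `expected`, `isException`, `autoActioned`) and the exact metric definitions are assumptions made for this example; the authoritative definitions are whatever verify.mjs and the proof artifacts use.

```javascript
// Hypothetical sketch of scoring agent output against a ground-truth answer key.
// Field names and metric definitions are assumptions, not taken from verify.mjs.
function scoreAgainstAnswerKey(results) {
  let correct = 0;
  let exceptions = 0;
  let exceptionsCaught = 0;
  let autoActions = 0;
  let falseAutoActions = 0;

  for (const r of results) {
    if (r.predicted === r.expected) correct += 1;
    if (r.isException) {
      exceptions += 1;
      // Assumed definition: an exception is "caught" if it was not auto-actioned.
      if (!r.autoActioned) exceptionsCaught += 1;
    }
    if (r.autoActioned) {
      autoActions += 1;
      // Assumed definition: an auto-action on a true exception is a false auto-action.
      if (r.isException) falseAutoActions += 1;
    }
  }

  // Return null rather than NaN when a denominator is zero.
  const ratio = (num, den) => (den === 0 ? null : num / den);

  return {
    accuracy: ratio(correct, results.length),
    exceptionRecall: ratio(exceptionsCaught, exceptions),
    falseAutoActionRate: ratio(falseAutoActions, autoActions),
  };
}
```

Under these assumed definitions, an exception recall of 1.0 implies a false-auto-action rate of 0, which matches the pairing reported above.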

3. Remaining production seams

| Seam | Status |
| --- | --- |
| Inbound work-order data | synthetic seeded corpus only |
| Oracle persistence | not implemented (Postgres verified); needs an Oracle adapter |
| LLM classifier | not implemented (deterministic lexicon stand-in) |
| Security (auth/RBAC/tenant isolation/TLS/mTLS) | not implemented or tested |
| HA / multi-node / load / soak | not tested (single host) |

Full list: proof/LIMITATIONS.md.

4. What would be required for customer deployment / `PRODUCTION_VALIDATED`

The IRS_AUDITOR standard reserves PRODUCTION_VALIDATED for systems that are CERTIFIED and also have an official benchmark, independent reproduction, and external validation. None of these exist here (the data is synthetic). Reaching that state would require a labeled real Safeguard dataset with a re-measured benchmark; **independent third-party reproduction; external validation; auth/RBAC/TLS plus a security review**; the Oracle adapter, if required; an LLM-classifier decision; and HA/load/soak testing. See proof/LIMITATIONS.md.

Evidence index

verification-report.json / .md — the 23 checks + benchmark.
proof/EXECUTIVE_EVIDENCE.md — the ten IRS_AUDITOR questions answered.
proof/CLAIM_EVIDENCE.json — every claim → source/method/artifact.
proof/EXECUTION_TRACE.json — every command → exit code + output hash.
proof/CHECKSUMS.json — sha256 of every shipped file.
proof/LIMITATIONS.md / proof/AUDITOR_OBJECTIONS.md — seams + objections.
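
A reader who wants to check the shipped files against the checksum manifest could do so along the following lines. This is a sketch under a stated assumption: it treats the manifest as a flat object mapping relative paths to SHA-256 hex digests, which may not match the actual layout of proof/CHECKSUMS.json. The function names `sha256Hex` and `verifyChecksums` are hypothetical.

```javascript
// Hypothetical sketch: verify files against a checksum manifest.
// Assumes the manifest is a flat { "relative/path": "<sha256 hex>" } object;
// the real layout of proof/CHECKSUMS.json may differ.
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');
const path = require('node:path');

// Hex-encoded SHA-256 digest of a string or Buffer.
function sha256Hex(data) {
  return createHash('sha256').update(data).digest('hex');
}

// Returns a list of entries whose on-disk digest differs from the manifest.
function verifyChecksums(manifest, rootDir) {
  const mismatches = [];
  for (const [relPath, expected] of Object.entries(manifest)) {
    let actual;
    try {
      actual = sha256Hex(readFileSync(path.join(rootDir, relPath)));
    } catch (err) {
      actual = null; // a missing or unreadable file counts as a mismatch
    }
    if (actual !== String(expected).toLowerCase()) {
      mismatches.push({ relPath, expected, actual });
    }
  }
  return mismatches;
}
```

An empty result from `verifyChecksums(manifest, rootDir)` would mean every listed file matches its recorded digest; it says nothing about files that are absent from the manifest.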

Reproduction

See proof/REPRODUCE.md. A stranger can run node verify.mjs, reproduce every number, and confirm the disclosed seams.