Safeguard Work-Order Agent Ecosystem

Proof Report


Certification Report — Work-Order Agent Ecosystem

Strictness: IRS_AUDITOR

The certification state, Evidence Grade, and Trust Score are assigned by the
Forge Proof Layer, not by this document. The authoritative records are
proof/PROOF_DECISION.json, proof/PROOF_SCORECARD.json,
proof/EVIDENCE_GRADE.md, and proof/TRUST_SCORE.json. This report only
summarizes the basis for that decision.

Proof Layer decision (authoritative)

The Proof Layer assigned the certification state, Evidence Grade, and Trust Score recorded in proof/PROOF_DECISION.json, on the basis of: 23/23 external live verification checks passing (external PostgreSQL server + Go gRPC service + HTTP API + restart/reconnect), 0 unsupported claims, all 10 IRS_AUDITOR questions answered, a complete audit trail, and a synthetic benchmark with every claim traced to evidence.

This report distinguishes four things

1. Verified live infrastructure (real, external, exercised)

  • Four agents run end-to-end against a separate PostgreSQL 16 server (TCP) and a separate Go gRPC dispatch service (the wire); verify.mjs, 23/23.
  • Live HTTP API (ingest → persist → read-back + audit).
  • Live persistence: every order in work_orders, one audit_log row each, dispatch_records matching auto-dispatches.
  • gRPC idempotency over the wire; malformed requests rejected by the Go server.
  • Resilience: client reconnects after the Go service restarts; data durable across an external Postgres server restart + reconnect.
  • Safety: exception recall 1.0, false-auto-action rate 0.0%. Deterministic (seed 42).
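
As a minimal sketch of how the two safety figures above could be computed, the snippet below takes labeled outcomes and derives exception recall and the false-auto-action rate. The record shape (`truth`, `predicted`), the label values, and the choice of denominator for the false-auto-action rate (all auto-actioned orders) are assumptions made for illustration; the definitions actually used by verify.mjs are the ones recorded in the proof artifacts.

```javascript
// Hypothetical record shape: { truth: "exception" | "auto", predicted: "exception" | "auto" }.
// "exception" means the order must go to a human; "auto" means it may be auto-dispatched.
function safetyMetrics(records) {
  let trueExceptions = 0;
  let caughtExceptions = 0;
  let autoActions = 0;
  let falseAutoActions = 0;

  for (const { truth, predicted } of records) {
    if (truth === "exception") {
      trueExceptions += 1;
      if (predicted === "exception") caughtExceptions += 1;
    }
    if (predicted === "auto") {
      autoActions += 1;
      if (truth === "exception") falseAutoActions += 1;
    }
  }

  return {
    // Share of true exceptions that were routed to a human.
    exceptionRecall: trueExceptions === 0 ? 1 : caughtExceptions / trueExceptions,
    // Share of auto-actioned orders that should have been exceptions (assumed denominator).
    falseAutoActionRate: autoActions === 0 ? 0 : falseAutoActions / autoActions,
  };
}

// Illustrative sample only, not data from the benchmark.
const sample = [
  { truth: "exception", predicted: "exception" },
  { truth: "exception", predicted: "exception" },
  { truth: "auto", predicted: "auto" },
  { truth: "auto", predicted: "exception" },
];
console.log(safetyMetrics(sample)); // exceptionRecall 1, falseAutoActionRate 0 for this sample
```

Under these assumed definitions, an over-cautious agent (routing an auto-eligible order to a human) costs neither metric; only a missed exception does.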

2. Synthetic-data limitations

Agent accuracy (98.87% classification, etc.) is measured on a synthetic seeded corpus with a ground-truth answer key — not real Safeguard data. No claim is made about accuracy on real work orders.
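
To illustrate what "synthetic seeded corpus with a ground-truth answer key" means in practice, the sketch below generates labeled orders from a seed and scores a classifier against the labels. Everything here is hypothetical: the PRNG (a mulberry32-style generator), the category names, the text template, and the function names are stand-ins, not the generator or classifier shipped with this project.

```javascript
// Small seeded PRNG (mulberry32-style); the same seed always yields the same sequence.
function seededRandom(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Hypothetical categories; the real corpus schema is not described in this report.
const CATEGORIES = ["plumbing", "electrical", "landscaping"];

// Each synthetic order carries its own ground-truth label (the answer key).
function generateCorpus(seed, size) {
  const next = seededRandom(seed);
  const corpus = [];
  for (let i = 0; i < size; i += 1) {
    const label = CATEGORIES[Math.floor(next() * CATEGORIES.length)];
    corpus.push({ id: i, text: `${label} issue reported at site ${i}`, label });
  }
  return corpus;
}

// Accuracy = correctly classified orders / total orders.
function accuracy(corpus, classify) {
  if (corpus.length === 0) return 0;
  const correct = corpus.filter((order) => classify(order.text) === order.label).length;
  return correct / corpus.length;
}

// Deterministic keyword stand-in for a classifier: looks for a known category word.
function lexiconClassify(text) {
  return CATEGORIES.find((category) => text.includes(category)) ?? "unknown";
}

const corpus = generateCorpus(42, 50);
console.log(accuracy(corpus, lexiconClassify));
```

Because both the corpus and the answer key come from the same generator, a score obtained this way describes behavior on generated text only, which is the limitation this section discloses.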

3. Remaining production seams

| Seam | Status |
| --- | --- |
| Inbound work-order data | synthetic seeded corpus only |
| Oracle persistence | not implemented (Postgres verified); needs an Oracle adapter |
| LLM classifier | not implemented (deterministic lexicon stand-in) |
| Security (auth/RBAC/tenant isolation/TLS/mTLS) | not implemented or tested |
| HA / multi-node / load / soak | not tested (single host) |

Full list: proof/LIMITATIONS.md.

4. What would be required for customer deployment / PRODUCTION_VALIDATED

The IRS_AUDITOR standard reserves PRODUCTION_VALIDATED for CERTIFIED plus an official benchmark, independent reproduction, and external validation — none of which exist here (the data is synthetic). To get there: a labeled real Safeguard dataset + re-measured benchmark; **independent third-party reproduction; external validation; auth/RBAC/TLS + security review**; the Oracle adapter if required; an LLM-classifier decision; and HA/load/soak testing. See proof/LIMITATIONS.md.

Evidence index

  • verification-report.json / .md — the 23 checks + benchmark.
  • proof/EXECUTIVE_EVIDENCE.md — the ten IRS_AUDITOR questions answered.
  • proof/CLAIM_EVIDENCE.json — every claim → source/method/artifact.
  • proof/EXECUTION_TRACE.json — every command → exit code + output hash.
  • proof/CHECKSUMS.json — sha256 of every shipped file.
  • proof/LIMITATIONS.md / proof/AUDITOR_OBJECTIONS.md — seams + objections.
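
A reader who wants to check shipped files against proof/CHECKSUMS.json could do so along the lines of the sketch below. It assumes the manifest is a flat JSON object mapping relative file paths to hex-encoded sha256 digests; that layout is a guess, and the helper names (`sha256Hex`, `findMismatches`) are hypothetical. Only Node's built-in `node:crypto` and `node:fs` modules are used, loaded here in CommonJS style.

```javascript
const { createHash } = require("node:crypto");
const { readFileSync } = require("node:fs");

// Hex-encoded sha256 digest of a string or Buffer.
function sha256Hex(data) {
  return createHash("sha256").update(data).digest("hex");
}

// manifest: { "<relative path>": "<hex sha256>" } (assumed layout, not confirmed by this report).
// readFile is injectable so the check can be exercised without touching the disk.
function findMismatches(manifest, readFile = readFileSync) {
  const mismatches = [];
  for (const [path, expected] of Object.entries(manifest)) {
    let actual = null;
    try {
      actual = sha256Hex(readFile(path));
    } catch {
      // Unreadable or missing file: reported as a mismatch with actual = null.
    }
    if (actual !== String(expected).toLowerCase()) {
      mismatches.push({ path, expected, actual });
    }
  }
  return mismatches;
}

// Possible usage, if the manifest has the assumed shape:
//   const manifest = JSON.parse(readFileSync("proof/CHECKSUMS.json", "utf8"));
//   const bad = findMismatches(manifest);
//   process.exitCode = bad.length === 0 ? 0 : 1;
```

An empty result means every listed file hashed to its recorded digest; it says nothing about files that are absent from the manifest.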

Reproduction

See proof/REPRODUCE.md. A stranger can run node verify.mjs, reproduce every number, and confirm the disclosed seams.