Safeguard Work-Order Agent Ecosystem

Limitations

PROOF_STANDARD = IRS_AUDITOR. Every simulated, synthetic, mocked, skipped, or statically-checked element is listed here. Mirrored in verification-report.json (disclosedSeams) and proof/EXECUTIVE_EVIDENCE.md.

Verified live infrastructure (real, external, exercised in the run)

External PostgreSQL server. Verification runs against a separate PostgreSQL 16 server (Docker, reached over TCP on host 127.0.0.1:5433), not an in-process engine. Real SQL, real tables (work_orders, audit_log, dispatch_records), real $1 parameterized queries.

External Go gRPC dispatch service. Dispatch crosses a real gRPC/HTTP2 wire to a separate Go service (dispatch-service/, built from proto/dispatch.proto) running in its own container and talking to Postgres.

Real HTTP ingest API (server.mjs). Verified end-to-end.

Resilience verified: the client reconnects after the Go service restarts (idempotency preserved), and data survives a full external PostgreSQL server restart + reconnect.
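
One way the "idempotency preserved" property can hold across a reconnect is for the client to reuse a stable key on every retry, so a request whose reply was lost cannot create a second record. The sketch below is illustrative only: `FakeDispatchService`, `dispatchWithRetry`, and the key format are hypothetical stand-ins, not the code in dispatch-service/ or the real gRPC client.

```javascript
// Hypothetical in-memory stand-in for the dispatch service's record store.
class FakeDispatchService {
  constructor() {
    this.records = new Map(); // idempotencyKey -> dispatch record
    this.failNext = 0;        // upcoming calls that fail AFTER the write lands
  }

  async dispatch(idempotencyKey, workOrderId) {
    // First write wins; a replay with the same key leaves the record unchanged.
    if (!this.records.has(idempotencyKey)) {
      this.records.set(idempotencyKey, {
        workOrderId,
        dispatchId: this.records.size + 1,
      });
    }
    if (this.failNext > 0) {
      this.failNext -= 1;
      // Simulates a restart after the write but before the reply is sent.
      throw new Error('UNAVAILABLE');
    }
    return this.records.get(idempotencyKey);
  }
}

// Retries with the SAME key, so a retry after a lost reply cannot duplicate.
async function dispatchWithRetry(service, key, workOrderId, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i += 1) {
    try {
      return await service.dispatch(key, workOrderId);
    } catch (err) {
      lastError = err; // a real client would also back off and reconnect here
    }
  }
  throw lastError;
}
```

With `failNext = 1`, the first call writes the record and then fails; the retry returns the stored record and the store still holds exactly one entry for that key.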

DISCLOSED_SEAM — what is NOT live / NOT proven

Inbound data is SYNTHETIC. All work orders are seeded (src/synth.mjs) with a ground-truth answer key. Reported accuracy is against that key, not Safeguard production data. **This is the blocking gap for the PRODUCTION_VALIDATED state:** there is no official/real benchmark, no independent third-party reproduction, and no external validation. No claim is made about accuracy on real Safeguard work orders.

Oracle is not implemented. The brief names Oracle/Postgres; this build verifies Postgres only. An Oracle deployment needs an Oracle adapter behind the same repository interface (src/integrations/repository.mjs), which is not included.
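
To make "behind the same repository interface" concrete, the following is a minimal sketch of how an adapter contract might be checked before the pipeline uses it. The method names (`saveWorkOrder`, `getWorkOrder`, `appendAudit`) are assumptions for illustration; the actual contract is whatever src/integrations/repository.mjs defines.

```javascript
// Hypothetical method names; the real contract lives in
// src/integrations/repository.mjs and may differ.
const REPOSITORY_METHODS = ['saveWorkOrder', 'getWorkOrder', 'appendAudit'];

// Fails fast if an adapter is missing any method the pipeline would call.
function assertRepository(adapter) {
  const missing = REPOSITORY_METHODS.filter(
    (name) => typeof adapter[name] !== 'function',
  );
  if (missing.length > 0) {
    throw new TypeError(`adapter is missing: ${missing.join(', ')}`);
  }
  return adapter;
}

// In-memory stand-in used only for illustration. A Postgres or Oracle
// adapter would expose the same three methods over a real connection.
function createMemoryRepository() {
  const orders = new Map();
  const audit = [];
  return {
    async saveWorkOrder(order) { orders.set(order.id, { ...order }); },
    async getWorkOrder(id) { return orders.get(id) ?? null; },
    async appendAudit(entry) { audit.push({ ...entry }); return audit.length; },
  };
}
```

Under this shape, an Oracle adapter would be accepted by `assertRepository` only once it implements every listed method, which keeps the rest of the pipeline unaware of the database behind it.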

The classifier is a deterministic lexicon model, NOT an LLM. The production design swaps an LLM behind the same classify() interface (specs/agent-classifier.md); that swap is unverified here.
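
For readers unfamiliar with the term, a deterministic lexicon model can be as simple as keyword counting. The sketch below is not the classifier in this build: the categories, keywords, and return shape are hypothetical, and only the function name `classify` comes from the text above.

```javascript
// Hypothetical lexicon (category -> keywords). The real categories and
// terms are not given in this document.
const LEXICON = {
  plumbing: ['leak', 'pipe', 'faucet'],
  electrical: ['outlet', 'breaker', 'wiring'],
  grounds: ['mow', 'grass', 'debris'],
};

// Deterministic: the same text always yields the same label. Ties resolve
// to the category listed first in LEXICON; no keyword hit yields 'unknown'.
function classify(text) {
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  let best = { label: 'unknown', score: 0 };
  for (const [label, keywords] of Object.entries(LEXICON)) {
    const score = tokens.filter((t) => keywords.includes(t)).length;
    if (score > best.score) best = { label, score };
  }
  return best;
}
```

An LLM-backed implementation could keep the same call signature and return shape, which is what would let callers stay unchanged; whether its outputs match the lexicon model is exactly the part that remains unverified.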

No security posture. No identity/auth, RBAC, tenant isolation, TLS, rate limiting, or security/compliance testing on the HTTP/gRPC surfaces. gRPC is insecure (no mTLS); Postgres uses a development credential.

**The React console (public/console.html) reimplements the agent heuristics client-side for demonstration.** The verified system of record is the Node pipeline under src/.

Scope limitations (not defects)

Single-host, single-instance. No multi-node concurrency/HA/load test; the performance number is one process against one Postgres + one Go service over loopback.

Restart resilience is process-level, exercised by restarting containers; it is not a full failover/HA test.

What would be required for `PRODUCTION_VALIDATED` (and customer deployment)

Per the IRS_AUDITOR standard, PRODUCTION_VALIDATED requires CERTIFIED plus an official benchmark, independent reproduction, and external validation. This build does not have those. To earn it / deploy for a customer:

Replace synthetic inputs with a labeled real Safeguard work-order dataset and re-measure accuracy/precision/recall (official benchmark).

Obtain independent reproduction (a third party runs the suite and matches results) and external validation of the outcome.

Add auth/RBAC/tenant isolation, TLS/mTLS, secrets management, and a security review of the HTTP/gRPC surfaces.

Provide the Oracle adapter if Oracle is the system of record.

Decide on and verify the LLM classifier (or keep the deterministic model as the system of record).

Add HA/failover, load, and soak testing and production observability.
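
The re-measurement named in the first requirement above reduces to comparing predicted labels with a labeled answer key. The following is a minimal sketch of that arithmetic, assuming parallel arrays of labels; `scoreAgainstKey` and its return shape are hypothetical and are not taken from the verification suite.

```javascript
// Compares predicted labels with an answer key and reports overall accuracy
// plus precision and recall for one target label. Inputs are parallel arrays.
function scoreAgainstKey(predicted, expected, targetLabel) {
  if (predicted.length !== expected.length || predicted.length === 0) {
    throw new RangeError('predicted and expected must be non-empty and equal in length');
  }
  let correct = 0;
  let tp = 0; // predicted target, key says target
  let fp = 0; // predicted target, key says something else
  let fn = 0; // predicted something else, key says target
  for (let i = 0; i < predicted.length; i += 1) {
    const p = predicted[i];
    const e = expected[i];
    if (p === e) correct += 1;
    if (p === targetLabel && e === targetLabel) tp += 1;
    else if (p === targetLabel) fp += 1;
    else if (e === targetLabel) fn += 1;
  }
  return {
    accuracy: correct / predicted.length,
    // null when the denominator is zero, so "undefined" is not reported as 0.
    precision: tp + fp === 0 ? null : tp / (tp + fp),
    recall: tp + fn === 0 ? null : tp / (tp + fn),
  };
}
```

The same function applies unchanged whether the key is the synthetic one or a labeled real dataset; only the provenance of `expected` determines what the resulting numbers can be claimed to show.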