Safeguard Work-Order Agent Ecosystem

Limitations

PROOF_STANDARD = IRS_AUDITOR. Every simulated, synthetic, mocked, skipped, or statically-checked element is listed here. Mirrored in verification-report.json (disclosedSeams) and proof/EXECUTIVE_EVIDENCE.md.

Verified live infrastructure (real, external, exercised in the run)

  • External PostgreSQL server. Verification runs against a separate PostgreSQL 16 server (Docker, reached over TCP at 127.0.0.1:5433), not an in-process engine. Real SQL, real tables (work_orders, audit_log, dispatch_records), real $1 parameterized queries.
  • External Go gRPC dispatch service. Dispatch crosses a real gRPC/HTTP2 wire to a separate Go service (dispatch-service/, built from proto/dispatch.proto) running in its own container and talking to Postgres.
  • Real HTTP ingest API (server.mjs), verified end-to-end.
  • Resilience verified: the client reconnects after the Go service restarts (idempotency preserved), and data survives a full external PostgreSQL server restart + reconnect.
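
To illustrate what "idempotency preserved" means across a service restart, here is a minimal sketch. It is not the project's client: FakeDispatchService, dispatchWithRetry, and the key format are hypothetical names, the fake is an in-memory stand-in for the Go gRPC service, and a real client would be asynchronous and back off between attempts.

```javascript
// Hypothetical in-memory stand-in for the dispatch service (the real one is a Go gRPC server).
// Records live in a Map here; in the verified system they are persisted in Postgres,
// which is why they can survive a service restart.
class FakeDispatchService {
  constructor() {
    this.records = new Map(); // idempotencyKey -> dispatch record
    this.available = true;
  }

  dispatch(idempotencyKey, workOrderId) {
    if (!this.available) {
      throw new Error("UNAVAILABLE");
    }
    // A repeated key returns the stored record instead of creating a second one.
    if (!this.records.has(idempotencyKey)) {
      this.records.set(idempotencyKey, {
        workOrderId,
        dispatchId: this.records.size + 1,
      });
    }
    return this.records.get(idempotencyKey);
  }
}

// Retry with the SAME key on every attempt, so a retry cannot double-dispatch.
// onFailure is an assumed hook, used below only to simulate the service coming back.
function dispatchWithRetry(service, idempotencyKey, workOrderId, maxAttempts, onFailure) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return service.dispatch(idempotencyKey, workOrderId);
    } catch (err) {
      lastError = err;
      if (onFailure) onFailure(attempt, err);
    }
  }
  throw lastError;
}

const service = new FakeDispatchService();
const first = dispatchWithRetry(service, "wo-1001", "wo-1001", 3);
service.available = false; // simulate the service going down
const second = dispatchWithRetry(service, "wo-1001", "wo-1001", 3, () => {
  service.available = true; // simulate the restart completing
});
console.log(first.dispatchId === second.dispatchId, service.records.size); // true 1
```

The design point is that the key is derived from the work order, not from the attempt, so the second call after the restart resolves to the existing record.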

DISCLOSED_SEAM — what is NOT live / NOT proven

  1. Inbound data is SYNTHETIC. All work orders are seeded (src/synth.mjs) with a ground-truth answer key. Reported accuracy is against that key, not Safeguard production data. **This is the blocking gap for the PRODUCTION_VALIDATED state:** there is no official/real benchmark, no independent third-party reproduction, and no external validation. No claim is made about accuracy on real Safeguard work orders.
  2. Oracle is not implemented. The brief names Oracle/Postgres; this build verifies Postgres. An Oracle deployment needs an Oracle adapter behind the same repository interface (src/integrations/repository.mjs), which is not included.
  3. The classifier is a deterministic lexicon model, NOT an LLM. The production design swaps an LLM behind the same classify() interface (specs/agent-classifier.md); that swap is unverified here.
  4. No security posture. No identity/auth, RBAC, tenant isolation, TLS, rate limiting, or security/compliance testing on the HTTP/gRPC surfaces. gRPC is insecure (no mTLS); Postgres uses a development credential.
  5. **The React console (public/console.html) reimplements the agent heuristics client-side for demonstration.** The verified system of record is the Node pipeline under src/.
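
As a rough illustration of item 3, the sketch below shows one way a deterministic lexicon model can sit behind a classify() interface so that another implementation could later be swapped in. The lexicon, the category labels, the return shape, and createLexiconClassifier are all assumptions made for this example; the project's actual lexicon and interface are defined in its own sources and may differ.

```javascript
// Hypothetical lexicon: category -> keywords. These categories are illustrative only.
const LEXICON = {
  lawn: ["grass", "mow", "lawn"],
  winterization: ["winterize", "pipes", "antifreeze"],
  debris: ["debris", "trash", "haul"],
};

// Returns an object exposing classify(text). Any replacement (for example an
// LLM-backed classifier) would need to expose the same method and return shape.
function createLexiconClassifier(lexicon) {
  return {
    classify(text) {
      // Lowercase and split on non-alphanumerics so matching is deterministic.
      const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
      let best = { label: "unknown", score: 0 };
      for (const [label, keywords] of Object.entries(lexicon)) {
        const score = tokens.filter((t) => keywords.includes(t)).length;
        // Strict ">" means ties go to the category listed first in the lexicon.
        if (score > best.score) best = { label, score };
      }
      return best;
    },
  };
}

const classifier = createLexiconClassifier(LEXICON);
console.log(classifier.classify("Mow the lawn and cut grass").label); // lawn
console.log(classifier.classify("Replace the front door lock").label); // unknown
```

Because the same input always yields the same label, results from a model like this are reproducible run to run, which is not something an LLM behind the same interface would guarantee without further controls.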

Scope limitations (not defects)

  1. Single-host, single-instance. No multi-node concurrency/HA/load test; the performance number is one process against one Postgres + one Go service over loopback.
  2. Restart resilience is process-level, exercised by restarting containers; it is not a full failover/HA test.

What would be required for PRODUCTION_VALIDATED (and customer deployment)

Per the IRS_AUDITOR standard, PRODUCTION_VALIDATED requires CERTIFIED plus an official benchmark, independent reproduction, and external validation. This build has none of the three. To earn that state, or to deploy for a customer:

  • Replace synthetic inputs with a labeled real Safeguard work-order dataset and re-measure accuracy/precision/recall (official benchmark).
  • Obtain independent reproduction (a third party runs the suite and matches results) and external validation of the outcome.
  • Add auth/RBAC/tenant isolation, TLS/mTLS, secrets management, and a security review of the HTTP/gRPC surfaces.
  • Provide the Oracle adapter if Oracle is the system of record.
  • Decide on and verify the LLM classifier (or keep the deterministic model as the system of record).
  • Add HA/failover, load, and soak testing and production observability.
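
The re-measurement in the first bullet could look roughly like the sketch below, which scores predictions against a labeled key for one category. scoreAgainstKey, the id-to-label object shape, and the sample labels are assumptions for illustration; they are not taken from the project's verification suite, and the sample data is made up.

```javascript
// Compute accuracy over all labels, plus precision and recall for one label,
// given predictions and an answer key that are both { workOrderId: label }.
function scoreAgainstKey(predictions, answerKey, positiveLabel) {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  let correct = 0;
  for (const [id, expected] of Object.entries(answerKey)) {
    const predicted = predictions[id]; // undefined if the item was never classified
    if (predicted === expected) correct++;
    if (predicted === positiveLabel && expected === positiveLabel) tp++;
    else if (predicted === positiveLabel) fp++;
    else if (expected === positiveLabel) fn++;
  }
  const total = Object.keys(answerKey).length;
  return {
    accuracy: total ? correct / total : 0,
    precision: tp + fp ? tp / (tp + fp) : 0,
    recall: tp + fn ? tp / (tp + fn) : 0,
  };
}

// Made-up sample: four work orders, two labeled "lawn" and two labeled "debris".
const answerKey = { a: "lawn", b: "lawn", c: "debris", d: "debris" };
const predictions = { a: "lawn", b: "debris", c: "lawn", d: "debris" };
console.log(scoreAgainstKey(predictions, answerKey, "lawn"));
// { accuracy: 0.5, precision: 0.5, recall: 0.5 }
```

The same function works whether the key comes from the synthetic seeder or from a labeled real dataset; only the second would count toward the official benchmark described above.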