Safeguard Work-Order Agent Ecosystem

Agent Spec — Classifier

Status: implemented (deterministic engine) · LLM engine is a disclosed seam
Source: src/agents/classifier.mjs
Owner interface: classify(order) -> classification

Purpose

Turn a raw inbound work order (free text plus whatever structured fields arrived) into a structured, confidence-scored classification that the rest of the ecosystem can route, validate, and action.

Interface

```
classify(order: WorkOrder) -> {
  category:            Category        // one of schema.CATEGORIES
  categoryConfidence:  number          // [0,1]
  priority:            Priority        // P1..P4
  priorityConfidence:  number          // [0,1]
  extractedFields:     { zone?, poNumber?, callbackPhone?, estimatedCost? }
  scores:              Record<Category, number>   // per-category lexicon score
  reasoning:           string          // human-readable trace of the decision
}
```

WorkOrder minimally requires { id, rawText }. Optional: location, estimatedCost, submittedAt.
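A minimal sketch of the input check, assuming orders are validated before classify() sees them. The helper assertWorkOrder and the sample values are hypothetical; the spec fixes only the field names.

```javascript
// Hypothetical guard for the minimal WorkOrder contract: { id, rawText } are
// required; location, estimatedCost and submittedAt are optional pass-throughs.
function assertWorkOrder(order) {
  if (order === null || typeof order !== "object") {
    throw new TypeError("work order must be an object");
  }
  if (order.id === undefined || order.id === null || order.id === "") {
    throw new TypeError("work order is missing required field: id");
  }
  if (typeof order.rawText !== "string") {
    throw new TypeError("work order is missing required field: rawText");
  }
  return order;
}

// Illustrative order; all values are made up.
const order = assertWorkOrder({
  id: "WO-1042",
  rawText: "Water leaking under the sink in zone B3",
  submittedAt: "2024-05-01T09:30:00Z",
});
console.log(order.id); // WO-1042
```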

Method (engine v1 — deterministic lexicon scoring)

Normalize rawText (lowercase, strip punctuation, collapse whitespace).
Score the text against a weighted keyword lexicon per category.
Rank categories by score; the top-scoring category wins, unless no keyword matched, in which case the result is GENERAL_MAINTENANCE at low confidence (0.1).
Set categoryConfidence to the margin of the top score over the runner-up, squashed into [0,1]. A genuine score tie yields confidence 0.4, which is below the validator's auto-action threshold, so ambiguous orders deliberately become human exceptions.
Derive priority from explicit signal terms: safety/emergency terms force P1, urgency terms imply P2, low-priority terms imply P4, and the default is P3.
Extract structured fields with explicit patterns (zone code, PO number, callback phone, dollar estimate).
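The scoring, confidence and extraction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not engine v1 itself: the lexicon, its weights, the squash function (chosen only so that a tie maps to exactly 0.4) and the field regexes are all hypothetical, and priority derivation is left out.

```javascript
// Illustrative lexicon (category -> keyword weights). Hypothetical: the real
// categories come from schema.CATEGORIES and the real weights live in
// src/agents/classifier.mjs.
const LEXICON = {
  PLUMBING: { leak: 3, leaking: 3, pipe: 2, drain: 2, water: 1 },
  ELECTRICAL: { breaker: 3, outlet: 3, sparks: 3, wiring: 2 },
  HVAC: { thermostat: 3, hvac: 3, heating: 2, cooling: 2 },
};
const FALLBACK = "GENERAL_MAINTENANCE";
// Own-property check, so tokens like "constructor" never hit Object.prototype.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Step 1: lowercase, turn punctuation into spaces, collapse whitespace.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical squash: maps a margin >= 0 into [0.4, 1), so a tie (margin 0)
// gives exactly 0.4 and wider margins approach 1.
function squash(margin) {
  return 0.4 + (0.6 * margin) / (margin + 1);
}

// Steps 2-4: score each category, rank, and turn the top-vs-runner-up margin
// into categoryConfidence.
function classifyCategory(rawText) {
  const tokens = normalize(rawText).split(" ").filter(Boolean);
  const scores = {};
  for (const [category, weights] of Object.entries(LEXICON)) {
    scores[category] = tokens.reduce((sum, t) => sum + (has(weights, t) ? weights[t] : 0), 0);
  }
  // Array.prototype.sort is stable, so tied categories keep lexicon order.
  const ranked = Object.entries(scores).sort((a, b) => b[1] - a[1]);
  const [top, topScore] = ranked[0];
  const runnerUp = ranked.length > 1 ? ranked[1][1] : 0;
  if (topScore === 0) {
    // No keyword matched: fall back to GENERAL_MAINTENANCE at 0.1.
    return { category: FALLBACK, categoryConfidence: 0.1, scores };
  }
  return { category: top, categoryConfidence: squash(topScore - runnerUp), scores };
}

// Step 6: hypothetical field patterns, applied to the raw text so that the
// punctuation in phone numbers and dollar amounts is still present.
const FIELD_PATTERNS = {
  zone: /\bzone\s+([a-z]\d{1,2})\b/i,
  poNumber: /\bPO[#\s-]*(\d{4,})\b/i,
  callbackPhone: /\b(\d{3}[-.\s]?\d{3}[-.\s]?\d{4})\b/,
  estimatedCost: /\$\s?(\d+(?:,\d{3})*(?:\.\d{1,2})?)/,
};

function extractFields(rawText) {
  const fields = {};
  for (const [name, pattern] of Object.entries(FIELD_PATTERNS)) {
    const m = rawText.match(pattern);
    if (!m) continue;
    fields[name] = name === "estimatedCost" ? Number(m[1].replace(/,/g, "")) : m[1];
  }
  return fields;
}

const text = "Water leaking under the sink, pipe burst in zone B3. PO 778812. Call 555-867-5309, est. $1,200";
console.log(classifyCategory(text).category); // PLUMBING
console.log(JSON.stringify(extractFields(text)));
// {"zone":"B3","poNumber":"778812","callbackPhone":"555-867-5309","estimatedCost":1200}
console.log(classifyCategory("breaker tripped, thermostat dead").categoryConfidence); // 0.4
console.log(classifyCategory("please repaint the lobby").category); // GENERAL_MAINTENANCE
```

Because this sketch matches exact tokens, inflected forms such as "leaking" need their own lexicon entries.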

This engine is intentionally transparent: every classification can be traced, through its scores and reasoning fields, to the exact tokens that produced it.

DISCLOSED_SEAM — production LLM engine

The production design replaces engine v1 with an LLM classifier **behind this same interface** (same input, same output shape, same confidence contract). That swap is not implemented or verified here. Until it is, the deterministic engine is the system of record and is what the verification suite measures.
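One way to make the seam concrete is to wrap whichever engine is active in a check of the shared output contract, so a future LLM engine cannot reach the validator with a different shape or an out-of-range confidence. withContract, stubEngine and the choice to throw on a violation are hypothetical; the spec says only that the swap keeps the same input, output shape and confidence contract.

```javascript
// Output keys every engine must return (taken from the classify() interface).
const REQUIRED_KEYS = [
  "category", "categoryConfidence", "priority", "priorityConfidence",
  "extractedFields", "scores", "reasoning",
];
const PRIORITIES = new Set(["P1", "P2", "P3", "P4"]);
const inUnitInterval = (x) => typeof x === "number" && x >= 0 && x <= 1;

// Hypothetical seam: wrap any engine so its output is checked against the
// shared contract before anything downstream sees it.
function withContract(engine, categories) {
  return function classify(order) {
    const out = engine(order);
    for (const key of REQUIRED_KEYS) {
      if (!(key in out)) throw new Error(`engine output is missing "${key}"`);
    }
    if (!categories.includes(out.category)) {
      throw new Error(`unknown category: ${out.category}`);
    }
    if (!PRIORITIES.has(out.priority)) {
      throw new Error(`unknown priority: ${out.priority}`);
    }
    if (!inUnitInterval(out.categoryConfidence) || !inUnitInterval(out.priorityConfidence)) {
      throw new Error("confidence values must lie in [0,1]");
    }
    return out;
  };
}

// Hypothetical stand-in engine; engine v1 or an LLM engine would slot in here.
const stubEngine = (order) => ({
  category: "PLUMBING",
  categoryConfidence: 0.9,
  priority: "P2",
  priorityConfidence: 0.8,
  extractedFields: {},
  scores: { PLUMBING: 5, GENERAL_MAINTENANCE: 0 },
  reasoning: `stub engine for ${order.id}`,
});

const classify = withContract(stubEngine, ["PLUMBING", "GENERAL_MAINTENANCE"]);
console.log(classify({ id: "WO-1", rawText: "pipe leak" }).category); // PLUMBING
```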

Confidence & escalation contract

Confidence is a real, calibrated margin, never a fabricated percentage. When the engine cannot separate two categories, it reports low confidence rather than guessing.

Low confidence is a feature: the validator converts it into a human exception.
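A sketch of the escalation rule, assuming the validator compares categoryConfidence against a single threshold. The spec gives no value for that threshold, only that 0.4 and 0.1 fall below it, so AUTO_ACTION_THRESHOLD = 0.5 and the route() helper are assumptions.

```javascript
// Hypothetical threshold: the spec states only that 0.4 (a tie) and 0.1 (no
// keyword match) fall below the validator's auto-action threshold.
const AUTO_ACTION_THRESHOLD = 0.5;

// Hypothetical routing rule: low category confidence becomes a human exception.
function route(classification, threshold = AUTO_ACTION_THRESHOLD) {
  if (classification.categoryConfidence < threshold) {
    return {
      action: "HUMAN_EXCEPTION",
      reason: `categoryConfidence ${classification.categoryConfidence} is below ${threshold}`,
    };
  }
  return { action: "AUTO", category: classification.category };
}

console.log(route({ category: "PLUMBING", categoryConfidence: 0.91 }).action); // AUTO
console.log(route({ category: "ELECTRICAL", categoryConfidence: 0.4 }).action); // HUMAN_EXCEPTION
console.log(route({ category: "GENERAL_MAINTENANCE", categoryConfidence: 0.1 }).action); // HUMAN_EXCEPTION
```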

Verified behaviour (synthetic corpus, `verify.mjs`)

| Property | Result |
|---|---|
| Category accuracy (resolvable orders) | 98.98% |
| Priority accuracy | 96.03% |
| Determinism (same seed → same output) | exact |

All numbers are measured against a synthetic, labeled corpus (src/synth.mjs); see proof/LIMITATIONS.md.
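The determinism property can be checked by generating the corpus twice from the same seed and comparing the serialized outputs. In this sketch the corpus generator, its templates and toyClassify are hypothetical stand-ins for src/synth.mjs and the real engine; mulberry32 is a common small seeded PRNG picked for illustration, and the sketch does not model which orders count as resolvable.

```javascript
// mulberry32: a small seeded PRNG, so the same seed always yields the same corpus.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Hypothetical labeled templates standing in for src/synth.mjs.
const TEMPLATES = [
  { text: "pipe leaking under sink", label: "PLUMBING" },
  { text: "breaker keeps tripping", label: "ELECTRICAL" },
  { text: "thermostat not responding", label: "HVAC" },
  { text: "door hinge squeaks", label: "GENERAL_MAINTENANCE" },
];

function synthCorpus(seed, n) {
  const rand = mulberry32(seed);
  return Array.from({ length: n }, (_, i) => {
    const tpl = TEMPLATES[Math.floor(rand() * TEMPLATES.length)];
    return { id: `SYN-${i}`, rawText: tpl.text, label: tpl.label };
  });
}

// Hypothetical toy classifier: the first known keyword decides the category.
const KEYWORDS = new Map([["pipe", "PLUMBING"], ["breaker", "ELECTRICAL"], ["thermostat", "HVAC"]]);
function toyClassify(order) {
  const hit = order.rawText.split(" ").find((w) => KEYWORDS.has(w));
  return { id: order.id, category: hit ? KEYWORDS.get(hit) : "GENERAL_MAINTENANCE" };
}

// Fraction of labeled orders whose predicted category matches the label.
function accuracy(corpus, classifyFn) {
  if (corpus.length === 0) return 0;
  const correct = corpus.filter((o) => classifyFn(o).category === o.label).length;
  return correct / corpus.length;
}

// Two independent runs with the same seed must serialize identically.
function isDeterministic(seed, n, classifyFn) {
  const run = () => JSON.stringify(synthCorpus(seed, n).map(classifyFn));
  return run() === run();
}

const corpus = synthCorpus(42, 200);
console.log(accuracy(corpus, toyClassify)); // 1
console.log(isDeterministic(42, 200, toyClassify)); // true
```

The toy classifier gets every template right, so its accuracy of 1 says nothing about the real engine; only the measurement pattern carries over.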

Failure modes & how they are handled

No keywords match → GENERAL_MAINTENANCE at 0.1 confidence → exception.
Two trades tie → 0.4 confidence → exception.
Conflicting priority signals → highest-severity term wins (safety-first).
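The safety-first rule for conflicting priority signals can be sketched by checking signal tiers from most to least severe and stopping at the first hit. The term lists and derivePriority are hypothetical, and the sketch returns only a priority and its trigger term, not a priorityConfidence.

```javascript
// Hypothetical signal terms; the real lists live in src/agents/classifier.mjs.
// Tiers are ordered from highest to lowest severity, so the first tier with a
// matching term wins: a safety term beats an urgency or low-priority term.
const PRIORITY_TIERS = [
  { priority: "P1", terms: ["gas", "fire", "smoke", "sparks", "flooding", "emergency"] },
  { priority: "P2", terms: ["urgent", "asap", "today"] },
  { priority: "P4", terms: ["cosmetic", "whenever", "eventually"] },
];
const DEFAULT_PRIORITY = "P3";

function derivePriority(rawText) {
  const tokens = new Set(
    rawText.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, " ").split(/\s+/).filter(Boolean)
  );
  for (const { priority, terms } of PRIORITY_TIERS) {
    const trigger = terms.find((term) => tokens.has(term));
    if (trigger) return { priority, trigger };
  }
  return { priority: DEFAULT_PRIORITY, trigger: null };
}

console.log(JSON.stringify(derivePriority("Cosmetic scratch on panel, but sparks from the outlet")));
// {"priority":"P1","trigger":"sparks"}
console.log(derivePriority("Urgent, though the paint issue is cosmetic").priority); // P2
console.log(derivePriority("Door hinge squeaks").priority); // P3
```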