Agent Memory Manager

Auditor Challenge

Auditor Challenge — agent-memory-manager

A hostile external auditor is attempting to invalidate this outcome. Every major claim must survive the following interrogation, answered from objective evidence.

  • Standard: IRS_AUDITOR (assume bad faith; trust nothing without evidence)
  • Certification state: CERTIFIED
  • Evidence Grade: A
  • Trust Score: 93/100
  • Verification: PASS (30/30)

Global challenge questions

  1. What evidence supports this? Every metric maps to proof/CLAIM_EVIDENCE.json and proof/evidence/verification-report.json, produced by node verify.mjs and traced in proof/EXECUTION_TRACE.json.
  2. What assumptions exist? See proof/LIMITATIONS.md and proof/EXECUTIVE_EVIDENCE.md.
  3. How could this fail? Verification passes today; failure modes are the disclosed seams below.
  4. Could another engineer reproduce it? Yes — proof/REPRODUCE.md lists exact commands; checksums in proof/CHECKSUMS.json pin every input.
  5. What would invalidate this conclusion? A failing check, a checksum mismatch (node tools/forge-proof-verify.mjs --outcome delivery-package/agent-memory-manager), or any claim without a source in CLAIM_EVIDENCE.json.
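  6. Has anything been simulated? Yes, and it is disclosed: the retrieval corpus is synthetic (seeded PRNG with a known answer key), and verify.mjs exercises the reranker interface with a deterministic fake. An official benchmark (BEIR/SciFact, bench/beir-scifact.mjs) is also present.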
  7. Were any shortcuts taken? 6 disclosed seams; 0 draft docs; 0 unguarded marketing phrases.
  8. Would this survive expert review? The Proof Layer audit passed with no open objections.
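
To make the checksum argument in questions 4 and 5 concrete, here is a minimal sketch of how a pinned-input check could work. It is not the code in tools/forge-proof-verify.mjs. The manifest layout (a flat object mapping a relative path to a hex SHA-256 digest) and the names sha256Hex and findChecksumMismatches are assumptions made for illustration, since the real format of proof/CHECKSUMS.json is not shown in this report.

```javascript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 digest of a string or Buffer.
function sha256Hex(data) {
  return createHash("sha256").update(data).digest("hex");
}

// ASSUMPTION: manifest is a flat { "relative/path": "<sha256 hex>" } object.
// The real proof/CHECKSUMS.json layout may differ.
// `readFile` is injectable so the check can be tested without touching disk.
function findChecksumMismatches(manifest, readFile = readFileSync) {
  const mismatches = [];
  for (const [path, expected] of Object.entries(manifest)) {
    let actual;
    try {
      actual = sha256Hex(readFile(path));
    } catch {
      actual = null; // a missing or unreadable input counts as a mismatch
    }
    if (actual !== String(expected).toLowerCase()) {
      mismatches.push({ path, expected, actual });
    }
  }
  return mismatches;
}
```

Under these assumptions, an empty result means every pinned input still matches its recorded digest. Any entry in the result would be the "checksum mismatch" that question 5 names as invalidating the conclusion.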

Per-claim challenge

  • Unit suite: node:test passes (policies, manager API, tiering, sync, pgvector) = tests passed=32 fail=0 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • In-memory retrieval precision@1 == 1.0 on the synthetic answer key = p@1=1 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • In-memory retrieval precision@5 >= 0.95 = p@5=1 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • Deterministic retrieval: identical query returns identical ordering = mem-0-4,mem-0-8,mem-0-7,mem-0-9,mem-0-10 == mem-0-4,mem-0-8,mem-0-7,mem-0-9,mem-0-10 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • BM25 sparse index ranks an exact rare-token match first = top=rare — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • Reciprocal Rank Fusion ranks the multiply-agreed item first = top=x — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • Hybrid retrieval (dense+BM25 RRF) surfaces the exact-token memory first = top=rare — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • Reranker reorders candidates to surface the cross-encoder-preferred memory = top=target — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • pgvector retrieval precision@1 == 1.0 on the synthetic answer key = p@1=1 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • pgvector retrieval precision@5 >= 0.95 = p@5=1 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • pgvector cosine distance agrees with JS cosine similarity (<=1e-5) = pg=0.48795 js=0.48795 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
  • pgvector persistence: every memory persisted to Postgres = rows=144 == 144 — source: verification-report.json#/checks; status: SUPPORTED. _Could another engineer reproduce this number from node verify.mjs? Yes, deterministically._
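
Two of the metrics above, precision@k and Reciprocal Rank Fusion, are simple enough to sketch directly. The following is an illustrative sketch under stated assumptions, not the code that verify.mjs runs. The function names, the example ids, and the smoothing constant k = 60 (a commonly used default for RRF) are all assumptions, because the report does not state which constant or tie-breaking rule the project uses.

```javascript
// Precision@k: fraction of the top-k retrieved ids that appear in the
// relevant set. Dividing by k (not by the number returned) penalizes
// result lists shorter than k.
function precisionAtK(rankedIds, relevantIds, k) {
  if (!Number.isInteger(k) || k <= 0) throw new RangeError("k must be a positive integer");
  const relevant = new Set(relevantIds);
  const hits = rankedIds.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / k;
}

// Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per item,
// with rank starting at 1. Items that several rankings agree on accumulate
// the highest fused score.
// ASSUMPTION: k = 60 and id-order tie-breaking are illustrative choices.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .map(([id]) => id);
}

// Hypothetical dense and sparse rankings for one query.
const dense = ["a", "x", "b"];
const sparse = ["x", "c", "a"];
const fused = reciprocalRankFusion([dense, sparse]); // ["x", "a", "c", "b"]
```

In this example "x" is ranked second by the dense list and first by the sparse list, so it outscores "a" (first and third). That is the same shape of claim as the "ranks the multiply-agreed item first" check above. Breaking ties by id keeps the fused ordering deterministic across runs.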

Open objections (must be resolved or disclosed before CERTIFIED)

  • None. All challenged claims are supported by evidence.

Disclosed seams (auditor-acknowledged limitations)

  • SYNTHETIC INPUT: the retrieval benchmark corpus is generated by a seeded PRNG with a known topic answer key (verify.mjs). Reported precision is against that synthetic key, NOT real production text; absolute accuracy on real agent traffic will differ. This is the blocking gap for an official benchmark / PRODUCTION_VALIDATED.
  • DISCLOSED_SEAM: the default embeddings are a local, deterministic hashing model (lexical, not learned-semantic). Cosine similarity tracks token overlap. A hosted semantic embedding model can be plugged in via RemoteEmbeddingProvider but is NOT exercised here.
  • DISCLOSED_SEAM: the default summarizer is a local extractive (frequency-based) summarizer. ClaudeSummarizer (hosted LLM) is provided behind the same interface but requires a network + ANTHROPIC_API_KEY and is NOT exercised here.
  • LIVE INFRASTRUCTURE (in-process): Postgres + pgvector run in-process via PGlite (WASM). The identical SQL/pgvector code path runs against an external Postgres server through node-postgres (pg) — a wire-compatible disclosed seam not exercised in this run.
  • DISCLOSED_SEAM: fleet sync is exercised over an in-process EventEmitter bus. A distributed broker (Redis/NATS/Kafka) implementing the same SyncBus interface is a disclosed seam, not exercised here.
  • DISCLOSED_SEAM: the cross-encoder reranker (LocalCrossEncoderReranker) requires the optional @xenova/transformers dependency + a one-time model download. verify.mjs exercises the reranker INTERFACE with a deterministic fake; the real MiniLM cross-encoder is measured separately in the official BEIR/SciFact benchmark (bench/beir-scifact.mjs), whose results are recorded in officialBenchmark.
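
The second seam above says the default embeddings are lexical, so cosine similarity tracks token overlap. The sketch below shows why that holds for a hashing embedder in general. It is not the project's embedding code: the tokenizer, the FNV-1a hash, the 256-dimension size, and the names fnv1a, embed, and cosine are all assumptions chosen for illustration.

```javascript
// 32-bit FNV-1a over UTF-16 code units (a simplification; the reference
// algorithm hashes bytes). Deterministic: same string, same hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Feature hashing: each token increments one bucket of a fixed-size vector.
// ASSUMPTION: lowercase alphanumeric tokenization and dims = 256.
function embed(text, dims = 256) {
  const vec = new Float64Array(dims);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) vec[fnv1a(token) % dims] += 1;
  return vec;
}

// Cosine similarity; defined as 0 when either vector is all zeros.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Same tokens in a different order produce the same bucket counts.
const sameBag = cosine(embed("rebuild the vector index"), embed("vector index: rebuild the"));
```

Because the vector only counts which tokens occur, word order has no effect, and two texts that share no tokens score near zero unless their tokens happen to collide in the same bucket. Synonyms such as "car" and "automobile" therefore look unrelated to an embedder like this, which is the gap a learned semantic model would be expected to close.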

_Generated by tools/forge-proof.mjs at 2026-06-26T14:11:10.413Z. The Proof Layer has final authority over this challenge; it may not be edited to suppress objections._