Agent Memory Manager

Verification Report — Hierarchical Agent Memory Manager

Strictness: IRS_AUDITOR | Proof status: public API + policies verified over the in-memory store AND a real Postgres+pgvector engine (PGlite); inbound benchmark corpus is synthetic/seeded

Checks: PASS 30 / 30 (100%) | Generated: 2026-06-26T14:11:10.274Z

Infrastructure: database engine PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.74 (1092ec30a3fb1d46b1782ff1b4db5094d3d06ae5), 32-bit, pgvector 0.8.0, embeddings hashing-local, summarizer extractive-local.

Disclosed seams & limitations

  • SYNTHETIC INPUT: the retrieval benchmark corpus is generated by a seeded PRNG with a known topic answer key (verify.mjs). Reported precision is against that synthetic key, NOT real production text; absolute accuracy on real agent traffic will differ. This is the blocking gap for an official benchmark / PRODUCTION_VALIDATED.
  • DISCLOSED_SEAM: the default embeddings are a local, deterministic hashing model (lexical, not learned-semantic). Cosine similarity tracks token overlap. A hosted semantic embedding model can be plugged in via RemoteEmbeddingProvider but is NOT exercised here.
  • DISCLOSED_SEAM: the default summarizer is a local extractive (frequency-based) summarizer. ClaudeSummarizer (hosted LLM) is provided behind the same interface but requires a network + ANTHROPIC_API_KEY and is NOT exercised here.
  • LIVE INFRASTRUCTURE (in-process): Postgres + pgvector run in-process via PGlite (WASM). The identical SQL/pgvector code path runs against an external Postgres server through node-postgres (pg) — a wire-compatible disclosed seam not exercised in this run.
  • DISCLOSED_SEAM: fleet sync is exercised over an in-process EventEmitter bus. A distributed broker (Redis/NATS/Kafka) implementing the same SyncBus interface is a disclosed seam, not exercised here.
  • DISCLOSED_SEAM: the cross-encoder reranker (LocalCrossEncoderReranker) requires the optional @xenova/transformers dependency + a one-time model download. verify.mjs exercises the reranker INTERFACE with a deterministic fake; the real MiniLM cross-encoder is measured separately in the official BEIR/SciFact benchmark (bench/beir-scifact.mjs), whose results are recorded in officialBenchmark.
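The hashing-embedding seam above states that cosine similarity tracks token overlap. A minimal sketch of why that holds, assuming a bag-of-tokens hashing scheme: the function names, the FNV-1a hash, and the 64-dimension default are illustrative assumptions, not the project's actual implementation.

```javascript
// Hypothetical sketch, NOT the project's embedding code.
// Each token is hashed into one of `dim` buckets; the vector is then L2-normalized.

function hashToken(token) {
  // FNV-1a 32-bit hash (chosen here for illustration only).
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashingEmbed(text, dim = 64) {
  const vec = new Float64Array(dim);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const t of tokens) vec[hashToken(t) % dim] += 1;
  const norm = Math.hypot(...vec);
  // An empty text stays the zero vector; otherwise scale to unit length.
  return norm === 0 ? vec : vec.map((v) => v / norm);
}

function cosine(a, b) {
  // Inputs are unit-length (or zero), so the dot product is the cosine.
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

const a = hashingEmbed("postgres vector index");
const b = hashingEmbed("postgres vector search");
console.log(cosine(a, a), cosine(a, b));
```

Because every component is a non-negative token count, two texts score above zero whenever they share a token, and the same text always produces the same vector. Unrelated texts can still collide in a bucket, which is one reason a lexical model like this says nothing about learned-semantic quality.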

Synthetic retrieval benchmark

Metric | Value
Memories / topics | 144 / 12 (seed 20260625)
In-memory precision@1 / @5 | 1 / 1
pgvector precision@1 / @5 | 1 / 1
Store top-1 agreement (in-mem vs pgvector) | 1
pgvector vs JS cosine abs error | 2e-8
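For readers unfamiliar with the metric, precision@k against a topic answer key can be sketched as follows. This uses the standard definition (relevant results in the top k, divided by k, averaged over queries); the report does not show how verify.mjs computes it, so the helper names and the result shape `{ id, topic }` are assumptions.

```javascript
// Hypothetical helpers; verify.mjs may compute the metric differently.

// results: ranked array of { id, topic }; expectedTopic: the answer-key topic.
function precisionAtK(results, expectedTopic, k) {
  const top = results.slice(0, k);
  if (top.length === 0) return 0;
  const hits = top.filter((r) => r.topic === expectedTopic).length;
  return hits / k; // missing results count as misses
}

// runs: array of { results, expectedTopic }, one entry per query.
function meanPrecisionAtK(runs, k) {
  if (runs.length === 0) return 0;
  const total = runs.reduce(
    (sum, run) => sum + precisionAtK(run.results, run.expectedTopic, k),
    0
  );
  return total / runs.length;
}

const ranked = [
  { id: "mem-0-4", topic: "t0" },
  { id: "mem-0-8", topic: "t0" },
  { id: "mem-3-1", topic: "t3" },
  { id: "mem-0-7", topic: "t0" },
  { id: "mem-0-9", topic: "t0" },
];
console.log(precisionAtK(ranked, "t0", 1)); // 1
console.log(precisionAtK(ranked, "t0", 5)); // 0.8
```

If the 144 memories are spread evenly over the 12 topics (an assumption; the report gives only the totals), each topic has 12 memories, so a precision@5 of 1 is attainable for every query.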

Checks

Check | Detail | Result
Unit suite: node:test passes (policies, manager API, tiering, sync, pgvector) | tests passed=32 fail=0 | PASS
In-memory retrieval precision@1 == 1.0 on the synthetic answer key | p@1=1 | PASS
In-memory retrieval precision@5 >= 0.95 | p@5=1 | PASS
Deterministic retrieval: identical query returns identical ordering | mem-0-4,mem-0-8,mem-0-7,mem-0-9,mem-0-10 == mem-0-4,mem-0-8,mem-0-7,mem-0-9,mem-0-10 | PASS
BM25 sparse index ranks an exact rare-token match first | top=rare | PASS
Reciprocal Rank Fusion ranks the multiply-agreed item first | top=x | PASS
Hybrid retrieval (dense+BM25 RRF) surfaces the exact-token memory first | top=rare | PASS
Reranker reorders candidates to surface the cross-encoder-preferred memory | top=target | PASS
pgvector retrieval precision@1 == 1.0 on the synthetic answer key | p@1=1 | PASS
pgvector retrieval precision@5 >= 0.95 | p@5=1 | PASS
pgvector cosine distance agrees with JS cosine similarity (<=1e-5) | pg=0.48795 js=0.48795 | PASS
pgvector persistence: every memory persisted to Postgres | rows=144 == 144 | PASS
Store parity: in-memory and pgvector agree on the top-1 topic for every query | 12/12 | PASS
summarizeThread produces a non-empty summary and persists a summary memory | len=200, persisted=true, type=summary | PASS
summarizeThread: summary is retrievable from the store | found=true | PASS
getContextForTask respects the character budget | chars=226 <= 300(+1 line) | PASS
getContextForTask prioritizes the summary memory | firstType=summary | PASS
Time decay demotes a hot memory below the hot importance floor | hot -> warm | PASS
Hot tier capacity enforced (overflow demoted) | hot=4 <= 4 | PASS
Cold tier eviction caps size and emits eviction metric | cold=3 <= 3, evicted=5 | PASS
Fleet sync: fleet-scoped memory replicates to a peer node | replicated=true | PASS
Fleet sync: local-scoped memory does NOT replicate | peerHasPrivate=false | PASS
Metrics: counters and latency histograms are recorded | storeCounter=true, retrieveHistogram=true | PASS
Reproducible: same seed regenerates an identical synthetic corpus | 6cd045a8ca54 == 6cd045a8ca54 | PASS
Official benchmark: full BEIR/SciFact corpus + test qrels evaluated with a real learned model | BEIR / SciFact docs=5183 queries=300 model=transformer:Xenova/all-MiniLM-L6-v2 | PASS
Official benchmark: BEIR/SciFact nDCG@10 reflects real semantic retrieval (>= 0.40, far above chance) | nDCG@10=0.6927 | PASS
Official benchmark: BEIR/SciFact Recall@10 >= 0.40 | recall@10=0.8222 | PASS
Official benchmark: dataset checksums recorded for independent reproduction | corpus=dec31c8182f3 | PASS
Official benchmark: hybrid (dense+BM25 RRF) does not regress dense on nDCG@10 (same corpus/qrels) | dense=0.6443 -> hybrid=0.6902 (delta=0.0459) | PASS
Official benchmark: hybrid+cross-encoder rerank is competitive with hybrid on nDCG@10 | hybrid=0.6902 -> rerank=0.6927 (delta=0.0025) | PASS
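Several checks above depend on Reciprocal Rank Fusion (RRF) to merge the dense and BM25 rankings. A minimal sketch of the usual formulation follows, in which each item scores the sum of 1 / (k + rank) over the lists that contain it. The function name, the input shape, and the constant k = 60 (a common default) are assumptions; the report does not state the project's values.

```javascript
// Hypothetical sketch of Reciprocal Rank Fusion; the project's constant and API may differ.

// rankings: array of ranked id lists, best first (e.g. [denseIds, bm25Ids]).
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    // Sort by fused score, breaking ties by id so the output is deterministic.
    .sort((a, b) => b[1] - a[1] || String(a[0]).localeCompare(String(b[0])))
    .map(([id]) => id);
}

const dense = ["a", "x", "b"];
const sparse = ["x", "c", "a"];
console.log(reciprocalRankFusion([dense, sparse])); // [ 'x', 'a', 'c', 'b' ]
```

In this example "x" is ranked second and first (1/62 + 1/61), which beats "a" at first and third (1/61 + 1/63), so the item both lists agree on most strongly comes out on top. RRF uses only ranks, not raw scores, so dense cosine values and BM25 scores never need to be put on a common scale.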