Agent Memory Manager

Verify


VERIFY — agent-memory-manager

How verification runs and what each check asserts. The harness is verify.mjs; the unit suite is under test/ (compiled to dist/test).

How it runs

node verify.mjs performs, in order:

  1. Build: compiles the TypeScript sources with tsc.
  2. Unit suite: runs node --test over the compiled tests and asserts zero failures.
  3. Synthetic corpus: a seeded PRNG (mulberry32, seed 20260625) builds N topics, each with its own vocabulary. Texts on the same topic share tokens (high cosine similarity) and texts on different topics do not, which yields a known answer key for retrieval.
  4. Dual-store benchmark: the same corpus is loaded into the InMemoryStore and into a live PGlite engine (Postgres with pgvector), and retrieval precision@k is measured against the answer key on both.
  5. Policy/behaviour checks: decay demotion, tier eviction, fleet sync, determinism, and metrics.
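Steps 3 and 4 can be sketched in TypeScript as follows. Only mulberry32 and the seed 20260625 come from the description above. The token scheme (a per-topic anchor token plus words of the form t<topic>w<j>), the bag-of-words embedding, and the helper names buildCorpus, precisionAtK and fingerprint are hypothetical. The sketch also ranks with a plain in-process cosine rather than the project's InMemoryStore or pgvector.

```typescript
// Sketch only: mulberry32 and the seed come from the harness description;
// the token scheme, embedding, and helper names are hypothetical.

interface Doc {
  text: string;
  topic: number;
}

// mulberry32: a small seeded 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Each topic gets an anchor token plus words from its own 20-word vocabulary,
// so vocabularies never overlap across topics.
function buildCorpus(seed: number, topics: number, perTopic: number, words = 8): Doc[] {
  const rand = mulberry32(seed);
  const docs: Doc[] = [];
  for (let topic = 0; topic < topics; topic++) {
    for (let d = 0; d < perTopic; d++) {
      const toks = [`topic${topic}`];
      for (let w = 0; w < words; w++) toks.push(`t${topic}w${Math.floor(rand() * 20)}`);
      docs.push({ text: toks.join(" "), topic });
    }
  }
  return docs;
}

// Bag-of-words term counts stand in for a real embedding model.
function embed(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const tok of text.split(" ")) v.set(tok, (v.get(tok) ?? 0) + 1);
  return v;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (const [k, x] of a) {
    na += x * x;
    dot += x * (b.get(k) ?? 0);
  }
  for (const y of b.values()) nb += y * y;
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

// Mean fraction of each query's top-k hits that belong to the query's topic.
function precisionAtK(corpus: Doc[], queries: Doc[], k: number): number {
  const vecs = corpus.map((d) => embed(d.text));
  let hits = 0;
  for (const q of queries) {
    const qv = embed(q.text);
    const topK = corpus
      .map((d, i) => ({ topic: d.topic, i, score: cosine(qv, vecs[i]) }))
      // Highest score first; ties broken by corpus index for a stable order.
      .sort((x, y) => y.score - x.score || x.i - y.i)
      .slice(0, k);
    hits += topK.filter((r) => r.topic === q.topic).length;
  }
  return hits / (queries.length * k);
}

// FNV-1a over the serialized corpus: equal seeds give equal fingerprints.
function fingerprint(docs: Doc[]): string {
  let h = 0x811c9dc5;
  for (const ch of JSON.stringify(docs)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}
```

For example, buildCorpus(20260625, 10, 10) gives 100 documents, and buildCorpus(20260626, 10, 1) gives one query per topic from the same vocabularies. Texts on different topics share no tokens, so their cosine is exactly 0. Every same-topic document shares at least the anchor token with the query. In this toy setup, that makes precision@1 and precision@5 both 1.0 whenever each topic has at least five documents. The index tie-break in the sort is what keeps repeated queries in identical order.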

What each check asserts

| Check | Assertion |
| --- | --- |
| Unit suite passes | node --test reports 0 failing tests (24 tests). |
| In-memory precision@1 == 1.0 | The top hit for each topic query belongs to that topic. |
| In-memory precision@5 ≥ 0.95 | At least 95% of the top-5 results belong to the queried topic. |
| Deterministic retrieval | Re-running the same query yields an identical ordering. |
| pgvector precision@1 / @5 | The same thresholds, computed over real pgvector. |
| pgvector vs JS cosine | 1 - (embedding <=> q) agrees with the JS cosine similarity within 1e-5 (<=> is pgvector's cosine-distance operator). |
| pgvector persistence | The row count in the database equals the number of stored memories. |
| Store parity | In-memory and pgvector pick the same top-1 topic for every query. |
| summarizeThread | Produces a non-empty summary and persists a summary-typed memory that is retrievable. |
| getContextForTask budget | The assembled context respects the character budget and ranks the summary first. |
| Time decay demotion | A hot memory whose decayed importance drops below the hot floor is demoted to warm. |
| Hot capacity | After an overflow, the hot tier holds at most its configured capacity. |
| Cold eviction | The cold tier is capped and memory_evict_total{tier=cold} is emitted. |
| Fleet sync | Fleet-scoped memories replicate to a peer; local-scoped ones do not. |
| Metrics | Counters and latency histograms are recorded. |
| Reproducible corpus | The same seed regenerates an identical corpus fingerprint. |
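The decay-demotion, hot-capacity and cold-eviction rows can be illustrated with the hypothetical tiered store below. Several details are assumptions made for the sketch: exponential half-life decay, the hot floor, the per-tier capacities, the overflow cascade (hot to warm to cold to evicted), and evicting the lowest decayed importance first. The real policy in agent-memory-manager may differ. Only the counter name memory_evict_total{tier=cold} comes from the table.

```typescript
// Hypothetical tiered store: decay model, thresholds, capacities, and
// eviction order are assumptions for illustration only.

type Tier = "hot" | "warm" | "cold";

interface Memory {
  id: string;
  importance: number; // importance at creation, in [0, 1]
  createdAt: number; // milliseconds
  tier: Tier;
}

interface PolicyConfig {
  halfLifeMs: number; // importance halves every halfLifeMs
  hotFloor: number; // minimum decayed importance to stay hot
  capacity: Record<Tier, number>;
}

function decayedImportance(m: Memory, now: number, cfg: PolicyConfig): number {
  return m.importance * Math.pow(0.5, (now - m.createdAt) / cfg.halfLifeMs);
}

const NEXT_TIER: Record<Tier, Tier | null> = { hot: "warm", warm: "cold", cold: null };

class TieredStore {
  readonly memories: Memory[] = [];
  readonly counters = new Map<string, number>();
  private readonly cfg: PolicyConfig;

  constructor(cfg: PolicyConfig) {
    this.cfg = cfg;
  }

  add(m: Memory): void {
    this.memories.push(m);
  }

  count(tier: Tier): number {
    return this.memories.filter((m) => m.tier === tier).length;
  }

  // One maintenance pass at time `now`.
  maintain(now: number): void {
    const score = (m: Memory) => decayedImportance(m, now, this.cfg);
    // Demote hot memories whose decayed importance fell below the floor.
    for (const m of this.memories) {
      if (m.tier === "hot" && score(m) < this.cfg.hotFloor) m.tier = "warm";
    }
    // Enforce capacities tier by tier: the lowest-scoring overflow moves down
    // one tier, and overflow from the cold tier is evicted and counted.
    for (const tier of ["hot", "warm", "cold"] as const) {
      const inTier = this.memories.filter((m) => m.tier === tier).sort((a, b) => score(a) - score(b));
      const overflow = Math.max(0, inTier.length - this.cfg.capacity[tier]);
      for (const m of inTier.slice(0, overflow)) {
        const dest = NEXT_TIER[tier];
        if (dest !== null) {
          m.tier = dest;
        } else {
          this.memories.splice(this.memories.indexOf(m), 1);
          const key = `memory_evict_total{tier=${tier}}`;
          this.counters.set(key, (this.counters.get(key) ?? 0) + 1);
        }
      }
    }
  }
}
```

With capacities of 2 per tier, adding eight fresh hot memories and running one pass leaves 2 in each tier and evicts 2 from cold. In this design, a burst of new hot memories pushes the least important ones down instead of dropping them outright. Only cold-tier overflow is actually evicted.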

Outputs

  • verification-report.json / .md — machine- and human-readable results.
  • evidence/verification-results.json — copy of the report.
  • proof/evidence/verify.log — raw stdout captured by the Proof Layer.

A non-zero exit code is returned if any check fails.
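The exit-code behaviour can be sketched as a small runner. It records every check, keeps going after a failure, and derives the exit code from the collected results. The Check and CheckResult shapes and the runChecks and toMarkdown names are hypothetical. The real verify.mjs may structure its report differently.

```typescript
// Hypothetical check runner; names and report shape are illustrative.

interface Check {
  name: string;
  run: () => boolean | Promise<boolean>;
}

interface CheckResult {
  name: string;
  passed: boolean;
  error?: string;
}

// Run every check in order. A thrown error is recorded as a failure
// instead of aborting the remaining checks.
async function runChecks(checks: Check[]): Promise<{ results: CheckResult[]; exitCode: number }> {
  const results: CheckResult[] = [];
  for (const check of checks) {
    try {
      results.push({ name: check.name, passed: await check.run() });
    } catch (err) {
      results.push({ name: check.name, passed: false, error: String(err) });
    }
  }
  // Non-zero exit code if any check failed.
  const exitCode = results.every((r) => r.passed) ? 0 : 1;
  return { results, exitCode };
}

// Human-readable table for the .md report.
function toMarkdown(results: CheckResult[]): string {
  const rows = results.map((r) => `| ${r.name} | ${r.passed ? "PASS" : "FAIL"} |`);
  return ["| Check | Result |", "| --- | --- |", ...rows].join("\n");
}
```

In a Node entry point, one would then write JSON.stringify(results, null, 2) to verification-report.json and the toMarkdown output to verification-report.md. Setting process.exitCode = exitCode makes the process end non-zero when any check fails.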