Agent Memory Manager

Release Notes

v1.1.0

Closes the highest-leverage retrieval gap from the v1 Frontier review: retrieval is now a two-stage pipeline, with hybrid (dense + BM25) first-stage retrieval followed by cross-encoder reranking.

Added

  • Hybrid retrieval (default). MemoryManager fuses dense vector search with a new in-process BM25 lexical index (src/sparse.ts, Bm25Index) via Reciprocal Rank Fusion (reciprocalRankFusion). New options: retrievalMode ('hybrid' | 'dense') and rrfK, plus a per-call retrievalMode. BM25 recovers exact rare tokens (hostnames, error codes, ticket IDs) that dense embeddings blur. The index is maintained incrementally and is store-agnostic (works with any MemoryStore).

  • Cross-encoder reranking. New Reranker interface and LocalCrossEncoderReranker (src/reranker.ts, MiniLM cross-encoder via transformers.js). The manager reranks the head of the fused list (rerankTopK, default 50); passing rerank: false on a call opts out. The first-stage signal feeds relevance scoring via a new similarityOverride, while raw cosine is preserved for tier promotion.

  • Official benchmark comparison. bench/beir-scifact.mjs now scores dense vs hybrid vs hybrid+rerank on the full BEIR/SciFact corpus + qrels (same answer key), recording the measured nDCG@10 lift in the officialBenchmark.configs block of the verification report.

  • Tests: +8 node:test cases for BM25, RRF, hybrid retrieval, and reranking (32 total). verify.mjs adds BM25/RRF/hybrid/reranker functional checks and a dense→hybrid→rerank non-regression assertion on the official benchmark.
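
The fusion step behind hybrid retrieval can be illustrated with a minimal Reciprocal Rank Fusion sketch. This is not the package's reciprocalRankFusion: the function name rrfFuse, the input shape (arrays of memory ids, best first), and the default k = 60 (a commonly used RRF constant, not necessarily the package's rrfK default) are assumptions made for illustration.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion (RRF).
// Each input list holds memory ids ordered best-first by one retriever.
type RankedList = string[];

interface FusedHit {
  id: string;
  score: number;
}

function rrfFuse(lists: RankedList[], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      // Ranks are 1-based, so the top hit of a list contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Sort by fused score, breaking ties by id so the output is deterministic.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// "m4" is found only by the lexical list (e.g. an exact error code match),
// yet it still reaches the fused result.
const denseRanking: RankedList = ["m1", "m2", "m3"];
const lexicalRanking: RankedList = ["m4", "m1", "m2"];
const fused = rrfFuse([denseRanking, lexicalRanking]);
console.log(fused.map((hit) => hit.id).join(","));
```

Because RRF only looks at ranks, it needs no calibration between cosine similarities and BM25 scores, which is one reason it is a common choice for fusing dense and lexical results.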

Frontier

  • Frontier Layer re-rated to Capability Grade C, Ambition 82.5/100, ceiling coverage 50% (from D / 79 / 37%): cap-hybrid-retrieval and cap-rerank move from disclosed seams to BUILT. The "frontier/best" claim remains gated until long-horizon memory + a multi-benchmark gauntlet ship.

Disclosed seams

  • The cross-encoder reranker requires the optional @xenova/transformers dependency + a one-time model download; verify.mjs exercises the reranker interface with a deterministic fake, and the real model is measured in the separate BEIR/SciFact run (DISCLOSED_SEAM).
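
To make the "deterministic fake" idea concrete, here is a rough sketch of a reranker stand-in that needs no model download. The interface shape (RerankerSketch, Candidate, ScoredCandidate) and the token-overlap scoring are assumptions for illustration only; the actual Reranker interface in src/reranker.ts and the fake used by verify.mjs may look different.

```typescript
// Hypothetical shapes for illustration; the real Reranker interface may
// use different names and signatures.
interface Candidate {
  id: string;
  text: string;
}

interface ScoredCandidate extends Candidate {
  score: number;
}

interface RerankerSketch {
  rerank(query: string, candidates: Candidate[]): Promise<ScoredCandidate[]>;
}

// Lowercase and split on non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Pure, deterministic scoring: the fraction of candidate tokens that also
// occur in the query. Ties are broken by id.
function rankByOverlap(query: string, candidates: Candidate[]): ScoredCandidate[] {
  const queryTokens = new Set(tokenize(query));
  return candidates
    .map((candidate) => {
      const tokens = tokenize(candidate.text);
      const hits = tokens.filter((token) => queryTokens.has(token)).length;
      return { ...candidate, score: tokens.length === 0 ? 0 : hits / tokens.length };
    })
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Stand-in for a cross-encoder: same async contract, no network or model.
class FakeOverlapReranker implements RerankerSketch {
  async rerank(query: string, candidates: Candidate[]): Promise<ScoredCandidate[]> {
    return rankByOverlap(query, candidates);
  }
}

const reranked = rankByOverlap("error E4021 on db-host-7", [
  { id: "a", text: "Deploy finished without issues" },
  { id: "b", text: "db-host-7 returned error E4021" },
]);
console.log(reranked.map((c) => c.id).join(","));
```

A fake like this checks that the manager calls the reranker and honors its ordering; it says nothing about ranking quality, which is why the real model is measured separately.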

v1.0.0

First release of @forge/agent-memory-manager — hierarchical memory management for long-running AI agents and agent fleets.

Added

  • Tiered memory (MemoryTier, hot/warm/cold) with per-tier capacity, decay half-life, and importance floor.

  • MemoryManager with the four-method API: store, retrieve, summarizeThread, getContextForTask, plus maintain, stats, deleteKey.

  • AgentMemoryWrapper — agent-scoped facade with thread helpers and fleet sharing.

  • Postgres + pgvector storage via PgVectorStore, usable with an embedded PGlite engine (createPgliteMemoryStore) or an external Postgres server (pg), over an identical SQL/pgvector code path. InMemoryStore is the dependency-free default.

  • Policies: exponential time-based decay, relevance scoring (cosine similarity + recency/importance/frequency/tag/type heuristics), and eviction (retention score blending LRU + decayed importance + frequency).

  • Summarization: local ExtractiveSummarizer default; ClaudeSummarizer hook for Anthropic (network seam); RemoteEmbeddingProvider for hosted embeddings.

  • Fleet sync: SyncBus interface + InProcessSyncBus; fleet-scoped replication with no-echo and last-writer-wins.

  • Observability: structured JSON logger and an in-memory metrics registry (counters, gauges, latency histograms); stats() for live tier sizes.

  • Tests: 24 node:test cases; verify.mjs harness with a seeded synthetic retrieval benchmark over both stores.
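
The exponential time-based decay listed under Policies, driven by a per-tier half-life, can be sketched as below. The function name decayedImportance and the millisecond units are assumptions; the package's actual decay policy may combine this with other terms such as the importance floor.

```typescript
// Hypothetical sketch of half-life decay: importance halves every halfLifeMs.
function decayedImportance(importance: number, ageMs: number, halfLifeMs: number): number {
  if (halfLifeMs <= 0) {
    throw new RangeError("halfLifeMs must be positive");
  }
  // Negative ages (clock skew) are treated as zero so importance never grows.
  const elapsed = Math.max(0, ageMs);
  return importance * Math.pow(0.5, elapsed / halfLifeMs);
}

const HOUR_MS = 60 * 60 * 1000;
const fresh = decayedImportance(0.8, 0, HOUR_MS); // no time elapsed
const oneHalfLife = decayedImportance(1, HOUR_MS, HOUR_MS); // one half-life
const twoHalfLives = decayedImportance(1, 2 * HOUR_MS, HOUR_MS); // two half-lives
console.log(fresh, oneHalfLife, twoHalfLives);
```

A shorter half-life for the hot tier and a longer one for the cold tier would make recent working memory fade quickly while archived memories lose importance slowly; the concrete values are a tuning decision.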

Verification & proof

  • verify.mjs: 20/20 checks PASS. In-memory and pgvector retrieval precision@1 = 1.0, precision@5 ≈ 0.98 on the seeded answer key; pgvector vs JS cosine parity within 1e-5.

  • Proof Layer: state CERTIFIED, Evidence Grade B, Trust Score 80/100, 0 unsupported claims, 5 disclosed seams. See certification-report.md.
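
For readers unfamiliar with the precision@k figures quoted above, the metric can be computed roughly as follows. This is a generic sketch, not the code in verify.mjs; the function name precisionAtK is hypothetical, and dividing by k even when fewer than k results are returned is one of several conventions in use.

```typescript
// Hypothetical sketch: fraction of the top-k retrieved ids that are relevant.
function precisionAtK(retrieved: string[], relevant: ReadonlySet<string>, k: number): number {
  if (!Number.isInteger(k) || k <= 0) {
    throw new RangeError("k must be a positive integer");
  }
  const hits = retrieved.slice(0, k).filter((id) => relevant.has(id)).length;
  // Divides by k, so missing results count against the score.
  return hits / k;
}

// Made-up ids for illustration, not taken from the seeded answer key.
const retrievedIds = ["a", "b", "c", "d", "e"];
const relevantIds = new Set(["a", "c", "x"]);
const p1 = precisionAtK(retrievedIds, relevantIds, 1);
const p5 = precisionAtK(retrievedIds, relevantIds, 5);
console.log(p1, p5);
```

A benchmark-level figure such as precision@5 ≈ 0.98 is typically the mean of this value over all queries in the answer key.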

Known limitations / disclosed seams

Default embeddings are lexical (not learned-semantic); default summarizer is extractive; Postgres/pgvector and fleet sync run in-process during verification; the retrieval benchmark is synthetic. See proof/LIMITATIONS.md. Not claimed to be PRODUCTION_VALIDATED.