Release Notes
v1.1.0
Closes the highest-leverage retrieval gap from the v1 Frontier review: retrieval is now a modern two-stage hybrid pipeline.
Added
- Hybrid retrieval (default). MemoryManager fuses dense vector search with a new in-process BM25 lexical index (src/sparse.ts, Bm25Index) via Reciprocal Rank Fusion (reciprocalRankFusion). New options retrievalMode: 'hybrid' | 'dense' and rrfK, plus a per-call retrievalMode. BM25 recovers exact rare tokens (hostnames, error codes, ticket IDs) that dense embeddings blur. The index is maintained incrementally and is store-agnostic (it works with any MemoryStore).
- Cross-encoder reranking. New Reranker interface and LocalCrossEncoderReranker (src/reranker.ts, MiniLM cross-encoder via transformers.js). The manager reranks the fused head (rerankTopK, default 50); per-call rerank: false opts out. The first-stage signal feeds relevance scoring via a new similarityOverride, while raw cosine similarity is preserved for tier promotion.
- Official benchmark comparison. bench/beir-scifact.mjs now scores dense vs. hybrid vs. hybrid+rerank on the full BEIR/SciFact corpus and qrels (same answer key), recording the measured nDCG@10 lift in the officialBenchmark.configs block of the verification report.
- Tests: +8 node:test cases for BM25, RRF, hybrid retrieval, and reranking (32 total). verify.mjs adds BM25/RRF/hybrid/reranker functional checks and a dense→hybrid→rerank non-regression assertion on the official benchmark.
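A minimal sketch of the BM25 + RRF fusion described above. SimpleBm25, rrfFuse, the tokenizer, and the parameter values (k1 = 1.2, b = 0.75, RRF k = 60, a commonly used value) are illustrative assumptions; they are not the package's Bm25Index or reciprocalRankFusion, whose signatures and defaults may differ.

```typescript
// Hypothetical sketch: BM25 lexical ranking fused with a dense ranking via RRF.
// SimpleBm25 / rrfFuse are illustrative names, not the package's Bm25Index / reciprocalRankFusion.

type Ranked = { id: string; score: number };

// Lowercase and split on non-alphanumerics, so tokens like "e1042" survive intact.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class SimpleBm25 {
  private docs = new Map<string, Map<string, number>>(); // doc id -> term frequencies
  private lengths = new Map<string, number>(); // doc id -> token count
  private df = new Map<string, number>(); // term -> number of docs containing it
  private totalLength = 0;
  private k1 = 1.2; // assumed term-frequency saturation
  private b = 0.75; // assumed length normalization

  // Incremental add (assumes a new, unique id): only this doc's stats change.
  add(id: string, text: string): void {
    const tokens = tokenize(text);
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    for (const t of tf.keys()) this.df.set(t, (this.df.get(t) ?? 0) + 1);
    this.docs.set(id, tf);
    this.lengths.set(id, tokens.length);
    this.totalLength += tokens.length;
  }

  search(query: string, limit: number): Ranked[] {
    const n = this.docs.size;
    if (n === 0) return [];
    const avgLen = this.totalLength / n;
    const terms = new Set(tokenize(query));
    const results: Ranked[] = [];
    for (const [id, tf] of this.docs) {
      const len = this.lengths.get(id) ?? 0;
      let score = 0;
      for (const term of terms) {
        const f = tf.get(term) ?? 0;
        if (f === 0) continue;
        const df = this.df.get(term) ?? 0;
        // IDF with +1 inside the log keeps it non-negative (Lucene-style variant).
        const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
        const norm = f + this.k1 * (1 - this.b + (this.b * len) / avgLen);
        score += (idf * f * (this.k1 + 1)) / norm;
      }
      if (score > 0) results.push({ id, score });
    }
    return results.sort((x, y) => y.score - x.score).slice(0, limit);
  }
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), ranks start at 1.
function rrfFuse(lists: string[][], k = 60): Ranked[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => fused.set(id, (fused.get(id) ?? 0) + 1 / (k + i + 1)));
  }
  return Array.from(fused, ([id, score]) => ({ id, score })).sort((x, y) => y.score - x.score);
}

// Usage: an exact error code that a (pretend) dense ranking placed second.
const bm25 = new SimpleBm25();
bm25.add("a", "db-01 failed with error E1042 during backup");
bm25.add("b", "database backups usually finish overnight");
bm25.add("c", "ticket OPS-778: rotate TLS certs");

const dense = ["b", "a", "c"]; // stand-in for a vector-search ranking
const lexical = bm25.search("error E1042", 10).map((r) => r.id); // ["a"]
const fused = rrfFuse([dense, lexical]);
console.log(fused.map((r) => r.id)); // [ 'a', 'b', 'c' ]
```

RRF fuses by rank rather than raw score, so BM25 scores and cosine similarities never have to be put on a common scale; a larger k flattens the gap between top and lower ranks.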
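The benchmark comparison reports nDCG@10. As a hedged sketch of how that metric is typically computed (linear gains, a log2 rank discount, normalized by the ideal ordering of the qrels), here is a hypothetical ndcgAtK helper; it is not the code in bench/beir-scifact.mjs.

```typescript
// Hypothetical sketch of nDCG@k with linear gains and a log2 rank discount.
// qrels maps docId -> relevance grade; ranking is the retrieved docIds in order.

function dcgAtK(gains: number[], k: number): number {
  let dcg = 0;
  for (let i = 0; i < Math.min(k, gains.length); i++) {
    dcg += gains[i] / Math.log2(i + 2); // rank 1 is divided by log2(2) = 1
  }
  return dcg;
}

function ndcgAtK(ranking: string[], qrels: Map<string, number>, k = 10): number {
  const gains = ranking.map((id) => qrels.get(id) ?? 0);
  // Ideal DCG: the judged documents sorted by grade, best first.
  const ideal = Array.from(qrels.values()).sort((a, b) => b - a);
  const idcg = dcgAtK(ideal, k);
  return idcg === 0 ? 0 : dcgAtK(gains, k) / idcg;
}

// Usage: two relevant docs; one config ranks both first, the other interleaves misses.
const qrels = new Map<string, number>([["d1", 1], ["d7", 1]]);
const denseScore = ndcgAtK(["d3", "d1", "d9", "d7"], qrels);
const hybridScore = ndcgAtK(["d1", "d7", "d3"], qrels);
console.log(denseScore.toFixed(3), hybridScore.toFixed(3)); // 0.651 1.000
```

Comparing configurations then comes down to comparing mean nDCG@10 over the same queries and qrels.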
Frontier
- Frontier Layer re-rated to **Capability Grade C, Ambition 82.5/100, ceiling coverage 50%** (from D / 79 / 37%): cap-hybrid-retrieval and cap-rerank move from disclosed seams to BUILT. The "frontier/best" claim remains gated until long-horizon memory and a multi-benchmark gauntlet ship.
Disclosed seams
- The cross-encoder reranker requires the optional @xenova/transformers dependency and a one-time model download; verify.mjs exercises the reranker interface with a deterministic fake, and the real model is measured in the separate BEIR/SciFact run (DISCLOSED_SEAM).
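To illustrate the deterministic-fake approach and the rerankTopK head cut, here is a hypothetical sketch. RerankerLike, FakeOverlapReranker, and rerankHead are invented names, and scoring is kept synchronous for brevity; a real transformers.js cross-encoder call would be asynchronous, and the package's Reranker interface may look different.

```typescript
// Hypothetical sketch: rerank only the first `topK` fused candidates.
// RerankerLike / FakeOverlapReranker / rerankHead are invented names; the package's
// Reranker interface is not reproduced here. Scoring is synchronous for brevity.

interface Candidate {
  id: string;
  text: string;
  score: number; // first-stage (e.g. fused) score
  rerankScore?: number; // set only for candidates in the reranked head
}

interface RerankerLike {
  // One relevance score per passage, aligned with the input order.
  score(query: string, passages: string[]): number[];
}

// Deterministic stand-in for a cross-encoder: counts query tokens found in each passage.
class FakeOverlapReranker implements RerankerLike {
  score(query: string, passages: string[]): number[] {
    const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
    return passages.map((p) => p.toLowerCase().split(/\W+/).filter((t) => q.has(t)).length);
  }
}

function rerankHead(query: string, fused: Candidate[], reranker: RerankerLike, topK = 50): Candidate[] {
  const head = fused.slice(0, topK);
  const tail = fused.slice(topK); // keeps its first-stage order
  const scores = reranker.score(query, head.map((c) => c.text));
  // Array.prototype.sort is stable, so ties keep their first-stage order.
  const reranked = head
    .map((c, i) => ({ ...c, rerankScore: scores[i] }))
    .sort((a, b) => b.rerankScore - a.rerankScore);
  return [...reranked, ...tail];
}

// Usage: a head of 2 means m3 is never rescored, even though it would match well.
const fusedCandidates: Candidate[] = [
  { id: "m1", text: "weekly standup notes", score: 0.033 },
  { id: "m2", text: "pager alert: disk full on db-01", score: 0.032 },
  { id: "m3", text: "db-01 disk usage graph", score: 0.031 },
];
const out = rerankHead("disk full db-01", fusedCandidates, new FakeOverlapReranker(), 2);
console.log(out.map((c) => c.id)); // [ 'm2', 'm1', 'm3' ]
```

Rescoring only the head bounds the number of cross-encoder calls per query, which is the usual reason for a top-K cut on an expensive second stage.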
v1.0.0
First release of @forge/agent-memory-manager — hierarchical memory management for long-running AI agents and agent fleets.
Added
- Tiered memory (MemoryTier, hot/warm/cold) with per-tier capacity, decay half-life, and importance floor.
- MemoryManager with the four-method API (store, retrieve, summarizeThread, getContextForTask), plus maintain, stats, and deleteKey.
- AgentMemoryWrapper: an agent-scoped facade with thread helpers and fleet sharing.
- Postgres + pgvector storage via PgVectorStore, usable with an embedded PGlite engine (createPgliteMemoryStore) or an external Postgres server (pg) over an identical SQL/pgvector code path. InMemoryStore is the dependency-free default.
- Policies: exponential time-based decay, relevance scoring (cosine similarity plus recency/importance/frequency/tag/type heuristics), and eviction (a retention score blending LRU, decayed importance, and frequency).
- Summarization: local ExtractiveSummarizer (default); ClaudeSummarizer hook for Anthropic (network seam); RemoteEmbeddingProvider for hosted embeddings.
- Fleet sync: SyncBus interface + InProcessSyncBus; fleet-scoped replication with no-echo and last-writer-wins.
- Observability: structured JSON logger and an in-memory metrics registry (counters, gauges, latency histograms); stats() for live tier sizes.
- Tests: 24 node:test cases; verify.mjs harness with a seeded synthetic retrieval benchmark over both stores.
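A sketch of the decay and eviction policies listed above, assuming a per-tier half-life in milliseconds. The MemoryEntry shape, the 0.4/0.4/0.2 blend weights, and the frequency transform are hypothetical choices for illustration, not the package's actual scoring.

```typescript
// Hypothetical sketch: half-life decay and a blended retention score for eviction.
// Field names, blend weights, and the frequency transform are illustrative assumptions.

interface MemoryEntry {
  key: string;
  importance: number; // 0..1 at write time
  createdAt: number; // ms since epoch
  lastAccessAt: number; // ms since epoch
  accessCount: number;
}

// Exponential decay: the value halves every `halfLifeMs`.
function decayed(value: number, ageMs: number, halfLifeMs: number): number {
  return value * Math.pow(0.5, ageMs / halfLifeMs);
}

function retentionScore(e: MemoryEntry, now: number, halfLifeMs: number): number {
  const recency = decayed(1, now - e.lastAccessAt, halfLifeMs); // LRU-like: 1 = just used
  const importance = decayed(e.importance, now - e.createdAt, halfLifeMs);
  const frequency = 1 - 1 / (1 + e.accessCount); // 0 for never-read, approaches 1
  return 0.4 * recency + 0.4 * importance + 0.2 * frequency;
}

// Keep the `capacity` highest-retention entries; the rest are eviction candidates.
function evict(entries: MemoryEntry[], capacity: number, now: number, halfLifeMs: number): MemoryEntry[] {
  if (entries.length <= capacity) return entries;
  return [...entries]
    .sort((a, b) => retentionScore(b, now, halfLifeMs) - retentionScore(a, now, halfLifeMs))
    .slice(0, capacity);
}

// Usage with a one-hour half-life.
const HOUR = 3_600_000;
const now = 10 * HOUR;
console.log(decayed(1, HOUR, HOUR), decayed(0.8, 2 * HOUR, HOUR)); // 0.5 0.2
const kept = evict(
  [
    { key: "hot", importance: 0.9, createdAt: now, lastAccessAt: now, accessCount: 5 },
    { key: "stale", importance: 0.2, createdAt: 0, lastAccessAt: 0, accessCount: 0 },
  ],
  1,
  now,
  HOUR,
);
console.log(kept.map((e) => e.key)); // [ 'hot' ]
```

A half-life parameterization makes tier settings easy to reason about: after one half-life an untouched item's decayed importance is exactly half its original value.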
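A minimal sketch of fleet-scoped replication with no-echo and last-writer-wins, using a hypothetical TinyBus and Replica in place of the package's SyncBus and InProcessSyncBus, whose API is not shown in these notes.

```typescript
// Hypothetical sketch: fleet-scoped replication with no-echo and last-writer-wins.
// TinyBus / Replica / SyncMessage are illustrative, not the package's SyncBus API.

interface SyncMessage {
  fleetId: string;
  originId: string; // agent that produced the write
  key: string;
  value: string;
  updatedAt: number; // write timestamp used for last-writer-wins
}

class TinyBus {
  private handlers: Array<(msg: SyncMessage) => void> = [];
  subscribe(handler: (msg: SyncMessage) => void): void {
    this.handlers.push(handler);
  }
  publish(msg: SyncMessage): void {
    for (const h of this.handlers) h(msg); // delivered to every subscriber, sender included
  }
}

class Replica {
  readonly store = new Map<string, { value: string; updatedAt: number }>();
  readonly agentId: string;
  readonly fleetId: string;
  private bus: TinyBus;

  constructor(agentId: string, fleetId: string, bus: TinyBus) {
    this.agentId = agentId;
    this.fleetId = fleetId;
    this.bus = bus;
    bus.subscribe((msg) => this.onMessage(msg));
  }

  write(key: string, value: string, updatedAt: number): void {
    // Local writes obey the same LWW rule, so a stale write is dropped, not broadcast.
    if (this.merge(key, value, updatedAt)) {
      this.bus.publish({ fleetId: this.fleetId, originId: this.agentId, key, value, updatedAt });
    }
  }

  private onMessage(msg: SyncMessage): void {
    if (msg.fleetId !== this.fleetId) return; // fleet-scoped
    if (msg.originId === this.agentId) return; // no-echo: ignore our own broadcasts
    this.merge(msg.key, msg.value, msg.updatedAt);
  }

  // Last-writer-wins: accept only strictly newer timestamps (ties keep the current value).
  private merge(key: string, value: string, updatedAt: number): boolean {
    const current = this.store.get(key);
    if (current && updatedAt <= current.updatedAt) return false;
    this.store.set(key, { value, updatedAt });
    return true;
  }
}

// Usage: three agents, two fleets.
const bus = new TinyBus();
const a = new Replica("agent-a", "fleet-1", bus);
const b = new Replica("agent-b", "fleet-1", bus);
const c = new Replica("agent-c", "fleet-2", bus);
a.write("deploy.target", "staging", 100);
b.write("deploy.target", "prod", 200);
a.write("deploy.target", "canary", 150); // older than prod@200, so dropped
console.log(a.store.get("deploy.target")?.value, b.store.get("deploy.target")?.value, c.store.size);
// prod prod 0
```

Ties keep the existing value in this sketch; a deterministic tie-breaker (for example, comparing originId) is a common addition when two writers can share a timestamp.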
Verification & proof
- verify.mjs: 20/20 checks PASS. In-memory and pgvector retrieval precision@1 = 1.0 and precision@5 ≈ 0.98 on the seeded answer key; pgvector vs. JS cosine parity within 1e-5.
- Proof Layer: state CERTIFIED, Evidence Grade B, Trust Score 80/100, 0 unsupported claims, 5 disclosed seams. See certification-report.md.
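The two retrieval checks above can be sketched as follows: precision@k against an answer key, and a parity check between a JS cosine similarity and the value returned by pgvector. precisionAtK, cosine, and withinParity are hypothetical helpers, not verify.mjs internals; the only pgvector fact relied on is that its <=> operator returns cosine distance (1 - similarity).

```typescript
// Hypothetical sketch of the two retrieval checks; not the verify.mjs implementation.

// precision@k: fraction of the top-k results that appear in the answer key.
function precisionAtK(ranking: string[], relevant: Set<string>, k: number): number {
  const hits = ranking.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / k;
}

// Plain cosine similarity, as a JS reference for the parity check.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// pgvector's <=> operator returns cosine distance, i.e. 1 - cosine similarity.
function withinParity(jsSimilarity: number, pgCosineDistance: number, tol = 1e-5): boolean {
  return Math.abs(jsSimilarity - (1 - pgCosineDistance)) <= tol;
}

// Usage
const key = new Set(["a", "b", "c", "d", "e"]);
const ranking = ["a", "b", "x", "c", "d"];
console.log(precisionAtK(ranking, key, 1), precisionAtK(ranking, key, 5)); // 1 0.8
console.log(withinParity(cosine([1, 2, 3], [2, 4, 6]), 0)); // true
```

Averaging precisionAtK over all queries in the seeded set gives the kind of aggregate precision@1 and precision@5 figures quoted above.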
Known limitations / disclosed seams
Default embeddings are lexical (not learned-semantic); default summarizer is extractive; Postgres/pgvector and fleet sync run in-process during verification; the retrieval benchmark is synthetic. See proof/LIMITATIONS.md. Not claimed to be PRODUCTION_VALIDATED.