Which Memory Architecture Wins for LLM Agents: Vector, Graph, or Event Logs?
Overview of six memory patterns for LLM agents across vector, graph, and event/log families, with practical tradeoffs for latency, hit rate, and failure modes.
High-level tradeoffs
Modern multi-agent systems and agentic workflows depend heavily on memory design. When agents call tools, coordinate across sessions, and run long-lived workflows, the choice of memory model determines retrieval speed, reliability, and the ways failures manifest. Below I compare six common memory patterns organized into three families and describe latency profiles, hit-rate behavior, and failure modes.
Vector memory systems
Plain Vector RAG
What it is
Plain vector RAG encodes text fragments (messages, tool outputs, documents) with an embedding model and stores the vectors in an approximate nearest neighbor (ANN) index such as FAISS or HNSW. At query time, the query is embedded and the top-k nearest neighbors are returned and optionally reranked.
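The core loop can be sketched in a few lines. This is a minimal brute-force stand-in: `embed` is a toy hash-based featurizer (a real system would call an embedding model), and the exhaustive cosine scan plays the role that FAISS or HNSW plays at scale.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for an embedding model: hash character bigrams
    # into a fixed-size unit vector. Real systems use learned embeddings.
    vec = [0.0] * dim
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def top_k(query: str, store: list[tuple[str, list[float]]], k: int = 3):
    # Exhaustive cosine scan; an ANN index replaces this at scale.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), text) for text, v in store]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

store = [(t, embed(t)) for t in [
    "Deploy requires approval from the release manager.",
    "The staging cluster runs in us-east-1.",
    "Rollbacks must preserve the database schema.",
]]
print(top_k("who approves a deploy?", store, k=2))
```

The failure modes below follow directly from this shape: anything the top-k scan misses simply never reaches the model.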
Latency profile
ANN indexes are optimized for sublinear scaling: graph-based structures like HNSW typically show near-logarithmic latency growth for fixed recall. In a tuned single-node setup, retrieval from millions of items is often a few tens of milliseconds plus reranking and LLM attention costs.
Hit-rate behavior
Vector RAG works well for local queries and when the required facts are captured in a small set of chunks whose embeddings align to the query model. It performs poorly on temporal queries, cross-session reasoning, and multi-hop relational questions.
Failure modes in multi-agent planning
- Lost constraints: top-k misses a global rule, producing invalid plans.
- Semantic drift: neighbors match the topic but not critical identifiers.
- Context dilution: many partially relevant chunks drown out the important bits.
When to use it
Good for single-agent or short-horizon tasks, Q&A over small to medium corpora, and as a semantic index layer rather than the final source of truth.
Tiered Vector Memory (MemGPT-style virtual context)
What it is
Tiered vector memory uses a small active working context plus a larger archive. The LLM or a controller manages what stays in the active context and what is archived, paging items in and out when needed.
Architecture and latency
Active context access is effectively free aside from attention cost. Archive accesses use vector search similar to plain RAG but are typically narrowed by task, session, or topic. Caching of hot entries reduces paging costs.
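A minimal sketch of the paging mechanic, assuming an LRU eviction policy and a dictionary lookup standing in for the archive's vector search; the `capacity` parameter models the token budget of the active context:

```python
from collections import OrderedDict

class TieredMemory:
    """Small LRU working set over an unbounded archive (sketch)."""

    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.working = OrderedDict()   # hot items, paged into context
        self.archive = {}              # cold items, searched on a miss

    def write(self, key, value):
        self.archive[key] = value      # the archive is the source of truth
        self._page_in(key, value)

    def read(self, key):
        if key in self.working:
            self.working.move_to_end(key)      # refresh recency
            return self.working[key]
        value = self.archive[key]              # stands in for vector search
        self._page_in(key, value)
        return value

    def _page_in(self, key, value):
        self.working[key] = value
        self.working.move_to_end(key)
        while len(self.working) > self.capacity:
            self.working.popitem(last=False)   # evict least recently used

mem = TieredMemory(capacity=2)
mem.write("rule", "never deploy on Fridays")
mem.write("region", "us-east-1")
mem.write("owner", "alice")            # evicts "rule" from the working set
print("rule" in mem.working)           # paged out, but still archived
print(mem.read("rule"))                # paged back in from the archive
```

The eviction line is exactly where the "wrongly evicted items" failure mode lives: an LRU policy that pages out a rarely-read global constraint silently removes it from the agent's effective context.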
Hit-rate behavior and failure modes
This pattern boosts hit rate for frequently accessed items because they stay in the working set, but paging errors create a new failure surface: wrongly evicted items cause latent constraint loss. Per-agent divergence is a risk when every agent keeps its own working set over a shared archive.
When to use it
Useful for long conversations and workflows where unbounded context growth is unworkable and where investing in paging policies is possible.
Graph memory systems
Temporal Knowledge Graph Memory (Zep / Graphiti)
What it is
Temporal KGs model entities, events, and relations with timestamps and validity intervals. They combine conversational history and structured data to support temporal and cross-session reasoning.
Latency and hit rate
Graph queries often traverse small local neighborhoods, so latency scales with neighborhood size rather than full corpus size. Temporal and entity-centric queries benefit most; missing edges or timestamps directly reduce recall.
Failure modes
Stale edges, schema drift, and access-control partitions can cause planners to operate on incorrect or partial world models.
When to use it
Best for multi-agent coordination on shared entities, long-running tasks, and systems that can maintain ETL or streaming updates into the KG.
Knowledge-Graph RAG (GraphRAG)
What it is
GraphRAG constructs a knowledge graph over a corpus, runs hierarchical community detection, and stores summaries per community for retrieval. At query time, the system identifies relevant communities and passes summaries and supporting nodes to the LLM.
Latency, hit rate, and failure modes
Indexing is heavier but query-time costs can be competitive because only a small number of community summaries are retrieved. GraphRAG shines on multi-document, multi-hop queries, but it depends on extraction quality, risks over-summarization, and makes answer provenance harder to trace back to source documents.
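The query-time side of that tradeoff can be sketched as follows. The community summaries and node IDs here are hypothetical, and the word-overlap `score` is a toy stand-in for the embedding- or LLM-based relevance rating a real GraphRAG deployment would use; the point is that retrieval ranks a handful of precomputed summaries, not the whole corpus.

```python
def score(query: str, text: str) -> int:
    # Toy relevance: shared lowercase word count.
    return len(set(query.lower().split()) & set(text.lower().split()))

# Hypothetical index built offline: one summary per detected community,
# each pointing back to the member nodes that support it.
communities = [
    {"summary": "billing service outages traced to database connection pool limits",
     "nodes": ["incident-101", "db-pool-config"]},
    {"summary": "frontend latency caused by oversized javascript bundles",
     "nodes": ["perf-audit-7", "bundle-report"]},
]

def graphrag_query(query: str, top: int = 1):
    ranked = sorted(communities, key=lambda c: score(query, c["summary"]),
                    reverse=True)
    return ranked[:top]   # summaries + supporting nodes go to the LLM

hits = graphrag_query("why did the billing database fail")
print(hits[0]["nodes"])
```

Note that if the offline summarization dropped a detail, no amount of query-time effort recovers it; that is the over-summarization risk in concrete form.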
When to use it
Suited for large documentation sets and root-cause analysis where one-time indexing and maintenance costs are acceptable.
Event and execution log systems
Execution logs and checkpoints (ALAS, LangGraph)
What they are
Execution logs and thread-scoped checkpoints treat actions and state transitions as the authoritative record. They record tool calls, inputs, outputs, and control-flow decisions and support replay, localized repair, and auditing.
Latency and hit rate
Reading recent log tails or checkpoints is cheap; analytics and global queries require indexing or offline processing. For questions about what actually happened, hit rate is near 100% when instrumentation and retention are correct.
Failure modes
Log bloat, partial instrumentation, and unsafe replay of side-effecting actions are the main risks. Transactional semantics, idempotency keys, and localized repair patterns help mitigate these issues.
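The idempotency-key mitigation can be sketched concretely. This is a generic illustration, not the ALAS or LangGraph API: each side-effecting action runs under a caller-chosen key, and replaying the workflow reuses the recorded result instead of re-executing the effect.

```python
class ExecutionLog:
    """Append-only action log with idempotency keys (sketch).

    Replay re-runs the workflow but skips side effects whose
    idempotency key already has a recorded result."""

    def __init__(self):
        self.entries = []                    # authoritative record
        self._by_key = {}

    def run(self, idem_key: str, action, *args):
        if idem_key in self._by_key:         # safe replay: reuse result
            return self._by_key[idem_key]["result"]
        result = action(*args)               # side effect happens once
        entry = {"key": idem_key, "args": args, "result": result}
        self.entries.append(entry)
        self._by_key[idem_key] = entry
        return result

calls = []
def charge(amount):
    calls.append(amount)                     # stand-in for a real side effect
    return f"charged {amount}"

log = ExecutionLog()
log.run("order-7/charge", charge, 30)
log.run("order-7/charge", charge, 30)        # replay: no second charge
print(len(calls))                            # the effect ran exactly once
```

Without the key check, re-running a failed plan from its last checkpoint would double-charge; with it, replay is safe even when the crash happened after the effect but before the next step.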
When they are essential
Use when observability, auditability, and precise repro are required, and where automated repair or partial re-planning is needed.
Episodic long-term memory
What it is
Episodic memory stores cohesive episodes with task descriptions, actions (often linked to logs), outcomes, and metadata. Episodes can be searched by metadata and embeddings and distilled into higher-level patterns.
Latency and hit rate
Retrieval is two-stage: identify candidate episodes, then inspect their contents. This scales better than flat event search as history grows and improves recall for long-horizon reuse and pattern retrieval. Errors arise from poor episode boundaries or consolidation mistakes.
When to use it
Ideal for long-lived workflows, case-based reuse, and systems that incorporate episodes into training or adaptation loops.
Key takeaways
Memory is a systems problem. Vector stores provide fast, sublinear retrieval but struggle with temporal and relational reasoning. Graphs add explicit structure for entities and time, reducing some blind spots at the cost of schema and maintenance burden. Execution logs and checkpoints are the ground truth for actions and support replay and repair. In practice, robust agent architectures combine vector, graph, and event/episodic layers with clear responsibilities and known failure modes instead of relying on a single memory mechanism.