Which Memory Architecture Wins for LLM Agents: Vector, Graph, or Event Logs?
Overview of six memory patterns for LLM agents across vector, graph, and event/log families, with practical tradeoffs for latency, hit rate, and failure modes.
High-level tradeoffs
Modern multi-agent systems and agentic workflows depend heavily on memory design. When agents call tools, coordinate across sessions, and run long-lived workflows, the choice of memory model determines retrieval speed, reliability, and the ways failures manifest. Below I compare six common memory patterns organized into three families and describe latency profiles, hit-rate behavior, and failure modes.
Vector memory systems
Plain Vector RAG
What it is
Plain vector RAG encodes text fragments (messages, tool outputs, documents) with an embedding model and stores the vectors in an approximate nearest neighbor (ANN) index such as FAISS or HNSW. At query time, the query is embedded and the top-k nearest neighbors are returned and optionally reranked.
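The core loop can be sketched in a few lines. This is a minimal brute-force stand-in: `embed` is a toy hash-based featurizer (a real system would call an embedding model), and the exhaustive cosine scan plays the role that FAISS or HNSW plays at scale.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for an embedding model: hash character bigrams
    # into a fixed-size unit vector. Real systems use learned embeddings.
    vec = [0.0] * dim
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def top_k(query: str, store: list[tuple[str, list[float]]], k: int = 3):
    # Exhaustive cosine scan; an ANN index replaces this at scale.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), text) for text, v in store]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

store = [(t, embed(t)) for t in [
    "Deploy requires approval from the release manager.",
    "The staging cluster runs in us-east-1.",
    "Rollbacks must preserve the database schema.",
]]
print(top_k("who approves a deploy?", store, k=2))
```

The failure modes below follow directly from this shape: anything the top-k scan misses simply never reaches the model.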
Latency profile
ANN indexes are optimized for sublinear scaling: graph-based structures like HNSW typically show near-logarithmic latency growth for fixed recall. In a tuned single-node setup, retrieval from millions of items is often a few tens of milliseconds plus reranking and LLM attention costs.
Hit-rate behavior
Vector RAG works well for local queries and when the required facts are captured in a small set of chunks whose embeddings align to the query model. It performs poorly on temporal queries, cross-session reasoning, and multi-hop relational questions.
Failure modes in multi-agent planning
- Lost constraints: top-k misses a global rule, producing invalid plans.
- Semantic drift: neighbors match the topic but not critical identifiers.
- Context dilution: many partially relevant chunks drown out the important bits.
When to use it
Good for single-agent or short-horizon tasks, Q&A over small to medium corpora, and as a semantic index layer rather than the final source of truth.
Tiered Vector Memory (MemGPT-style virtual context)
What it is
Tiered vector memory uses a small active working context plus a larger archive. The LLM or a controller manages what stays in the active context and what is archived, paging items in and out when needed.
Architecture and latency
Active context access is effectively free aside from attention cost. Archive accesses use vector search similar to plain RAG but are typically narrowed by task, session, or topic. Caching of hot entries reduces paging costs.
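A minimal sketch of the paging mechanic, assuming an LRU eviction policy and a dictionary lookup standing in for the archive's vector search; the `capacity` parameter models the token budget of the active context:

```python
from collections import OrderedDict

class TieredMemory:
    """Small LRU working set over an unbounded archive (sketch)."""

    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.working = OrderedDict()   # hot items, paged into context
        self.archive = {}              # cold items, searched on a miss

    def write(self, key, value):
        self.archive[key] = value      # the archive is the source of truth
        self._page_in(key, value)

    def read(self, key):
        if key in self.working:
            self.working.move_to_end(key)      # refresh recency
            return self.working[key]
        value = self.archive[key]              # stands in for vector search
        self._page_in(key, value)
        return value

    def _page_in(self, key, value):
        self.working[key] = value
        self.working.move_to_end(key)
        while len(self.working) > self.capacity:
            self.working.popitem(last=False)   # evict least recently used

mem = TieredMemory(capacity=2)
mem.write("rule", "never deploy on Fridays")
mem.write("region", "us-east-1")
mem.write("owner", "alice")            # evicts "rule" from the working set
print("rule" in mem.working)           # paged out, but still archived
print(mem.read("rule"))                # paged back in from the archive
```

The eviction line is exactly where the "wrongly evicted items" failure mode lives: an LRU policy that pages out a rarely-read global constraint silently removes it from the agent's effective context.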
Hit-rate behavior and failure modes
This pattern boosts hit rate for frequently accessed items because they stay in the working set, but paging errors create a new failure surface: wrongly evicted items cause latent constraint loss. Per-agent divergence is a risk when every agent keeps its own working set over a shared archive.
When to use it
Useful for long conversations and workflows where unbounded context growth is unworkable and where investing in paging policies is possible.
Graph memory systems
Temporal Knowledge Graph Memory (Zep / Graphiti)
What it is
Temporal KGs model entities, events, and relations with timestamps and validity intervals. They combine conversational history and structured data to support temporal and cross-session reasoning.
Latency and hit rate
Graph queries often traverse small local neighborhoods, so latency scales with neighborhood size rather than full corpus size. Temporal and entity-centric queries benefit most; missing edges or timestamps directly reduce recall.
Failure modes
Stale edges, schema drift, and access-control partitions can cause planners to operate on incorrect or partial world models.
When to use it
Best for multi-agent coordination on shared entities, long-running tasks, and systems that can maintain ETL or streaming updates into the KG.
Knowledge-Graph RAG (GraphRAG)
What it is
GraphRAG constructs a knowledge graph over a corpus, runs hierarchical community detection, and stores summaries per community for retrieval. At query time, the system identifies relevant communities and passes summaries and supporting nodes to the LLM.
Latency, hit rate, and failure modes
Indexing is heavier but query-time costs can be competitive because only a small number of community summaries are retrieved. GraphRAG shines on multi-document, multi-hop queries, but it depends on extraction quality, risks over-summarization, and makes answer provenance harder to trace back to source documents.
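The query-time side of that tradeoff can be sketched as follows. The community summaries and node IDs here are hypothetical, and the word-overlap `score` is a toy stand-in for the embedding- or LLM-based relevance rating a real GraphRAG deployment would use; the point is that retrieval ranks a handful of precomputed summaries, not the whole corpus.

```python
def score(query: str, text: str) -> int:
    # Toy relevance: shared lowercase word count.
    return len(set(query.lower().split()) & set(text.lower().split()))

# Hypothetical index built offline: one summary per detected community,
# each pointing back to the member nodes that support it.
communities = [
    {"summary": "billing service outages traced to database connection pool limits",
     "nodes": ["incident-101", "db-pool-config"]},
    {"summary": "frontend latency caused by oversized javascript bundles",
     "nodes": ["perf-audit-7", "bundle-report"]},
]

def graphrag_query(query: str, top: int = 1):
    ranked = sorted(communities, key=lambda c: score(query, c["summary"]),
                    reverse=True)
    return ranked[:top]   # summaries + supporting nodes go to the LLM

hits = graphrag_query("why did the billing database fail")
print(hits[0]["nodes"])
```

Note that if the offline summarization dropped a detail, no amount of query-time effort recovers it; that is the over-summarization risk in concrete form.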
When to use it
Suited for large documentation sets and root-cause analysis where one-time indexing and maintenance costs are acceptable.
Event and execution log systems
Execution logs and checkpoints (ALAS, LangGraph)
What they are
Execution logs and thread-scoped checkpoints treat actions and state transitions as the authoritative record. They record tool calls, inputs, outputs, and control-flow decisions and support replay, localized repair, and auditing.
Latency and hit rate
Reading recent log tails or checkpoints is cheap; analytics and global queries require indexing or offline processing. For questions about what actually happened, hit rate is near 100% when instrumentation and retention are correct.
Failure modes
Log bloat, partial instrumentation, and unsafe replay of side-effecting actions are the main risks. Transactional semantics, idempotency keys, and localized repair patterns help mitigate these issues.
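The idempotency-key mitigation can be sketched concretely. This is a generic illustration, not the ALAS or LangGraph API: each side-effecting action runs under a caller-chosen key, and replaying the workflow reuses the recorded result instead of re-executing the effect.

```python
class ExecutionLog:
    """Append-only action log with idempotency keys (sketch).

    Replay re-runs the workflow but skips side effects whose
    idempotency key already has a recorded result."""

    def __init__(self):
        self.entries = []                    # authoritative record
        self._by_key = {}

    def run(self, idem_key: str, action, *args):
        if idem_key in self._by_key:         # safe replay: reuse result
            return self._by_key[idem_key]["result"]
        result = action(*args)               # side effect happens once
        entry = {"key": idem_key, "args": args, "result": result}
        self.entries.append(entry)
        self._by_key[idem_key] = entry
        return result

calls = []
def charge(amount):
    calls.append(amount)                     # stand-in for a real side effect
    return f"charged {amount}"

log = ExecutionLog()
log.run("order-7/charge", charge, 30)
log.run("order-7/charge", charge, 30)        # replay: no second charge
print(len(calls))                            # the effect ran exactly once
```

Without the key check, re-running a failed plan from its last checkpoint would double-charge; with it, replay is safe even when the crash happened after the effect but before the next step.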
When they are essential
Use when observability, auditability, and precise repro are required, and where automated repair or partial re-planning is needed.
Episodic long-term memory
What it is
Episodic memory stores cohesive episodes with task descriptions, actions (often linked to logs), outcomes, and metadata. Episodes can be searched by metadata and embeddings and distilled into higher-level patterns.
Latency and hit rate
Retrieval is two-stage: identify candidate episodes, then inspect their contents. This scales better than flat event search as history grows and improves recall for long-horizon reuse and pattern retrieval. Errors arise from poor episode boundaries or consolidation mistakes.
When to use it
Ideal for long-lived workflows, case-based reuse, and systems that incorporate episodes into training or adaptation loops.
Key takeaways
Memory is a systems problem. Vector stores provide fast, sublinear retrieval but struggle with temporal and relational reasoning. Graphs add explicit structure for entities and time, reducing some blind spots at the cost of schema and maintenance burden. Execution logs and checkpoints are the ground truth for actions and support replay and repair. In practice, robust agent architectures combine vector, graph, and event/episodic layers with clear responsibilities and known failure modes instead of relying on a single memory mechanism.