ReasoningBank: Google’s Memory Layer That Lets LLM Agents Learn and Self‑Evolve at Test Time
The shortcoming in agent memory
LLM agents perform multi-step tasks like web browsing, software debugging, and repository-level fixes, but they rarely retain reusable lessons from past runs. Typical memory designs store raw interaction logs or success-only flows. Those representations are brittle across environments and miss crucial signals in failures, where practical lessons often lie.
What ReasoningBank proposes
ReasoningBank reframes agent memory as compact, human-readable strategy items rather than raw trajectories. Each memory item includes a title, a one-line description, and content that captures actionable principles: heuristics, checks, and constraints. These are high-level reasoning strategies such as preferring account pages for user-specific data, verifying pagination, avoiding infinite scroll traps, or cross-checking state against the task specification.
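As a concrete illustration, here is a minimal sketch of a strategy item in Python. The field names mirror the paper's description (title, description, content), but the class itself and the example values are illustrative, not taken from the paper's code:

```python
from dataclasses import dataclass

@dataclass
class StrategyItem:
    """One distilled memory item; field names follow the paper's description,
    but this class is an illustrative sketch, not the authors' implementation."""
    title: str        # short handle, e.g. "Verify pagination before concluding a search"
    description: str  # one-line summary of when the strategy applies
    content: str      # actionable principle: heuristics, checks, constraints

item = StrategyItem(
    title="Verify pagination before concluding a search",
    description="Result lists are often paginated; the first page may be incomplete.",
    content="Before reporting 'not found', check for next-page controls and "
            "iterate until results are exhausted or the task budget runs out.",
)
```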
The retrieve-inject-distill loop
The framework uses embedding-based retrieval. For a new task, the system retrieves the top-k relevant strategy items and injects them as system guidance for the agent. After execution, the agent’s new interaction trace is judged and distilled into additional strategy items, which are appended back to memory. The loop is intentionally simple: retrieve → inject → judge → distill → append, so performance gains can be attributed to the abstraction of strategies rather than to complex memory engineering.
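A minimal sketch of that loop, building on the StrategyItem class above. The callables embed_fn, agent_fn, judge_fn, and distill_fn stand in for the embedding model, the agent, the trajectory judge, and the distillation step; all of their signatures are assumptions, not the paper's API:

```python
import numpy as np

def retrieve(memory, query_vec, embed_fn, k=3):
    """Return the top-k strategy items by cosine similarity to the task query."""
    if not memory:
        return []
    vecs = np.stack([embed_fn(m.title + " " + m.description) for m in memory])
    query = query_vec / np.linalg.norm(query_vec)
    sims = vecs @ query / np.linalg.norm(vecs, axis=1)
    return [memory[i] for i in np.argsort(-sims)[:k]]

def run_task(task, memory, embed_fn, agent_fn, judge_fn, distill_fn):
    """One pass of the loop: retrieve -> inject -> judge -> distill -> append."""
    items = retrieve(memory, embed_fn(task), embed_fn)
    guidance = "\n".join(f"- {m.title}: {m.content}" for m in items)
    trace = agent_fn(task, system_guidance=guidance)   # inject as system guidance
    verdict = judge_fn(task, trace)                    # success/failure judgment
    memory.extend(distill_fn(task, trace, verdict))    # new items, incl. from failures
    return trace, verdict
```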
Learning from failures
Unlike memory systems that keep only successful workflows, ReasoningBank encodes negative constraints extracted from failures. Examples include ‘do not rely on search when a site disables indexing’ or ‘confirm save state before navigation’. These negative rules help prevent repeated mistakes and make strategy items more transferable across domains and websites.
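Continuing the sketch, a failure distilled into a negative constraint might look like this (the content paraphrases the examples above; it is not drawn from the paper's memory corpus):

```python
# A failed run distilled into a negative constraint (illustrative values)
failure_lesson = StrategyItem(
    title="Do not rely on site search when indexing is disabled",
    description="Learned from a failed run where the search box returned nothing.",
    content="If a query for a known-present item returns no results, fall back to "
            "category navigation instead of retrying search with reworded queries.",
)
```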
Memory-aware test-time scaling (MaTTS)
ReasoningBank is paired with memory-aware test-time scaling. Test-time scaling alone runs more rollouts or refinements per task, but it becomes truly effective when the system learns from the extra trajectories. MaTTS integrates scaling with strategy memory in two modes, sketched in code after the list:
- Parallel MaTTS: generate k rollouts in parallel and self-contrast them to refine strategy memory.
- Sequential MaTTS: iteratively self-refine a single trajectory and mine intermediate notes as memory signals.
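A minimal sketch of both modes, with the same caveat as before: agent_fn, contrast_fn, refine_fn, and mine_fn are assumed callables standing in for the rollout agent, the self-contrast step, the self-refinement step, and the note-mining step.

```python
def parallel_matts(task, memory, agent_fn, contrast_fn, k=4):
    """Parallel MaTTS: run k independent rollouts, then self-contrast them
    to distill strategy items into memory."""
    traces = [agent_fn(task, seed=i) for i in range(k)]  # seed arg is an assumption
    memory.extend(contrast_fn(task, traces))  # agreements/divergences -> strategy items
    return traces

def sequential_matts(task, memory, agent_fn, refine_fn, mine_fn, rounds=3):
    """Sequential MaTTS: iteratively self-refine a single trajectory and
    mine intermediate notes as memory signals."""
    trace = agent_fn(task)
    for _ in range(rounds):
        trace, notes = refine_fn(task, trace)  # one self-refinement pass
        memory.extend(mine_fn(task, notes))    # intermediate notes become memory
    return trace
```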
The synergy runs both ways: richer exploration produces better memory, and better memory steers exploration toward promising branches. Empirically, MaTTS achieves stronger gains that scale more monotonically with extra compute than vanilla best-of-N approaches that lack memory.
Empirical gains
On web and software-engineering benchmarks, ReasoningBank combined with MaTTS showed up to 34.2% relative improvement in task success versus no-memory baselines, and around 16% fewer interaction steps overall. The step reductions were largest on successful trials, suggesting the approach reduces redundant actions rather than causing premature aborts.
Integration in the agent stack
ReasoningBank is designed as a plug-in memory layer for interactive agents that already use ReAct-style decision loops or test-time scaling. It complements existing verifiers and planners by injecting distilled lessons at the prompt or system level. On web tasks it can work with BrowserGym, WebArena, or Mind2Web; for software tasks it layers on SWE-Bench-Verified setups.
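As a rough sketch of what that injection can look like at the prompt level, reusing the retrieve function from the loop sketch above (the template layout is illustrative, not the format used by BrowserGym, WebArena, or Mind2Web):

```python
REACT_TEMPLATE = """{guidance}

Task: {task}
Thought: reason step by step about the next action before acting.
"""

def build_prompt(task, memory, embed_fn, k=3):
    """Prepend retrieved strategy items to a ReAct-style prompt."""
    items = retrieve(memory, embed_fn(task), embed_fn, k=k)
    guidance = "Lessons from past tasks:\n" + "\n".join(
        f"- {m.title}: {m.content}" for m in items
    )
    return REACT_TEMPLATE.format(guidance=guidance, task=task)
```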
Paper and resources
The research paper is available on arXiv: https://arxiv.org/pdf/2509.25140. For code, tutorials, and notebooks, check the project’s GitHub page, and follow the authors on social platforms for updates.