Meta’s Metacognitive Reuse: Turning Repeated Thought into a Behavior Handbook to Cut Tokens by 46%

Meta researchers propose a technique that compresses recurring chain-of-thought reasoning into short, named procedures called behaviors, then reuses or distills them to make large-language-model reasoning far more token-efficient.

Why this matters

Long chain-of-thought (CoT) traces often re-derive the same subroutines—inclusion–exclusion, base conversions, common geometric steps—over and over. That redundancy increases output length, raises latency, and uses up budget that could be spent exploring novel reasoning paths. Meta frames the fix as procedural memory for LLMs: a compact, searchable handbook of how-to steps that the model can consult or internalize.

How the pipeline works

The system uses three roles to build and apply a behavior handbook: a metacognitive strategist that solves problems, reflects on its own solution traces, and extracts reusable behaviors; a teacher that generates behavior-conditioned solutions used as training data; and a student that consumes behaviors either in context at inference time or through fine-tuning.

Retrieval of behaviors is topic-based for MATH and embedding-based (BGE-M3 + FAISS) for AIME. The prompt set includes dedicated templates for solution generation, reflection, behavior extraction, and behavior-conditioned inference (BCI). In BCI, the model is instructed to reference behaviors explicitly, which yields short, structured derivations.
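
The embedding-based retrieval step can be pictured as a standard dense-retrieval loop. The sketch below is illustrative rather than the paper's code: it assumes a small in-memory handbook of name→instruction pairs (the entries shown are made up for the example), embeds the instructions with BGE-M3 loaded here via sentence-transformers, and uses a FAISS inner-product index to pull the top-k behaviors for a question.

```python
# Illustrative embedding-based behavior retrieval (BGE-M3 + FAISS).
# Handbook entries and helper names below are assumptions for the sketch.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

handbook = {
    "behavior_inclusion_exclusion": (
        "To count a union of overlapping sets, add the individual counts "
        "and subtract the pairwise intersections."
    ),
    "behavior_base_conversion": (
        "Convert an integer to base b by repeated division by b, "
        "reading the remainders in reverse order."
    ),
}

encoder = SentenceTransformer("BAAI/bge-m3")  # one way to load BGE-M3 dense embeddings
names = list(handbook)
vectors = encoder.encode([handbook[n] for n in names], normalize_embeddings=True)

index = faiss.IndexFlatIP(vectors.shape[1])   # inner product == cosine on unit vectors
index.add(np.asarray(vectors, dtype="float32"))

def retrieve_behaviors(question: str, k: int = 2) -> list[str]:
    """Return the k behaviors most relevant to the question, ready to splice into a BCI prompt."""
    query = encoder.encode([question], normalize_embeddings=True)
    _, hits = index.search(np.asarray(query, dtype="float32"), k)
    return [f"{names[i]}: {handbook[names[i]]}" for i in hits[0]]
```

For MATH, the topic-based variant would replace the index lookup with a simple topic-to-behaviors mapping.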

Modes of evaluation and use

The handbook is used both at inference time and at training time. In behavior-conditioned inference (BCI), relevant behaviors are retrieved and included in the prompt so the student can reference them explicitly while solving the problem. In behavior-conditioned supervised fine-tuning (BC-SFT), a student is trained on behavior-conditioned traces so that the same procedures are internalized and no retrieval is needed at test time.
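
As a concrete picture of the distillation path, the sketch below shows one plausible way to assemble BC-SFT training records under the description above: the teacher answers each question with behaviors in its prompt, and the student is later fine-tuned on the bare question paired with that behavior-conditioned answer. The helper names (retrieve_behaviors, teacher_generate) and the record format are assumptions, not the paper's exact setup.

```python
# Plausible BC-SFT data construction; helper names and record format are assumed.
from typing import Callable

def build_bc_sft_records(
    questions: list[str],
    retrieve_behaviors: Callable[[str], list[str]],     # e.g. the FAISS retriever sketched earlier
    teacher_generate: Callable[[str, list[str]], str],  # teacher answers with behaviors in its prompt
) -> list[dict]:
    """Pair each bare question with the teacher's behavior-conditioned answer."""
    records = []
    for question in questions:
        behaviors = retrieve_behaviors(question)
        answer = teacher_generate(question, behaviors)  # short trace that cites behaviors
        # Training on question -> behavior-conditioned answer bakes behavior usage
        # into the student's weights, so no retrieval is needed at test time.
        records.append({"prompt": question, "completion": answer})
    return records
```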

Key results on MATH and AIME

Behavior-conditioned inference cuts reasoning tokens by up to 46% relative to standard chain-of-thought prompting, and students fine-tuned with BC-SFT generalize better than those trained on the original traces. Importantly, the improved generalization is not attributable to an easier training corpus: teacher correctness rates in the original and behavior-conditioned training sets are similar, yet the BC-SFT students still come out ahead.

What a behavior looks like

Behaviors are compact name→instruction pairs, ranging from general problem-solving strategies to concrete mathematical tools. Drawing on the recurring steps noted earlier, one entry might read behavior_inclusion_exclusion → to count a union of overlapping sets, add the individual counts and subtract the pairwise intersections; another might read behavior_base_conversion → convert an integer to another base by repeated division, reading the remainders in reverse order.

During BCI, students explicitly cite behaviors as they use them, making traces auditable and compact.
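
To make the citation requirement concrete, here is one way a BCI prompt could be assembled; the template wording and the parenthetical citation convention are illustrative assumptions, not the paper's exact prompt.

```python
# Illustrative BCI prompt template; the exact wording is an assumption.
def build_bci_prompt(question: str, behaviors: list[str]) -> str:
    behavior_block = "\n".join(behaviors)  # each entry is "name: instruction"
    return (
        "You may use the following behaviors. Whenever you apply one, cite it "
        "by name in parentheses, e.g. (behavior_inclusion_exclusion).\n\n"
        f"Behaviors:\n{behavior_block}\n\n"
        f"Problem: {question}\n\n"
        "Give a concise step-by-step solution and a final answer."
    )
```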

Retrieval, cost, and latency considerations

BCI adds input tokens (the retrieved behaviors), but those tokens can be precomputed and are processed in a single parallel prefill pass rather than decoded autoregressively. On some commercial APIs, input tokens are also billed at a lower rate than output tokens, so shrinking output length can lower both cost and latency. BC-SFT removes retrieval entirely at test time by baking behavior usage into the model's weights.
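
A back-of-the-envelope calculation shows why the input/output asymmetry matters. All prices and token counts below are assumed for illustration; only the 46% output reduction comes from the headline result.

```python
# Back-of-the-envelope cost comparison of plain CoT vs. BCI.
# Prices and token counts are illustrative assumptions, not measured values.
INPUT_PRICE = 0.50 / 1_000_000    # assumed $ per input token
OUTPUT_PRICE = 2.00 / 1_000_000   # assumed $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

plain_cot = request_cost(input_tokens=500, output_tokens=4_000)
# BCI: ~1,500 extra prompt tokens of retrieved behaviors, ~46% fewer output tokens.
bci = request_cost(input_tokens=2_000, output_tokens=int(4_000 * (1 - 0.46)))

print(f"plain CoT: ${plain_cot:.4f}  vs  BCI: ${bci:.4f}")
# Even with the larger prompt, the shorter output dominates the bill, and latency
# drops because the saved tokens would otherwise be decoded one at a time.
```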

Why this approach works and open questions

Storing procedural instructions complements retrieval-augmented generation’s declarative memory: behaviors capture how to reason, not just what to recall. By replacing verbose derivations with concise reusable steps, the model avoids re-derivation and can reallocate compute to novel subproblems. Behavior prompts bias the decoder toward efficient, correct trajectories, and BC-SFT internalizes those trajectories so models can invoke them implicitly.

Open engineering questions include scaling the handbook beyond math, organizing a growing behavior corpus, and maintaining quality and relevance as behaviors accumulate.

For more details see the paper: https://arxiv.org/pdf/2509.13237