Google Turns TimesFM Into a Few-Shot Time-Series Forecaster with In-Context Fine-Tuning

What problem this solves

Forecasting workflows often face a trade-off: deploy a model per dataset via supervised fine-tuning for top accuracy, or use a single foundation model zero-shot for simplicity but weaker domain fit. Google Research addresses that middle ground by teaching a single TimesFM checkpoint to adapt on the fly during inference. Instead of retraining per tenant or dataset, the model consumes a handful of related series as in-context examples and uses them to produce adapted forecasts.

How in-context fine-tuning works

The technique, called in-context fine-tuning (ICF), is a continued-pretraining recipe applied to TimesFM. TimesFM itself is a patch-based, decoder-only transformer that tokenizes the input in 32-point patches and decodes 128-point output patches through a shared MLP head. For ICF, training sequences are constructed by interleaving the target history with multiple support series, separated by a learnable boundary token. The objective remains next-token prediction, but the model learns, via cross-example causal attention, to treat support series as informative exemplars rather than noise.
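As a rough illustration of that sequence construction, the sketch below lays out support series ahead of the target history in 32-point patches, with an all-NaN patch standing in for the learnable separator token. Only the patch length comes from the article; the layout details, function names, and the NaN placeholder are assumptions, not the actual TimesFM training code.

```python
import numpy as np

PATCH_LEN = 32  # TimesFM tokenizes inputs in 32-point patches (per the article)

def to_patches(series, patch_len=PATCH_LEN):
    """Split a 1-D array into (num_patches, patch_len), dropping any remainder."""
    usable = (len(series) // patch_len) * patch_len
    return np.asarray(series[:usable], dtype=float).reshape(-1, patch_len)

def build_icf_training_sequence(target_history, support_series):
    """Lay out support series ahead of the target history, inserting an
    all-NaN patch as a placeholder for the learnable separator token.
    The decoder is still trained to predict the continuation at each patch
    position (128-point output patches in TimesFM); positions that cross a
    series boundary would presumably be excluded from the loss."""
    separator = np.full((1, PATCH_LEN), np.nan)  # stand-in, not the real learned token
    blocks = []
    for support in support_series:
        blocks.append(to_patches(support))
        blocks.append(separator)
    blocks.append(to_patches(target_history))  # target history comes last
    return np.concatenate(blocks, axis=0)
```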

What few-shot means in this setting

At inference, the user concatenates the target history with k support snippets from related series, each delimited by the separator token. The model has been trained to attend across these mixed sequences, enabling few-shot adaptation through prompt composition instead of parameter updates. This is analogous to few-shot prompting in large language models but applied to numeric time-series patches rather than text tokens.
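A minimal sketch of that prompt composition, assuming a hypothetical `model.forecast` interface and a NaN placeholder where the released checkpoint would insert its learned separator token:

```python
import numpy as np

def compose_few_shot_prompt(target_history, support_snippets, sep_value=np.nan):
    """Concatenate k support snippets followed by the target history,
    delimiting each block with a placeholder separator value. The real
    checkpoint inserts its learned boundary token; NaN is illustrative only."""
    blocks = []
    for snippet in support_snippets:
        blocks.append(np.asarray(snippet, dtype=float))
        blocks.append(np.array([sep_value]))
    blocks.append(np.asarray(target_history, dtype=float))
    return np.concatenate(blocks)

# Hypothetical usage; `model.forecast` stands in for whatever interface a
# released TimesFM-ICF checkpoint exposes, and `horizon` for the forecast length.
# prompt = compose_few_shot_prompt(history, [related_a, related_b, related_c])
# forecast = model.forecast(prompt, horizon=128)
```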

Empirical performance versus supervised fine-tuning

On a 23-dataset out-of-domain benchmark, TimesFM with ICF (TimesFM-ICF) matches the accuracy of per-dataset supervised fine-tuning and improves over the base TimesFM by 6.8 percent in geometric mean of scaled MASE. There is an expected accuracy versus latency trade-off: more in-context examples generally improve forecasts but increase inference time. Ablations show that structured in-context examples outperform merely extending the context length without organization.
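The article does not spell out the exact normalization behind "geometric mean of scaled MASE," but a common benchmarking convention is to scale each dataset's MASE by a reference forecaster's MASE on the same dataset and then aggregate with a geometric mean; the sketch below follows that assumption.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean absolute scaled error: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecaster on the training series."""
    scale = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / scale

def aggregate_scaled_mase(per_dataset_mase, per_dataset_reference_mase):
    """Scale each dataset's MASE by a reference model's MASE on the same
    dataset, then aggregate with a geometric mean so no single dataset
    dominates the summary number."""
    ratios = np.asarray(per_dataset_mase) / np.asarray(per_dataset_reference_mase)
    return float(np.exp(np.mean(np.log(ratios))))
```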

How this differs from Chronos-style approaches

Chronos and related models tokenize values into a discrete vocabulary and deliver strong zero-shot performance, with faster variants available. The key distinction of Google's contribution is enabling a time-series foundation model to behave like an LLM-style few-shot learner, exploiting cross-series structure at inference time purely through prompt composition. This bridges train-time adaptation and prompt-time adaptation for numeric forecasting.

Architectural specifics to watch

Important modifications that enable ICF include:

- A learnable separator (boundary) token that marks where one series ends and the next begins, so the model can tell support examples apart from the target history.
- Cross-example causal attention, so patches from support series act as informative exemplars rather than noise (sketched below).
- A continued-pretraining stage on sequences that interleave the target history with related support series, while keeping the standard next-token prediction objective.
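To make the cross-example attention idea concrete, the sketch below contrasts a plain causal mask, in which later positions may attend to earlier support series, with a mask confined to each position's own series. The `example_ids` labeling and the masking details are illustrative assumptions, not the published implementation.

```python
import numpy as np

def causal_mask(example_ids, cross_example=True):
    """Boolean attention mask over patch positions.

    example_ids[i] labels which series (support or target) position i
    belongs to. With cross_example=True, every position may attend to all
    earlier positions, including those of other examples; with False,
    attention is confined to earlier positions within the same example."""
    ids = np.asarray(example_ids)
    n = len(ids)
    causal = np.tril(np.ones((n, n), dtype=bool))     # attend to the past only
    if cross_example:
        return causal
    same_example = ids[:, None] == ids[None, :]        # restrict to own series
    return causal & same_example
```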

Practical implications

TimesFM-ICF makes a single pretrained checkpoint behave like a practical few-shot forecaster. For multi-tenant systems or deployments where per-dataset training loops are costly, selecting and curating support series becomes the primary lever for tuning model behavior. This can reduce MLOps overhead while maintaining fine-tuning-level accuracy.
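As one possible way to operationalize that lever, the sketch below ranks candidate support series by the correlation of their recent window with the target's. The selection rule, window size, and function names are assumptions for illustration, not anything prescribed by the article.

```python
import numpy as np

def select_support_series(target_history, candidates, k=5, window=256):
    """Rank candidate series by Pearson correlation of their most recent
    window with the target's, and return the top-k as support examples.
    A simple heuristic; curation in practice may use domain knowledge."""
    t = np.asarray(target_history[-window:], dtype=float)
    scores = []
    for idx, cand in enumerate(candidates):
        c = np.asarray(cand[-window:], dtype=float)
        n = min(len(t), len(c))
        if n < 2:
            scores.append((idx, -np.inf))
            continue
        corr = np.corrcoef(t[-n:], c[-n:])[0, 1]
        scores.append((idx, -np.inf if np.isnan(corr) else corr))
    scores.sort(key=lambda s: s[1], reverse=True)
    return [candidates[idx] for idx, _ in scores[:k]]
```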