Google Turns TimesFM Into a Few-Shot Time-Series Forecaster with In-Context Fine-Tuning
What problem this solves
Forecasting workflows often face a trade-off: deploy a model per dataset via supervised fine-tuning for top accuracy, or use a single foundation model zero-shot for simplicity but weaker domain fit. Google Research addresses that middle ground by teaching a single TimesFM checkpoint to adapt on the fly during inference. Instead of retraining per tenant or dataset, the model consumes a handful of related series as in-context examples and uses them to produce adapted forecasts.
How in-context fine-tuning works
The technique, called in-context fine-tuning, or ICF, is a continued-pretraining recipe applied to TimesFM. TimesFM itself is a patched, decoder-only transformer that encodes 32-point input patches into tokens and decodes each output token into a 128-point patch through a shared MLP head. For ICF, training sequences are constructed by interleaving the target history with multiple support series and separating them with a learnable boundary token. The objective remains next-token prediction, but the model learns to use cross-example causal attention so support series act as informative exemplars rather than noise.
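To make the sequence construction concrete, here is a minimal sketch of how an interleaved ICF training example might be assembled. The helper names, the patch handling, and the NaN placeholder for the separator are illustrative assumptions; in the actual model the boundary marker is a learned token in embedding space, not a value patch.

```python
# Illustrative sketch of ICF-style sequence construction (not the TimesFM training code).
import numpy as np

PATCH_LEN = 32                               # TimesFM input patch length
SEP_PATCH = np.full((1, PATCH_LEN), np.nan)  # placeholder for the learnable boundary token

def to_patches(series: np.ndarray) -> np.ndarray:
    """Split a 1-D series into non-overlapping 32-point patches, dropping the remainder."""
    n = (len(series) // PATCH_LEN) * PATCH_LEN
    return series[:n].reshape(-1, PATCH_LEN)

def build_icf_sequence(support_series: list[np.ndarray], target_history: np.ndarray) -> np.ndarray:
    """Interleave support series with the target history, separated by boundary markers."""
    pieces = []
    for s in support_series:
        pieces.append(to_patches(s))
        pieces.append(SEP_PATCH)                # boundary between in-context examples
    pieces.append(to_patches(target_history))   # target history comes last, to be continued
    return np.concatenate(pieces, axis=0)       # shape (num_patches, 32), fed to the decoder

# Continued pretraining then applies the usual next-token (next-patch) objective to sequences
# built this way, which is what teaches the model to attend across example boundaries.
```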
What few-shot means in this setting
At inference, the user concatenates the target history with k support snippets from related series, each delimited by the separator token. The model has been trained to attend across these mixed sequences, enabling few-shot adaptation through prompt composition instead of parameter updates. This is analogous to few-shot prompting in large language models but applied to numeric time-series patches rather than text tokens.
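As a rough illustration, few-shot use reduces to assembling a prompt; the `compose_prompt` helper, the NaN sentinel, and the `icf_model.decode` call below are hypothetical stand-ins, not the released TimesFM API.

```python
# Hypothetical few-shot inference sketch: adaptation comes from the prompt, not from gradients.
import numpy as np

SEP = np.array([np.nan])   # sentinel standing in for the learned separator token

def compose_prompt(target_history: np.ndarray, supports: list[np.ndarray]) -> np.ndarray:
    """Concatenate k support snippets and the target history, each delimited by SEP."""
    parts = []
    for s in supports:                 # k snippets from related series
        parts.extend([s, SEP])
    parts.append(target_history)       # the series to be continued goes last
    return np.concatenate(parts)

# No fine-tuning step and no parameter updates happen here; a call such as
#   y_hat = icf_model.decode(compose_prompt(history, supports), horizon=128)
# stands in for however the released checkpoint is actually invoked.
```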
Empirical performance versus supervised fine-tuning
On a 23-dataset out-of-domain benchmark, TimesFM with ICF (TimesFM-ICF) matches the accuracy of per-dataset supervised fine-tuning and improves over the base TimesFM by 6.8 percent in the geometric mean of scaled MASE (mean absolute scaled error). There is an expected accuracy versus latency trade-off: more in-context examples generally improve forecasts but increase inference time. Ablations show that structured in-context examples outperform merely giving the model a longer, unstructured context.
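For reference, a minimal sketch of that aggregate metric is below. The per-dataset scaling shown (dividing each dataset's MASE by a reference model's MASE) is an assumption for illustration; the paper's exact normalization may differ.

```python
# Sketch of the reported aggregate: per-dataset MASE, scaled, then combined by geometric mean.
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error: forecast MAE over the MAE of a seasonal-naive forecast."""
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

def geometric_mean(values):
    """Geometric mean, so no single dataset dominates the average."""
    return float(np.exp(np.mean(np.log(values))))

# Assumed aggregation across datasets:
# scaled = [model_mase[d] / reference_mase[d] for d in datasets]
# score = geometric_mean(scaled)
```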
How this differs from Chronos-style approaches
Chronos and related work use discrete tokenization of values and have strong zero-shot performance along with fast variants. The key distinction of Google’s contribution is enabling a time-series foundation model to behave like an LLM few-shot learner: learning cross-series structure at inference through prompts. This bridges train-time adaptation and prompt-time adaptation for numeric forecasting.
Architectural specifics to watch
Important modifications that enable ICF include:
- Learnable separator tokens to mark boundaries between series and examples
- Causal self-attention applied across mixed histories and support examples (a toy sketch follows this list)
- Retention of the base model's patching strategy and shared MLP heads for encoding and decoding
- Continued pretraining on interleaved target and support sequences so cross-example reasoning is learned before deployment
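As a toy illustration of the separator token and cross-example attention (not the TimesFM implementation), the sketch below shows how a learned boundary embedding and an ordinary causal mask let target tokens attend back over support-example tokens; the dimensions and random embeddings are placeholders.

```python
# Toy sketch of cross-example causal attention with a separator embedding (illustrative only).
import numpy as np

D = 16                                    # toy model dimension
rng = np.random.default_rng(0)
sep_embedding = rng.normal(size=(1, D))   # stands in for the learnable boundary token

def causal_self_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head causal self-attention over a (seq_len, D) sequence; projections omitted."""
    n = len(tokens)
    scores = tokens @ tokens.T / np.sqrt(D)
    scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf   # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

# Support patch-tokens, then the separator, then target patch-tokens: every target token can
# attend back across the boundary to the support example, which is what ICF trains it to use.
support_tokens = rng.normal(size=(4, D))
target_tokens = rng.normal(size=(3, D))
sequence = np.concatenate([support_tokens, sep_embedding, target_tokens], axis=0)
mixed = causal_self_attention(sequence)   # shape (8, D)
```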
Practical implications
TimesFM-ICF makes a single pretrained checkpoint behave like a practical few-shot forecaster. For multi-tenant systems or deployments where per-dataset training loops are costly, selecting and curating support series becomes the primary control for tuning model behavior. This can reduce MLOps overhead while maintaining fine-tuning-level accuracy.
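One way to operationalize that control, purely as an assumption rather than anything prescribed by the paper, is a simple relevance heuristic for choosing support series:

```python
# Assumed support-selection heuristic (not from the paper): rank candidate series by
# correlation with the target history and keep the top k as in-context examples.
import numpy as np

def select_supports(target: np.ndarray, candidates: list[np.ndarray], k: int = 5) -> list[np.ndarray]:
    """Return the k candidates whose most recent window correlates best with the target."""
    window = len(target)
    scores = []
    for c in candidates:
        if len(c) < window:
            scores.append(-np.inf)                       # too short to compare
            continue
        r = np.corrcoef(target, c[-window:])[0, 1]
        scores.append(r if np.isfinite(r) else -np.inf)  # guard against constant series
    top = np.argsort(scores)[::-1][:k]
    return [candidates[i] for i in top]
```

In practice the curation signal could just as well come from domain metadata (same product family, same region, same sensor type); the point is that prompt curation, not a training loop, becomes the main lever for adapting the model.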