Analog Foundation Models: Making LLMs Robust to In-Memory AI Noise
Why analog computing matters for large language models
Analog In-Memory Computing (AIMC) executes matrix-vector multiplications directly inside dense non-volatile memory arrays, eliminating the weight transfers between memory and compute units that dominate conventional GPU and TPU pipelines. That architectural shift promises dramatic gains in throughput and power efficiency, and could enable foundation-scale models to run on compact, energy-efficient accelerators outside data centers.
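To make the contrast concrete, here is a minimal NumPy sketch (not from the paper) of the product a memristive crossbar computes physically: weights are stored as conductances, inputs arrive as voltages, and each column wire sums its cell currents, so the result appears without the weight matrix ever leaving the array.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(512, 256))   # conductances, i.e. the stored weights
v = rng.uniform(-1.0, 1.0, size=512)         # input activations applied as voltages

# Digital pipeline: fetch the weight matrix from memory, then multiply.
y_digital = G.T @ v

# Analog crossbar: the same product computed in place. Each cell contributes
# a current I = G * V (Ohm's law) and every column wire sums its currents
# (Kirchhoff's current law), so the weights never leave the array.
y_analog = (G * v[:, None]).sum(axis=0)

assert np.allclose(y_digital, y_analog)
```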
The noise challenge that blocked wider adoption
AIMC systems face stochastic noise from device variability, DAC/ADC quantization, and runtime fluctuations. Unlike the deterministic rounding error introduced by quantization on digital accelerators, analog errors are non-deterministic and can catastrophically degrade model accuracy. Prior work showed that smaller networks such as CNNs and RNNs can be adapted to tolerate analog noise, but LLMs with billions of parameters typically collapsed under AIMC conditions.
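The toy simulation below combines these error sources in a single matrix-vector product: the inputs pass through a DAC, the stored weights carry a stochastic perturbation, and the outputs pass through an ADC. The noise magnitudes, bit-widths, and quantizer shapes are illustrative assumptions, not the device model used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(x, n_bits, x_max):
    """Uniform symmetric quantizer standing in for a DAC or ADC stage."""
    levels = 2 ** (n_bits - 1) - 1
    step = x_max / levels
    return np.clip(np.round(x / step), -levels, levels) * step

def noisy_analog_mvm(W, x, w_noise_std=0.02, out_noise_std=0.01,
                     in_bits=8, out_bits=8):
    """Toy model of a single analog matrix-vector product (illustrative only)."""
    x_q = quantize(x, in_bits, np.abs(x).max())                          # input DAC
    W_eff = W * (1 + w_noise_std * rng.standard_normal(W.shape))         # device variability
    y = W_eff @ x_q
    y += out_noise_std * np.abs(y).max() * rng.standard_normal(y.shape)  # runtime fluctuations
    return quantize(y, out_bits, np.abs(y).max())                        # output ADC

W = rng.standard_normal((256, 512)) / np.sqrt(512)
x = rng.standard_normal(512)
rel_err = np.linalg.norm(noisy_analog_mvm(W, x) - W @ x) / np.linalg.norm(W @ x)
print(f"relative error of one noisy MVM: {rel_err:.3f}")  # different on every call
```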
How Analog Foundation Models (AFMs) tackle noise
Researchers at IBM and ETH Zürich developed a hardware-aware training pipeline to make LLMs resilient to analog execution. Key components of the AFM approach, sketched in simplified form after the list, include:
- Noise injection during training to emulate AIMC randomness
- Iterative weight clipping to keep weight distributions within device limits
- Learned static input and output quantization ranges aligned with real hardware constraints
- Distillation from pre-trained LLMs using large synthetic corpora (20B tokens)
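The PyTorch sketch below shows one way these ingredients could fit into a single training step: a linear layer whose forward pass quantizes its inputs to a learned static range and injects Gaussian weight noise, and whose weights are re-clipped after every optimizer update. The noise model, clipping rule, quantizer, and hyperparameters are illustrative stand-ins; the actual AFM recipe is implemented with AIHWKIT-Lightning and trained with a distillation loss over synthetic data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnalogAwareLinear(nn.Linear):
    """Linear layer with a noisy, clipped, statically quantized forward pass.

    A simplified stand-in for hardware-aware training: the exact noise model,
    clipping schedule, and quantizers used for AFMs differ from this sketch.
    """

    def __init__(self, in_features, out_features,
                 w_noise=0.02, clip_sigma=2.5, act_bits=8):
        super().__init__(in_features, out_features, bias=False)
        self.w_noise = w_noise
        self.clip_sigma = clip_sigma
        self.act_bits = act_bits
        # Learned static input range (a single scalar here for brevity).
        self.in_range = nn.Parameter(torch.tensor(3.0))

    def clip_weights_(self):
        # Iterative weight clipping: after each update, squeeze the weights back
        # into +/- clip_sigma standard deviations so they stay inside the
        # programmable conductance window of the device.
        bound = (self.clip_sigma * self.weight.std()).item()
        with torch.no_grad():
            self.weight.clamp_(-bound, bound)

    def forward(self, x):
        # Static input quantization: clamp to the learned range (gradients reach
        # in_range through the clamp), then round with a straight-through estimator.
        levels = 2 ** (self.act_bits - 1) - 1
        x_c = torch.maximum(torch.minimum(x, self.in_range), -self.in_range)
        step = self.in_range / levels
        x_q = x_c + (torch.round(x_c / step) * step - x_c).detach()
        # Noise injection emulating stochastic analog weight errors.
        w_noisy = self.weight + self.w_noise * self.weight.std() * torch.randn_like(self.weight)
        return F.linear(x_q, w_noisy)

layer = AnalogAwareLinear(512, 256)
opt = torch.optim.AdamW(layer.parameters(), lr=1e-4)
for _ in range(10):                      # stands in for distillation steps
    x = torch.randn(8, 512)
    loss = layer(x).pow(2).mean()        # real training would use a distillation loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    layer.clip_weights_()                # re-clip after every optimizer step
```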
These techniques were implemented with AIHWKIT-Lightning and applied to models such as Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct. Under analog noise, AFMs maintained accuracy comparable to 4-bit weight / 8-bit activation (W4A8) quantized baselines and outperformed quantization-aware training (QAT) and post-training quantization approaches such as SpinQuant across reasoning and factual benchmarks.
Benefits beyond analog hardware
An unexpected advantage is cross-compatibility with low-precision digital hardware. Because AFMs are trained to tolerate stochastic noise and weight clipping, they also withstand simple post-training round-to-nearest (RTN) quantization more robustly than existing methods. This makes AFMs useful both for future AIMC accelerators and for current commodity inference hardware that relies on low-precision arithmetic.
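For reference, this is roughly what the simplest such scheme looks like: a per-row round-to-nearest weight quantizer applied after training, with no calibration data. The bit-width and scaling rule here are illustrative choices, not the exact configuration evaluated in the paper.

```python
import torch

def rtn_quantize_weight(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Per-output-channel round-to-nearest (RTN) weight quantization.

    The simplest post-training scheme: one scale per output row taken from the
    row's max magnitude, then rounding; no calibration data and no learned
    rotations (unlike SpinQuant).
    """
    q_max = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / q_max
    q = torch.clamp(torch.round(w / scale), -q_max, q_max)
    return q * scale  # dequantized weights, ready to drop back into the model

w = torch.randn(256, 512) / 512 ** 0.5
w_q = rtn_quantize_weight(w, n_bits=4)
print(f"mean absolute quantization error: {(w - w_q).abs().mean().item():.5f}")
```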
Scaling inference compute and remaining gaps
The team evaluated test-time compute scaling on the MATH-500 benchmark by generating multiple candidate answers per query and selecting the best with a reward model. AFMs showed better scaling behavior than QAT models, with the accuracy gap narrowing as more inference compute was spent. Despite these advances, training AFMs remains resource-intensive, and some reasoning tasks, such as GSM8K, still show an accuracy gap relative to full-precision baselines.
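A minimal sketch of this best-of-N scheme is shown below; the `generate` and `reward` callables are placeholders standing in for the AFM sampler and the reward model.

```python
import random
from typing import Callable, List

def best_of_n(question: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 8) -> str:
    """Best-of-N test-time scaling: sample n candidates, keep the top-scoring one."""
    candidates: List[str] = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda answer: reward(question, answer))

# Toy usage with stand-in functions; a real run would sample from the AFM with
# nonzero temperature and score each answer with a trained reward model.
random.seed(0)
pick = best_of_n(
    "What is 6 * 7?",
    generate=lambda q: random.choice(["42", "41", "43"]),
    reward=lambda q, a: 1.0 if a == "42" else 0.0,
    n=8,
)
print(pick)  # almost always "42": more samples make a correct answer more likely to appear
```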
What this means for the future of AIMC
This work delivers the first systematic demonstration that large LLMs can be adapted for AIMC without catastrophic loss in accuracy. By combining energy efficiency, noise robustness, and digital cross-compatibility, AFMs pave a practical path toward running foundation models beyond the limits of conventional GPUs, bringing the prospect of foundation-scale AI to edge and embedded devices.
Further details and experimental results are available in the paper (https://arxiv.org/pdf/2505.09663) and accompanying GitHub resources.