OpenTSLM Unveiled: Language Models Built for Medical Time-Series Data
A new capability for medical AI
Researchers from Stanford University and ETH Zurich, with contributions from Google Research and Amazon, introduced OpenTSLM, a family of Time-Series Language Models that bring native time-series reasoning to pretrained large language models. OpenTSLM is designed to interpret continuous medical signals such as ECGs, EEGs, and wearable sensor streams, addressing a long-standing blind spot in general LLMs.
The modality gap in medical diagnosis
Clinical practice depends on temporal patterns. Vital signs, waveform biomarkers, and other longitudinal signals must be understood as sequences rather than static snapshots. Conventional LLMs operate on discrete text tokens and struggle to capture the dense, high-frequency dynamics of time-series signals. Converting signals into text or static images has proven inefficient and lossy, making accurate medical interpretation difficult at scale.
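As a rough, back-of-the-envelope illustration (not from the paper), serializing even a short ECG strip as plain text already yields an unwieldy prompt before any clinical question is asked; the sampling rate and synthetic waveform below are assumptions for illustration only.

```python
# Illustration only: why naive text serialization of a signal scales badly.
# A 10-second single-lead ECG at an assumed 500 Hz is 5,000 samples; written out as
# comma-separated values it becomes tens of thousands of characters of prompt text.
import numpy as np

fs = 500                                        # assumed sampling rate (Hz)
seconds = 10
t = np.arange(fs * seconds) / fs
ecg_like = 0.1 * np.sin(2 * np.pi * 1.2 * t)    # synthetic stand-in for a real waveform

as_text = ",".join(f"{v:.3f}" for v in ecg_like)
print(len(ecg_like), "samples ->", len(as_text), "characters of prompt text")
```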
Why vision-language models fall short
A frequent workaround is to render time series as plots and feed them to vision-language models. The OpenTSLM research shows this is ineffective for precise medical tasks. VLMs are optimized for natural images and object recognition rather than the sequential dependencies and fine-grained temporal features that matter in ECGs or EEGs. When high-frequency signals are rasterized into pixels, important temporal cues vanish, and VLMs fail to match the specificity required for diagnosis.
Native time-series as a modality: OpenTSLM approach
OpenTSLM embeds time series directly as a modality in pretrained LLM backbones such as Llama and Gemma, enabling natural language queries and chain-of-thought reasoning over raw sensor streams. The project paper is available at https://www.arxiv.org/abs/2510.02410.
The team explored two main architectures:
OpenTSLM-SoftPrompt (implicit modeling)
This design converts time-series data into learnable tokens that are interleaved with the text tokens, a form of soft prompting. It works for short sequences but scales poorly: every additional window of signal lengthens the prompt, and memory demands grow rapidly with input length, limiting practicality for long medical records or continuous monitoring.
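For intuition, here is a minimal sketch of the soft-prompt pattern; the patching scheme, module names, and dimensions are illustrative assumptions rather than the released OpenTSLM code.

```python
# Minimal sketch of soft prompting with a time series (illustrative only; the patching
# scheme, module names, and dimensions are assumptions, not the OpenTSLM implementation).
import torch
import torch.nn as nn

class TimeSeriesSoftPrompt(nn.Module):
    def __init__(self, patch_len=32, d_model=2048):
        super().__init__()
        # Each patch of raw samples is projected into the LLM's embedding space,
        # so the series contributes one "soft token" per patch.
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, series):                     # series: (batch, length)
        b, length = series.shape
        n_patches = length // self.patch_len
        patches = series[:, : n_patches * self.patch_len].reshape(b, n_patches, self.patch_len)
        return self.proj(patches)                  # (batch, n_patches, d_model)

# The soft tokens are concatenated with the text token embeddings, so the prompt
# (and the attention cost) grows with the length of the recording.
soft = TimeSeriesSoftPrompt()
ts_tokens = soft(torch.randn(1, 3200))             # 3,200 samples -> 100 soft tokens
text_embeds = torch.randn(1, 20, 2048)             # embeddings of a hypothetical question
llm_input = torch.cat([ts_tokens, text_embeds], dim=1)
print(llm_input.shape)                             # torch.Size([1, 120, 2048])
```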
OpenTSLM-Flamingo (explicit modeling)
The Flamingo-inspired variant models time series as a separate modality with a specialized encoder and a Perceiver Resampler to produce a fixed-size representation regardless of input length. That representation is fused with text via gated cross-attention. This explicit modeling keeps memory usage stable across long inputs. In training experiments on ECG data, the Flamingo variant used about 40 GB of VRAM versus 110 GB for the SoftPrompt variant on the same backbone, demonstrating far better scalability. Refer to the paper at https://www.arxiv.org/abs/2510.02410 for details.
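For comparison, a minimal sketch of the explicit pattern (Perceiver-style resampling followed by gated cross-attention) is shown below; layer sizes, naming, and wiring are assumptions for illustration, not the released OpenTSLM-Flamingo implementation.

```python
# Illustrative sketch of explicit modeling: a resampler compresses the series to a fixed
# number of latents, and gated cross-attention fuses them into the text stream.
# All sizes and names are assumptions, not the OpenTSLM-Flamingo code.
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Compress a variable-length sequence of time-series features into a fixed number
    of latent vectors, so downstream memory no longer depends on recording length."""
    def __init__(self, d_model=512, n_latents=64, n_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, ts_features):                           # (batch, seq_len, d_model)
        q = self.latents.unsqueeze(0).expand(ts_features.size(0), -1, -1)
        out, _ = self.attn(q, ts_features, ts_features)        # latents attend to the series
        return out                                             # (batch, n_latents, d_model)

class GatedCrossAttention(nn.Module):
    """Text hidden states attend to the resampled time-series latents; a zero-initialized
    tanh gate lets the pretrained LLM start from its original behavior."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_hidden, ts_latents):
        attended, _ = self.attn(text_hidden, ts_latents, ts_latents)
        return text_hidden + torch.tanh(self.gate) * attended

resampler = PerceiverResampler()
xattn = GatedCrossAttention()
ts_latents = resampler(torch.randn(1, 5000, 512))              # long recording, fixed-size output
fused = xattn(torch.randn(1, 20, 512), ts_latents)              # 20 text positions
print(ts_latents.shape, fused.shape)
```

The zero-initialized tanh gate is the standard Flamingo trick: the pretrained language model starts from its original behavior and learns to use the new modality gradually during training.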
Performance gains and benchmarks
OpenTSLM was evaluated on three new Chain-of-Thought style medical datasets created by the authors: HAR-CoT for activity recognition, Sleep-CoT for EEG sleep staging, and ECG-QA-CoT for ECG question answering. Results show large improvements over baselines and even over a frontier model like GPT-4o when that model processed data as text or images.
Key results include:
- Sleep staging: OpenTSLM achieved 69.9% F1 versus 9.05% for the best fine-tuned text-only baseline.
- Activity recognition: OpenTSLM scored 65.4% F1.
Even small OpenTSLM models, at roughly one billion parameters, substantially outperformed GPT-4o, showing that domain-adapted architectures can beat scale alone and pointing toward more efficient, potentially on-device medical AI.
Clinical validation and interpretability
Trust is essential in medical AI. OpenTSLM produces human-readable rationales via chain-of-thought generation, making its reasoning accessible to clinicians. A panel of five cardiologists at Stanford Hospital reviewed ECG interpretations from the OpenTSLM-Flamingo model: it produced correct or partially correct interpretations in 92.9% of cases and was rated positively for integrating clinical context in 85.1% of evaluations.
Broader implications and open science
By treating time series as a first-class modality, OpenTSLM paves the way for general-purpose TSLMs that can be applied beyond healthcare to finance, industrial monitoring, and more. To accelerate research, the teams from Stanford and ETH Zurich have open-sourced code, datasets, and model weights. See the paper at https://www.arxiv.org/abs/2510.02410 and the project GitHub for tutorials, code, and notebooks.