PadChest-GR Raises the Bar: Multimodal, Bilingual, Sentence-Level Grounding for Radiology AI
A multimodal radiology breakthrough
Recent progress in medical AI shows that model advances alone are not enough; the quality and structure of data are decisive. PadChest-GR is the result of a collaboration between Centaur.ai, Microsoft Research, and the University of Alicante. It is the first multimodal, bilingual, sentence-level dataset for grounded radiology reporting, designed to align textual findings with localized chest X-ray imagery so that each diagnostic claim can be linked to a visual region.
Why image-level labels fall short
Traditional medical imaging datasets often stop at image-level labels. An X-ray might be tagged as showing cardiomegaly or as normal, but such labels do not explain where or why a model reached that conclusion. Models trained on image-level annotations tend to be brittle and are prone to hallucinations, producing confident but unsupported findings or failing to localize pathology.
Grounded radiology reporting requires two complementary dimensions of annotation:
- Spatial grounding: bounding boxes or region annotations that localize findings on the image.
- Linguistic grounding: sentence-level text tied to specific regions, rather than generic labels.
This combined approach reduces ambiguity and makes model outputs interpretable in clinical terms.
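The two dimensions above can be pictured as one annotation record per finding. The schema below is a hypothetical illustration (the field names, image ID, and coordinate convention are assumptions, not PadChest-GR's actual format): each sentence-level finding carries its bilingual text and the bounding box that grounds it.

```python
from dataclasses import dataclass

@dataclass
class GroundedFinding:
    """One sentence-level finding tied to an image region (illustrative schema)."""
    sentence_es: str   # finding sentence in Spanish
    sentence_en: str   # English translation
    box: tuple         # (x_min, y_min, x_max, y_max), normalized to [0, 1]
    label: str         # finding category, e.g. "cardiomegaly"

@dataclass
class GroundedReport:
    image_id: str
    findings: list

# A hypothetical report with one spatially grounded finding.
report = GroundedReport(
    image_id="example-study-001",
    findings=[
        GroundedFinding(
            sentence_es="Cardiomegalia moderada.",
            sentence_en="Moderate cardiomegaly.",
            box=(0.30, 0.45, 0.75, 0.85),
            label="cardiomegaly",
        )
    ],
)

# Each diagnostic sentence can be traced back to a region of the X-ray.
for f in report.findings:
    print(f"{f.sentence_en} -> box {f.box}")
```

Pairing each sentence with a box, rather than tagging the whole image, is what lets a reviewer check every claim against the pixels that support it.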
Human-in-the-loop annotation at clinical scale
Building PadChest-GR demanded uncompromising annotation quality. Centaur.ai provided a HIPAA-compliant labeling platform that enabled trained radiologists at the University of Alicante to draw bounding boxes on thousands of chest X-rays and link each region to sentence-level findings in both Spanish and English. The project used consensus-driven quality control and adjudication to handle edge cases and align terminology across languages.
Key platform features that made this possible included:
- Multiple-annotator consensus with disagreement resolution.
- Performance-weighted labeling that gives high-agreement experts greater influence.
- Support for DICOM and other medical imaging formats.
- Multimodal workflows combining images, text, and metadata.
- Full audit trails with version control and live quality monitoring.
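Performance-weighted labeling can be sketched as a simple weighted vote: each annotator's label counts in proportion to their measured accuracy, so a high-performing expert can outweigh a majority of weaker ones. This is a minimal illustration of the idea, not Centaur.ai's actual aggregation algorithm; the annotator IDs and accuracies are invented.

```python
from collections import defaultdict

def weighted_consensus(votes, annotator_accuracy):
    """Pick the label with the highest accuracy-weighted vote total.

    votes: list of (annotator_id, label) pairs for one item.
    annotator_accuracy: dict of annotator_id -> historical accuracy in [0, 1].
    Unknown annotators default to a neutral weight of 0.5.
    """
    scores = defaultdict(float)
    for annotator, label in votes:
        scores[label] += annotator_accuracy.get(annotator, 0.5)
    return max(scores, key=scores.get)

# Two experts with strong track records agree; their combined weight
# (0.95 + 0.88) outvotes the single dissenting read (0.70).
votes = [("rad_a", "cardiomegaly"), ("rad_b", "normal"), ("rad_c", "cardiomegaly")]
accuracy = {"rad_a": 0.95, "rad_b": 0.70, "rad_c": 0.88}
print(weighted_consensus(votes, accuracy))  # → cardiomegaly
```

The design choice worth noting is that weighting can flip a raw majority: if one highly reliable reader disagrees with two weak ones, the weighted total can favor the expert, which is precisely the behavior that raises label quality on hard cases.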
What PadChest-GR contains
PadChest-GR extends the original PadChest dataset with robust spatial grounding and bilingual sentence-level alignment. Its main attributes are:
- Multimodal integration of chest X-ray images and aligned textual observations.
- Bilingual annotations in Spanish and English for broader applicability.
- Sentence-level granularity connecting each finding to a specific sentence.
- Visual explainability so models can point to the exact region that motivated a finding.
These features make PadChest-GR a landmark resource for developing radiology models that are both accurate and interpretable.
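Spatial grounding also gives evaluation a concrete footing: a model's predicted region can be scored against the expert's box with intersection-over-union (IoU), a standard localization metric. The sketch below assumes normalized `(x_min, y_min, x_max, y_max)` boxes; the example coordinates are invented, and PadChest-GR itself does not prescribe this particular evaluation code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Compare a model's predicted region against the radiologist's reference box.
predicted = (0.32, 0.48, 0.70, 0.82)
reference = (0.30, 0.45, 0.75, 0.85)
print(f"IoU = {iou(predicted, reference):.2f}")
```

With ground-truth boxes available for every finding sentence, a grounded report can be scored both on whether the text is right and on whether the model pointed at the right place.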
Outcomes and clinical implications
Grounded annotation enables models to produce claims that clinicians can verify visually, improving transparency and trust. By tying each linguistic claim to an image region, PadChest-GR helps reduce the risk of hallucinated or speculative outputs. Bilingual annotations expand the dataset's utility for Spanish-speaking populations and international research.
The combination of domain experts, rigorous consensus processes, and a secure annotation platform allowed the team to deliver complex multimodal labels at scale without sacrificing quality.
Broader lessons for medical AI
PadChest-GR underscores a broader truth: better data enables better medical AI. In healthcare, where decisions have high stakes, fidelity of annotations and traceability matter as much as model architecture. Investments in expert labeling infrastructure, multilingual alignment, and spatial grounding are key to building trustworthy clinical AI.
Centaur.ai’s role and ecosystem
PadChest-GR sits within Centaur.ai's larger mission to scale expert annotation across modalities. Centaur Labs has also developed DiagnosUs, a gamified app that crowdsources annotations with performance-weighted scoring, and the company supports HIPAA- and SOC 2-compliant workflows across image, text, audio, and video data. Innovations like performance-weighted labeling ensure that higher-performing experts have greater influence on final labels, improving dataset reliability for downstream clinical and research use.
What this means going forward
By combining multilingual, sentence-level text with spatially grounded image annotations, PadChest-GR sets a new benchmark for radiology datasets. Its approach promotes interpretable, reliable, and clinically useful AI models and offers a practical template for future medical AI collaborations that prioritize data quality and transparency.