Mistral AI Launches OCR 3: Optimized Document Processing
Mistral AI introduces OCR 3, a powerful OCR model designed for structured document AI.
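For readers who want to try the model, the sketch below sends a PDF to Mistral's OCR endpoint via the official Python client and reads back per-page Markdown. The model identifier and response fields are assumptions based on the current mistralai SDK, not confirmed details of this release.

```python
# Hedged sketch: calling Mistral's OCR API with the mistralai Python client.
# The model name and response shape are assumptions drawn from the existing
# SDK, not confirmed specifics of the OCR 3 release.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

ocr_response = client.ocr.process(
    model="mistral-ocr-latest",  # assumed identifier; swap in the OCR 3 model name once published
    document={
        "type": "document_url",
        "document_url": "https://example.com/report.pdf",  # placeholder document
    },
)

# Each page comes back as Markdown, ready for downstream structured parsing.
for page in ocr_response.pages:
    print(page.markdown)
```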
DeepSeek demonstrated an OCR-based method that stores text as image tokens to pack more context into AI models while using fewer tokens. The approach could reduce compute needs and help models remember longer conversations.
Glyph converts ultra-long text into page images processed by a VLM to achieve 3–4× effective token compression and roughly 4× faster prefill and decoding on 128K inputs.
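The two items above share the same core idea: render text as page images so a vision encoder can represent it with far fewer tokens than a text tokenizer would. The back-of-the-envelope sketch below illustrates the accounting; every constant in it (characters per page, patch grid, merge factor) is an illustrative assumption, not a figure from either system.

```python
# Rough sketch of the "text as image tokens" accounting described above.
# All constants are illustrative assumptions, not numbers from DeepSeek or Glyph.

def text_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate using a common ~4-chars-per-token heuristic."""
    return int(len(text) / chars_per_token)

def vision_tokens(text: str,
                  chars_per_page: int = 3000,    # assumed characters rendered per page image
                  patch_grid: tuple = (24, 24),  # assumed ViT patch grid per page
                  merge_factor: int = 4) -> int: # assumed token merging in the vision encoder
    """Estimate vision tokens if the same text is rendered as page images."""
    pages = -(-len(text) // chars_per_page)      # ceiling division
    patches_per_page = patch_grid[0] * patch_grid[1]
    return pages * patches_per_page // merge_factor

long_text = "lorem ipsum " * 20_000              # ~240k characters of dummy input
t, v = text_tokens(long_text), vision_tokens(long_text)
print(f"text tokens ~ {t}, vision tokens ~ {v}, compression ~ {t / v:.1f}x")
```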
Baidu releases PaddleOCR-VL 0.9B, combining a NaViT-style native-resolution encoder with ERNIE-4.5-0.3B to deliver fast, accurate end-to-end parsing of multilingual documents into structured Markdown and JSON.
IBM released Granite-Docling-258M, a 258M-parameter open-source document AI model that preserves layout and improves OCR, table, code, and equation extraction for enterprise pipelines.
Hugging Face open-sourced FineVision — a 24M-sample multimodal dataset that boosts VLM performance across benchmarks while keeping data leakage minimal.
Alibaba's Ovis2.5 (9B and 2B) advances multimodal AI with a native-resolution vision transformer and an optional thinking mode, achieving top scores for open-source models under 40B and improved OCR and chart understanding.
dots.ocr is an open-source 1.7B vision-language model that unifies layout detection and OCR to deliver state-of-the-art multilingual document parsing, including accurate table and formula extraction.
NuMind launched NuMarkdown-8B-Thinking, a reasoning-first OCR VLM that infers layout and outputs clean Markdown ideal for RAG and document archiving.
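Several of the parsers above (PaddleOCR-VL, dots.ocr, NuMarkdown-8B-Thinking) emit Markdown that downstream RAG pipelines can chunk and embed. The sketch below shows one plausible way to serialize parsed layout blocks into Markdown; the block schema is hypothetical, not the output format of any of these models.

```python
# Hypothetical sketch: turning parsed layout blocks into Markdown for a RAG index.
# The block schema here is an assumption for illustration only.

blocks = [
    {"type": "heading", "level": 2, "text": "Quarterly results"},
    {"type": "paragraph", "text": "Revenue grew 12% year over year."},
    {"type": "table", "rows": [["Region", "Revenue"], ["EMEA", "$1.2M"], ["APAC", "$0.9M"]]},
]

def to_markdown(block: dict) -> str:
    """Serialize one layout block into Markdown."""
    if block["type"] == "heading":
        return "#" * block["level"] + " " + block["text"]
    if block["type"] == "table":
        header, *body = block["rows"]
        lines = ["| " + " | ".join(header) + " |",
                 "| " + " | ".join("---" for _ in header) + " |"]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)
    return block["text"]

document_md = "\n\n".join(to_markdown(b) for b in blocks)
print(document_md)  # ready to chunk and embed for retrieval
```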
GLM-4.1V-Thinking is a cutting-edge vision-language model that pushes the boundaries of multimodal reasoning, setting new standards across various challenging AI tasks.
ByteDance introduces Seed1.5-VL, a powerful vision-language model achieving state-of-the-art performance on numerous benchmarks, advancing multimodal AI understanding and reasoning.
Meta AI introduces Web-SSL, a family of large-scale visual self-supervised models trained without language supervision. These models achieve competitive results on multimodal benchmarks, challenging the need for language in vision learning.