Mistral AI Launches OCR 3: Optimized Document Processing
Mistral AI introduces OCR 3, a powerful OCR model designed for structured document AI.
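For readers who want to try the model, the sketch below sends a PDF to Mistral's OCR endpoint via the official Python client and reads back per-page Markdown. The model identifier and response fields are assumptions based on the current mistralai SDK, not confirmed details of this release.

```python
# Hedged sketch: calling Mistral's OCR API with the mistralai Python client.
# The model name and response shape are assumptions drawn from the existing
# SDK, not confirmed specifics of the OCR 3 release.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

ocr_response = client.ocr.process(
    model="mistral-ocr-latest",  # assumed identifier; swap in the OCR 3 model name once published
    document={
        "type": "document_url",
        "document_url": "https://example.com/report.pdf",  # placeholder document
    },
)

# Each page comes back as Markdown, ready for downstream structured parsing.
for page in ocr_response.pages:
    print(page.markdown)
```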
DeepSeek demonstrated an OCR-based method that stores text as image tokens to pack more context into AI models while using fewer tokens. The approach could reduce compute needs and help models remember longer conversations.
Glyph converts ultra-long text into page images processed by a VLM to achieve 3–4× effective token compression and roughly 4× faster prefill and decoding on 128K inputs.
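The two items above share the same core idea: render text as page images so a vision encoder can represent it with far fewer tokens than a text tokenizer would. The back-of-the-envelope sketch below illustrates the accounting; every constant in it (characters per page, patch grid, merge factor) is an illustrative assumption, not a figure from either system.

```python
# Rough sketch of the "text as image tokens" accounting described above.
# All constants are illustrative assumptions, not numbers from DeepSeek or Glyph.

def text_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate using a common ~4-chars-per-token heuristic."""
    return int(len(text) / chars_per_token)

def vision_tokens(text: str,
                  chars_per_page: int = 3000,    # assumed characters rendered per page image
                  patch_grid: tuple = (24, 24),  # assumed ViT patch grid per page
                  merge_factor: int = 4) -> int: # assumed token merging in the vision encoder
    """Estimate vision tokens if the same text is rendered as page images."""
    pages = -(-len(text) // chars_per_page)      # ceiling division
    patches_per_page = patch_grid[0] * patch_grid[1]
    return pages * patches_per_page // merge_factor

long_text = "lorem ipsum " * 20_000              # ~240k characters of dummy input
t, v = text_tokens(long_text), vision_tokens(long_text)
print(f"text tokens ~ {t}, vision tokens ~ {v}, compression ~ {t / v:.1f}x")
```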
Baidu releases PaddleOCR-VL 0.9B, combining a NaViT-style native-resolution encoder with ERNIE-4.5-0.3B to deliver fast, accurate end-to-end parsing of multilingual documents into structured Markdown and JSON.
IBM released Granite-Docling-258M, a 258M-parameter open-source document AI model that preserves layout and improves OCR, table, code, and equation extraction for enterprise pipelines.
Hugging Face open-sourced FineVision — a 24M-sample multimodal dataset that boosts VLM performance across benchmarks while keeping data leakage minimal.
Alibaba's Ovis2.5 (9B and 2B) advances multimodal AI with a native-resolution vision transformer and an optional thinking mode, achieving top scores for open-source models under 40B and improved OCR and chart understanding.
dots.ocr is an open-source 1.7B vision-language model that unifies layout detection and OCR to deliver state-of-the-art multilingual document parsing, including accurate table and formula extraction.
NuMind launched NuMarkdown-8B-Thinking, a reasoning-first OCR VLM that infers layout and outputs clean Markdown ideal for RAG and document archiving.
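Several of the parsers above (PaddleOCR-VL, dots.ocr, NuMarkdown-8B-Thinking) emit Markdown that downstream RAG pipelines can chunk and embed. The sketch below shows one plausible way to serialize parsed layout blocks into Markdown; the block schema is hypothetical, not the output format of any of these models.

```python
# Hypothetical sketch: turning parsed layout blocks into Markdown for a RAG index.
# The block schema here is an assumption for illustration only.

blocks = [
    {"type": "heading", "level": 2, "text": "Quarterly results"},
    {"type": "paragraph", "text": "Revenue grew 12% year over year."},
    {"type": "table", "rows": [["Region", "Revenue"], ["EMEA", "$1.2M"], ["APAC", "$0.9M"]]},
]

def to_markdown(block: dict) -> str:
    """Serialize one layout block into Markdown."""
    if block["type"] == "heading":
        return "#" * block["level"] + " " + block["text"]
    if block["type"] == "table":
        header, *body = block["rows"]
        lines = ["| " + " | ".join(header) + " |",
                 "| " + " | ".join("---" for _ in header) + " |"]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)
    return block["text"]

document_md = "\n\n".join(to_markdown(b) for b in blocks)
print(document_md)  # ready to chunk and embed for retrieval
```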
GLM-4.1V-Thinking is a cutting-edge vision-language model that pushes the boundaries of multimodal reasoning, setting new standards across various challenging AI tasks.
ByteDance introduces Seed1.5-VL, a powerful vision-language model achieving state-of-the-art performance on numerous benchmarks, advancing multimodal AI understanding and reasoning.
Meta AI introduces Web-SSL, a family of large-scale visual self-supervised models trained without language supervision. These models achieve competitive results on multimodal benchmarks, challenging the need for language in vision learning.