Mistral AI Launches OCR 3: Optimized Document Processing

Mistral AI introduces OCR 3, a powerful OCR model designed for structured document AI.

Overview of Mistral OCR 3

Mistral AI has released Mistral OCR 3, its latest optical character recognition service and the OCR layer of the company’s Document AI stack. The model, named mistral-ocr-2512, extracts interleaved text and images from PDFs and other documents while preserving structure. It is priced at $2 per 1,000 pages, with a 50% discount when used through the Batch API.

What Is Mistral OCR 3 Optimized For?

Mistral OCR 3 targets typical enterprise document workloads. The model is tuned for forms, scanned documents, complex tables, and handwriting, achieving a 74% overall win rate over Mistral OCR 2 across these categories using a fuzzy match metric against ground truth.
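To make the idea of a win rate under a fuzzy match metric concrete, here is a minimal sketch. It assumes the win rate is the share of documents on which the newer model's output scores strictly higher against ground truth than the older model's. The article does not say which fuzzy metric Mistral uses, so Python's difflib similarity ratio stands in for it, and the function names and sample strings are purely illustrative.

```python
from difflib import SequenceMatcher


def fuzzy_score(predicted: str, truth: str) -> float:
    """Similarity in [0, 1]; a stand-in for the unspecified fuzzy match metric."""
    return SequenceMatcher(None, predicted, truth).ratio()


def win_rate(new_outputs, old_outputs, truths) -> float:
    """Fraction of documents where the new output beats the old one.

    Ties count as non-wins here; that is an assumption of this sketch.
    """
    wins = sum(
        fuzzy_score(new, truth) > fuzzy_score(old, truth)
        for new, old, truth in zip(new_outputs, old_outputs, truths)
    )
    return wins / len(truths)


# Illustrative data only, not benchmark results.
truths = ["Invoice 1234", "Total: 50"]
new_outputs = ["Invoice 1234", "Total: 50"]
old_outputs = ["Invoice 1284", "Total: 50"]
rate = win_rate(new_outputs, old_outputs, truths)
```

In this toy case the newer output wins on the first document and ties on the second, so the rate is one win out of two.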

The model outputs markdown that preserves document layout, and when table formatting is enabled, it enriches the output with HTML-based table representations, facilitating content retrieval and analytics.

Role in Mistral Document AI

OCR 3 integrates within Mistral Document AI, combining OCR with structured data extraction and Document QnA. It powers the Document AI Playground in Mistral AI Studio, allowing users to upload PDFs or images and receive either clean text or structured JSON without coding.

Inputs, Outputs, And Structure

The OCR processor accepts multiple document formats through a single API. The document field can point to:

  • document_url for PDFs, pptx, docx, and more
  • image_url for formats such as png, jpeg, or avif
  • uploaded or base64-encoded PDFs or images
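The options above can be sketched as a small request-body builder. This is a sketch under stated assumptions, not the official client: the article names the model, the document field, and the /v1/ocr path, while the "type" discriminator inside document, the host name, and the helper name build_ocr_request are assumptions made for illustration. The code only builds the JSON body and sends nothing.

```python
import json

# Assumed host; the article only names the /v1/ocr path.
OCR_ENDPOINT = "https://api.mistral.ai/v1/ocr"
MODEL = "mistral-ocr-2512"  # model name given in the article


def build_ocr_request(source_url: str, is_image: bool = False) -> dict:
    """Build a request body that points the document field at a URL.

    The "type" key is an assumed discriminator between the two URL kinds.
    """
    kind = "image_url" if is_image else "document_url"
    return {
        "model": MODEL,
        "document": {"type": kind, kind: source_url},
    }


payload = build_ocr_request("https://example.com/report.pdf")
print(json.dumps(payload, indent=2))
```

The body would then be POSTed to the endpoint with an API key in the Authorization header, using whichever HTTP client the project already depends on.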

The response is a JSON object with a pages array containing details such as markdown strings, images, tables, detected hyperlinks, and more. Because content is grouped per page, downstream systems can reassemble the document without re-parsing the original file.
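As a rough illustration of consuming that response, the sketch below joins the per-page markdown into a single string. The pages array comes from the article; the per-page key name "markdown", the "index" key in the sample, and the helper name are assumptions, and the sample response is hand-written rather than real API output.

```python
def pages_to_markdown(response: dict) -> str:
    """Concatenate per-page markdown, separated by blank lines.

    Assumes each entry in "pages" carries its text under a "markdown" key.
    """
    pages = response.get("pages", [])
    return "\n\n".join(page.get("markdown", "") for page in pages)


# Hand-written sample shaped like the response described above.
sample_response = {
    "pages": [
        {"index": 0, "markdown": "# Invoice"},
        {"index": 1, "markdown": "Total: 50"},
    ]
}
document_text = pages_to_markdown(sample_response)
```

Using .get with defaults keeps the helper tolerant of pages that lack text, which is a design choice of this sketch rather than documented behavior.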

Upgrades Over Mistral OCR 2

Mistral OCR 3 brings notable upgrades including:

  • Handwriting: Enhanced accuracy for cursive and mixed content annotations.
  • Forms: Improved detection of boxes, labels, and handwritten entries in complex layouts.
  • Scanned documents: Greater robustness against compression artifacts and background noise.
  • Complex tables: Advanced table structure reconstructions with appropriate HTML tags.

Pricing, Batch Inference, And Annotations

The OCR 3 model pricing is set at $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages. The Batch Inference API reduces the effective cost to $1 per 1,000 pages for large-scale processing, making it a viable option for extensive document workflows.
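The arithmetic behind these rates is easy to check with a short sketch. The three rates come from the figures above; the function and constant names are illustrative, and the article does not state a batch rate for annotated pages, so none is modeled here.

```python
# USD per 1,000 pages, as quoted in the article.
STANDARD_RATE = 2.0
ANNOTATED_RATE = 3.0
BATCH_RATE = 1.0  # standard OCR through the Batch Inference API


def ocr_cost_usd(pages: int, rate_per_1000: float) -> float:
    """Estimated cost in USD for a page count at a per-1,000-page rate."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * rate_per_1000


# A 250,000-page archive under each pricing tier.
standard = ocr_cost_usd(250_000, STANDARD_RATE)
annotated = ocr_cost_usd(250_000, ANNOTATED_RATE)
batch = ocr_cost_usd(250_000, BATCH_RATE)
```

For that archive the estimate works out to $500 for standard OCR, $750 with annotations, and $250 through the Batch Inference API.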

Key Takeaways

  1. Model and Role: Mistral OCR 3, identified as mistral-ocr-2512, is the OCR service for Mistral’s Document AI stack.
  2. Accuracy Gains: Achieves a 74% overall win rate over Mistral OCR 2 across forms, scanned documents, complex tables, and handwriting.
  3. Structured Outputs for RAG: Extracts text and images, maintaining structure for downstream systems.
  4. API and Document Formats: Accessible via the /v1/ocr endpoint, supporting PDFs, office documents, and images, with optional features such as table formatting and annotations.
  5. Pricing and Batch Processing: At $2 per 1,000 pages, or $1 per 1,000 through the Batch Inference API, it is priced for high-volume document processing.