
DeepSeek-Prover-V2: Advancing Formal Theorem Proving with AI and Reinforcement Learning

DeepSeek-AI released DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving using subgoal decomposition and reinforcement learning, achieving state-of-the-art results on multiple formal reasoning benchmarks.

The Challenge of Formal Mathematical Reasoning in AI

Formal mathematical reasoning demands strict logical consistency, where every proof step must be precise, fully described, and verifiable by computational systems. Unlike informal problem solving that allows intuition and heuristics, formal theorem proving relies on frameworks such as Lean, Coq, and Isabelle that enforce logical soundness without omissions or assumptions. This presents a significant challenge for AI models, especially large language models (LLMs), which excel at generating coherent natural language but typically lack the rigor to produce verifiable formal proofs.

Bridging Informal and Formal Reasoning

Current language models generate human-like explanations and solve math problems informally but struggle to meet the structural precision required by formal logic systems. Humans can intuitively skip deductive steps, whereas proof assistants require a fully specified, unambiguous sequence of proof steps. The complexity grows with advanced theorems in domains like number theory or geometry, where exactness is crucial.
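To illustrate the gap, even a fact a human would wave off as "obvious" must be justified step by step in a proof assistant. A minimal Lean 4 example (illustrative only, not taken from the paper):

```lean
-- Informally: "n + 0 = 0 + n, obviously." Formally, each rewrite must be
-- named explicitly before the checker accepts the proof.
theorem add_zero_swap (n : Nat) : n + 0 = 0 + n := by
  rw [Nat.add_zero, Nat.zero_add]
```

Nothing here is left to intuition: every transformation of the goal is an explicit, machine-checkable step.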

Existing Approaches and Their Limitations

Some approaches generate natural language proof sketches that are manually or semi-automatically translated into formal proofs, often by decomposing complex theorems into smaller subgoals or lemmas. Frameworks like “Draft, Sketch, and Prove” use language models to create proof outlines that are then formalized. Hierarchical reinforcement learning has also been applied to break down problems into layers. However, these methods often fail to produce fully verifiable formal proofs in systems like Lean or Coq, partly due to limited training data and poor learning signals from failed proof attempts.

Introducing DeepSeek-Prover-V2

Researchers at DeepSeek-AI developed DeepSeek-Prover-V2, a model designed to generate formal proofs by combining subgoal decomposition with reinforcement learning. The pipeline starts with DeepSeek-V3 breaking down a complex theorem into subgoals, each represented as a “have” statement in Lean 4 with placeholders for incomplete proofs. A 7B-parameter prover model then attempts to complete each subgoal. Once all subgoals are resolved, the system synthesizes them into a complete formal proof alongside the original natural language reasoning, creating a rich synthetic dataset for reinforcement learning without using any human-annotated proofs.
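The decomposition style described above can be sketched in Lean 4 as follows. This is a hypothetical illustration of the "have with placeholders" pattern, not an example from DeepSeek-AI's dataset: each subgoal is stated as a `have`, initially closed with `sorry`, and later filled in by the 7B prover.

```lean
-- Hypothetical subgoal decomposition: the complex goal is split into
-- `have` statements whose proofs are left as `sorry` placeholders.
theorem decomposed (a b : Nat) (h : a = b) : a + a = b + b := by
  have h1 : a + a = a + b := by sorry  -- subgoal 1, to be proved separately
  have h2 : a + b = b + b := by sorry  -- subgoal 2, to be proved separately
  exact h1.trans h2                    -- subgoals chained into the final proof
```

Once both `sorry` placeholders are replaced with verified proofs, the whole theorem checks, and the pair (natural-language sketch, completed formal proof) becomes a synthetic training example.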

Technical Innovations

  • Cold-start pipeline: DeepSeek-V3 generates natural language proof sketches that are transformed into formal theorems with unresolved parts.
  • Recursive subgoal solving: The 7B prover solves subgoals recursively, reducing computational costs while maintaining rigor.
  • Curriculum learning: Training tasks increase in complexity over time using two types of subgoal theorems — dependent and independent of preceding subgoals.
  • Consistency-based reward: Reinforcement learning enforces alignment between proof sketches and formal solutions, improving accuracy.
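The consistency-based reward can be sketched as a simple gating rule: the binary correctness reward is granted only when the verified proof also preserves the structure of the model's own sketch. The function name, inputs, and matching logic below are illustrative assumptions, not DeepSeek-AI's implementation.

```python
# Hypothetical sketch of a structure-consistency reward: the proof earns the
# reward only if it verifies AND every subgoal statement from the model's own
# sketch reappears in the final formal proof. All names are illustrative.

def consistency_reward(proof_verified: bool,
                       sketch_subgoals: list[str],
                       final_proof: str) -> float:
    """Return 1.0 only when the proof checks and matches its sketch."""
    if not proof_verified:
        return 0.0  # no reward for unverifiable proofs
    # Require each sketched subgoal (a `have` statement) to appear verbatim.
    if all(subgoal in final_proof for subgoal in sketch_subgoals):
        return 1.0
    return 0.0  # verified, but the proof abandoned its own decomposition
```

Gating the reward this way penalizes proofs that verify by an unrelated shortcut, keeping the informal sketch and the formal solution aligned during training.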

Performance Highlights

  • Achieved an 88.9% pass rate on the MiniF2F-test benchmark (Pass@8192), outperforming previous models like Kimina-Prover (82.0%) and Goedel-Prover (64.7%).
  • Solved 49 out of 658 challenging problems from PutnamBench.
  • Solved 6 of the 15 formalized AIME 2024–2025 problems in the new ProverBench dataset of 325 formalized problems.
  • Demonstrated strong generalization and formal verifiability, competitive even against DeepSeek-V3’s natural-language reasoning.
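The Pass@k figures above are typically computed with the standard unbiased estimator: given n sampled proofs of which c verify, it gives the probability that at least one of k samples succeeds. Whether DeepSeek-AI uses exactly this estimator is an assumption; the formula itself is the widely used one.

```python
# Standard unbiased Pass@k estimator: 1 - C(n-c, k) / C(n, k),
# where n = total samples, c = correct samples, k = budget.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k draws (from n, c correct) passes."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with a single correct proof among two samples, `pass_at_k(2, 1, 1)` gives 0.5, and any problem with at least one verified proof scores 1.0 at the full k = 8192 budget.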

Key Takeaways

DeepSeek-Prover-V2 pushes the frontier of AI-driven formal theorem proving by unifying natural language proof sketches with formal proof construction. Its curriculum-guided reinforcement learning and synthetic data bootstrapping offer a scalable approach to developing rigorously verifiable AI proofs. This advancement marks a significant step towards integrating AI fluency in informal reasoning with the strict demands of formal verification.

For more details, explore the research paper and GitHub repository, and follow DeepSeek-AI on Twitter, Telegram, and LinkedIn.
