ByteDance Unveils ProtoReasoning: Boosting Large Language Model Generalization with Logic-Based Prototypes

ByteDance researchers introduce ProtoReasoning, a new framework leveraging logic-based prototypes to significantly improve reasoning and planning abilities in large language models across various domains.

The Importance of Cross-Domain Reasoning in Large Language Models

Recent advancements in large language models (LLMs), particularly those trained with Long Chain-of-Thought (CoT) techniques, have demonstrated remarkable generalization across various domains. Models trained on specific tasks such as mathematics or coding surprisingly perform well on unrelated tasks like logical puzzles or creative writing. This suggests that these models internalize core reasoning patterns, called abstract reasoning prototypes, which transcend individual domains and help the model focus on the underlying cognitive processes rather than task-specific presentations.

From Chain-of-Thought to Reinforcement Learning in Reasoning

The landscape of LLM reasoning training has evolved from simple Chain-of-Thought prompting and supervised fine-tuning to reinforcement learning (RL) approaches. Models such as DeepSeek-R1 and Seed-Thinking-v1.5 apply RL with verifiable rewards, scoring outputs by their accuracy against ground-truth answers. This allows the models to explore complex reasoning pathways, learn from mistakes, and iteratively refine their solutions. It is here that the concept of "reasoning prototypes" emerges: core thinking patterns that, once internalized, enable superior generalization across diverse domains.
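
To make the verifiable-reward idea concrete, here is a minimal sketch in Python (the answer format and the exact-match check are illustrative assumptions, not details from the DeepSeek-R1 or Seed-Thinking-v1.5 training setups):

```python
# Minimal sketch of a verifiable reward for RL on reasoning tasks:
# the reward is 1.0 when the model's final answer matches the known
# ground truth and 0.0 otherwise, so no learned reward model is needed.

def extract_final_answer(completion: str) -> str:
    """Pull the final answer from a chain-of-thought completion.
    Assumes (hypothetically) that the model ends with 'Answer: ...'."""
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary accuracy reward against a ground-truth answer."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

print(verifiable_reward("6 + 7 = 13.\nAnswer: 13", "13"))  # -> 1.0
print(verifiable_reward("6 + 7 = 12.\nAnswer: 12", "13"))  # -> 0.0
```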

Introducing ProtoReasoning: Structured Reasoning Using Prolog and PDDL

Researchers from ByteDance Seed and Shanghai Jiao Tong University introduced ProtoReasoning, a framework that enhances LLMs' reasoning capabilities through structured prototype representations like Prolog (for logic) and PDDL (for planning). The framework automates the translation of problems into these formal languages, verifies solutions with interpreters, and synthesizes scalable problem sets without manual labeling. Training models on these prototypes led to significant performance gains: logical reasoning improved by 4.7%, planning by 6.3%, general reasoning by 4.0%, and mathematics by 1.0%. Crucially, this structured training fosters better generalization across related tasks, confirming the value of abstract reasoning patterns.
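
To illustrate what such a prototype representation might look like, here is a hedged sketch: a toy deduction puzzle (invented for this example, not taken from the paper's dataset) translated into a Prolog program whose query can be checked mechanically by an interpreter.

```python
# Hypothetical example: the natural-language puzzle
#   "Alice is taller than Bob. Bob is taller than Carol. Who is tallest?"
# rewritten as a Prolog "prototype" that exposes the underlying
# transitive-reasoning pattern, independent of the surface wording.

prolog_prototype = r"""
taller(alice, bob).
taller(bob, carol).

% transitive closure of the taller/2 relation
taller_than(X, Y) :- taller(X, Y).
taller_than(X, Y) :- taller(X, Z), taller_than(Z, Y).

% X is tallest if X is above someone and nobody is above X
tallest(X) :- taller_than(X, _), \+ taller_than(_, X).
"""

# Posing the query `tallest(Who).` to a Prolog interpreter yields
# Who = alice, which serves as a machine-verifiable ground truth.
```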

Architecture of ProtoReasoning: Prototype Constructor and Verification System

ProtoReasoning consists of two main components: a Prototype Constructor that converts natural language problems into formal representations, and a Verification System that validates solution correctness. For Prolog, a four-step pipeline generates diverse logic problems verified with SWI-Prolog. Planning tasks use PDDL to build plan generation, completion, and reordering problems, validated with the VAL tool. The training process includes teacher model distillation, difficulty-based sampling, and filtering to ensure high-quality fine-tuning data, enhancing model robustness and generalization.
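
A rough sketch of what the verification step could look like in practice, assuming SWI-Prolog's swipl and VAL's validator are installed on the PATH (the function names, file handling, and the validator binary name are assumptions for illustration, not the paper's actual pipeline):

```python
import os
import subprocess
import tempfile

def verify_prolog(program: str, goal: str) -> bool:
    """Check a generated logic problem by asking SWI-Prolog
    whether `goal` succeeds against `program`."""
    with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        # -q suppresses the banner; the goal exits with code 0 on
        # success and 1 on failure, which we read back here.
        result = subprocess.run(
            ["swipl", "-q", "-g", f"({goal} -> halt(0) ; halt(1))", path],
            capture_output=True, timeout=10,
        )
        return result.returncode == 0
    finally:
        os.unlink(path)

def verify_pddl_plan(domain: str, problem: str, plan: str) -> bool:
    """Validate a candidate plan against a PDDL domain/problem pair
    with VAL (binary name assumed here to be `Validate`)."""
    result = subprocess.run(
        ["Validate", domain, problem, plan],
        capture_output=True, timeout=10,
    )
    return result.returncode == 0
```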

Evaluation Results: Enhanced Reasoning and Planning Performance

Using a 150-billion-parameter Mixture-of-Experts model (with 15B active parameters), ProtoReasoning was evaluated on curated Prolog and PDDL datasets. The framework consistently improved performance on logical reasoning, planning, and general benchmarks such as MMLU and AIME 2024. Ablation studies comparing Prolog-based training to natural-language (NL) versions of matched datasets showed that both formats significantly outperformed the baseline, with Prolog-based training nearly matching the NL version. This indicates that skills learned through structured prototypes transfer effectively to natural-language tasks, although explicit reasoning steps such as chain-of-thought remain essential. Categories with fewer samples showed limited gains, highlighting the need for sufficient training data.

Implications and Future Directions

ProtoReasoning supports the hypothesis that abstract reasoning prototypes enable LLMs to generalize better across domains. Training on structured formats like Prolog and PDDL improves logical reasoning, planning, and problem-solving abilities. While these results are promising, the theoretical understanding of reasoning prototypes requires further formalization. Future research aims to mathematically define these concepts and validate findings using open-source models and datasets.

For more details, check out the original research paper. Credit goes to the researchers from ByteDance Seed and Shanghai Jiao Tong University.
