
Skywork AI Unveils R1V2: A Breakthrough in Multimodal Reasoning with Hybrid Reinforcement Learning

Skywork AI introduces R1V2, a cutting-edge multimodal reasoning model that blends hybrid reinforcement learning techniques to improve specialized reasoning and generalization, outperforming many open-source and proprietary models.

Addressing the Challenge in Multimodal AI

Recent progress in multimodal artificial intelligence has exposed a significant challenge: balancing strong specialized reasoning with broad generalization across various tasks. Models emphasizing "slow-thinking," like OpenAI-o1 and Gemini-Thinking, have achieved advances in deliberate analytical reasoning but often at the cost of weaker general visual understanding and higher rates of visual hallucinations. This tradeoff presents a key obstacle as the AI community strives to build general-purpose AI systems.

Introducing Skywork R1V2

Skywork AI has launched Skywork R1V2, a next-generation multimodal reasoning model engineered to systematically tackle the reasoning-generalization tradeoff. Building on its predecessor Skywork R1V, the R1V2 model integrates a hybrid reinforcement learning framework that combines reward-model guidance with structured rule-based signals. Unlike traditional approaches relying on teacher-student distillation, R1V2 learns directly from multimodal interactions. This innovation is openly accessible through its release on Hugging Face, supporting reproducibility.

Technical Innovations

Skywork R1V2 employs Group Relative Policy Optimization (GRPO) together with a Selective Sample Buffer (SSB) to enhance training stability and efficiency. GRPO evaluates candidate responses relative to one another within the same query group; however, as training progresses and the responses in a group converge toward similar rewards, the relative advantages shrink toward zero and the learning signal weakens. The SSB counters this by caching samples whose advantages remain informative, ensuring the policy keeps access to valuable gradients throughout training.
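For intuition, here is a minimal Python sketch of group-relative advantages paired with a selective buffer. The class and function names, the advantage threshold, and the eviction policy are illustrative assumptions, not Skywork's implementation.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: score each candidate response relative to the
    mean and standard deviation of rewards within its own query group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

class SelectiveSampleBuffer:
    """Illustrative cache of informative samples: groups whose advantages
    have not collapsed toward zero are kept for replay, so non-trivial
    gradients stay available even late in training."""
    def __init__(self, capacity=1024, threshold=0.05):
        self.capacity = capacity
        self.threshold = threshold
        self.buffer = []

    def maybe_add(self, sample, advantages):
        # Store only groups that still carry a useful learning signal.
        if np.abs(advantages).mean() > self.threshold:
            self.buffer.append((sample, advantages))
            self.buffer = self.buffer[-self.capacity:]  # evict oldest first

    def sample(self, k):
        if not self.buffer:
            return []
        idx = np.random.choice(len(self.buffer), size=min(k, len(self.buffer)), replace=False)
        return [self.buffer[i] for i in idx]

# A group where every candidate earns the same reward yields near-zero
# advantages and is skipped; a mixed group is cached for replay.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])    # ~all zeros
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])   # informative
buf = SelectiveSampleBuffer()
buf.maybe_add("responses to prompt A", flat)    # not stored
buf.maybe_add("responses to prompt B", mixed)   # stored
print(len(buf.buffer))  # -> 1
```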

Additionally, the model uses Mixed Preference Optimization (MPO), which blends reward-model-based preference signals with rule-based constraints. This hybrid approach improves step-by-step reasoning quality while maintaining consistency on general perception tasks. Training follows a modular design: lightweight adapters bridge a frozen InternViT-6B vision encoder and a pretrained language model, preserving the language model's reasoning skills while optimizing multimodal alignment.
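How such a blended signal might be composed can be illustrated with a short Python sketch. The specific rule checks, the 50/50 weighting, and the function names below are hypothetical and not taken from the R1V2 training recipe.

```python
import re

def rule_based_checks(response: str, reference_answer: str) -> float:
    """Illustrative rule-based signal: verifiable constraints such as
    'shows structured reasoning' and 'final answer matches the reference'."""
    score = 0.0
    if re.search(r"(?i)step\s*\d+|therefore|hence", response):
        score += 0.5                                   # structured reasoning present
    boxed = re.search(r"\\boxed\{([^}]*)\}", response)
    if boxed and boxed.group(1).strip() == reference_answer.strip():
        score += 0.5                                   # final answer is correct
    return score

def hybrid_reward(learned_score: float, response: str, reference_answer: str,
                  alpha: float = 0.5) -> float:
    """Blend a learned reward-model preference score (assumed to lie in [0, 1])
    with rule-based constraint checks; the weighting here is illustrative."""
    return alpha * learned_score + (1.0 - alpha) * rule_based_checks(response, reference_answer)

# Example: a well-structured response whose boxed answer matches the
# reference earns full marks from the rule checks.
resp = "Step 1: compute the area. Therefore the result is \\boxed{42}."
print(hybrid_reward(learned_score=0.8, response=resp, reference_answer="42"))  # -> 0.9
```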

Performance Highlights

Skywork R1V2 delivers strong results across text-based reasoning and multimodal benchmarks. On textual reasoning tasks, it scores 78.9% on AIME2024, 63.6% on LiveCodeBench, 73.2% on LiveBench, 82.9% on IFEval, and 66.3% on BFCL, outperforming its predecessor and rivaling much larger models such as DeepSeek R1 (671 billion parameters).

In multimodal evaluations, R1V2 achieves 73.6% on MMMU, 74.0% on MathVista, 62.6% on OlympiadBench, 49.0% on MathVision, and 52.0% on MMMU-Pro. It consistently surpasses open-source baselines of similar or larger size, such as Qwen2.5-VL-72B and QvQ-Preview-72B, especially in tasks demanding structured problem-solving across text and images.

Comparisons with proprietary models show R1V2 narrowing the performance gap, outperforming Claude 3.5 Sonnet and Gemini 2 Flash on key multimodal benchmarks like MMMU and MathVista. The model also reduces hallucination rates significantly to 8.7% through calibrated reinforcement strategies, preserving factual accuracy alongside complex reasoning.

Qualitative Insights

Assessments reveal that R1V2 demonstrates methodical decomposition and verification in complex scientific and mathematical problems, reflecting thoughtful cognitive processes. This systematic problem-solving aligns closely with reflective reasoning patterns.

Future Prospects

Skywork R1V2 sets a new benchmark in multimodal reasoning by effectively blending hybrid reinforcement learning techniques. The model’s design tackles challenges such as vanishing learning advantages and balances optimization signals to boost both specialized reasoning and general multimodal understanding.

With leading scores on OlympiadBench and MMMU, R1V2 offers a robust open-source foundation. Skywork AI aims to further enhance general visual understanding capabilities while retaining the sophisticated reasoning strengths developed in R1V2.

For more details, explore the paper and model on Hugging Face.
