OpenAI Unveils Reinforcement Fine-Tuning on o4-mini for Advanced Custom AI Models

OpenAI launches Reinforcement Fine-Tuning on the o4-mini model, enabling developers to customize AI reasoning with precision using reinforcement learning techniques.

Introducing Reinforcement Fine-Tuning (RFT)

OpenAI has introduced Reinforcement Fine-Tuning (RFT) on its o4-mini reasoning model, a novel approach that enhances the customization of foundation models for specialized tasks. Unlike traditional supervised fine-tuning, RFT leverages reinforcement learning principles, enabling developers to define custom objectives and reward functions that guide the model's improvement with precision.

How RFT Works

RFT moves beyond labeled data by incorporating a task-specific grader—a function that evaluates and scores model outputs according to defined criteria. This reward signal trains the model to generate outputs that better align with desired behaviors, especially valuable in subjective or complex tasks where clear ground truth is lacking. For example, in medical explanations, while optimal phrasing may be subjective, a grader can assess clarity, accuracy, and completeness to guide learning.
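To make the idea concrete, here is a minimal sketch of what such a grader could look like for the medical-explanation example. This is not an OpenAI API, just a plain Python function returning a score between 0 and 1; the required facts, weights, and word budget are illustrative assumptions, not values from OpenAI's documentation.

```python
# Hypothetical grader: scores a model-generated medical explanation on
# coverage of required facts (a proxy for accuracy/completeness) and on
# brevity (a rough proxy for clarity). Purely illustrative.

REQUIRED_FACTS = [
    "high blood sugar",    # core definition the explanation should mention
    "insulin",             # mechanism
    "diet and exercise",   # first-line management
]

def grade(output: str, max_words: int = 200) -> float:
    """Return a reward in [0, 1] for a model-generated explanation."""
    text = output.lower()

    # Completeness: fraction of required facts that appear in the answer.
    coverage = sum(fact in text for fact in REQUIRED_FACTS) / len(REQUIRED_FACTS)

    # Clarity proxy: penalize answers that run far over the word budget.
    words = len(text.split())
    brevity = 1.0 if words <= max_words else max(0.0, 1 - (words - max_words) / max_words)

    # Weighted combination keeps the final score in [0, 1].
    return 0.7 * coverage + 0.3 * brevity


if __name__ == "__main__":
    sample = ("Type 2 diabetes means high blood sugar because the body does not "
              "use insulin well; diet and exercise are the first treatment.")
    print(round(grade(sample), 2))  # -> 1.0 for this toy example
```

During RFT training, this score becomes the reward signal: outputs that cover more of the rubric and stay concise are reinforced, without requiring a single "correct" reference answer.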

Why Choose the o4-mini Model?

The o4-mini, released in April 2025, is a compact yet powerful reasoning model optimized for both text and image inputs. It excels in structured reasoning and chain-of-thought prompts, making it ideal for multitask applications. By applying RFT to o4-mini, OpenAI offers a lightweight model that can be finely tuned for domain-specific reasoning tasks, maintaining computational efficiency suitable for real-time use.

Real-World Applications of RFT on o4-mini

Early adopters have demonstrated significant improvements using RFT:

  • Accordance AI enhanced tax analysis accuracy by 39% using rule-based grading for compliance.
  • Ambience Healthcare improved ICD-10 medical coding accuracy by 12 points over physician baselines.
  • Harvey, a legal AI startup, raised citation extraction F1 scores by 20%, matching GPT-4o performance with lower latency.
  • Runloop increased valid Stripe API code generation by 12% through syntax-based grading (illustrated after this list).
  • Milo improved scheduling output quality on complex prompts by 25 points.
  • SafetyKit boosted content moderation F1 score from 86% to 90% by enforcing detailed policy compliance.

These cases highlight RFT's ability to tailor models precisely for diverse, high-stakes applications.
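Rule-based and syntax-based graders of the kind mentioned above can be very simple. The sketch below is a hypothetical example (not Runloop's actual grader): it rewards generated Python code only if it parses, with a bonus when an expected API call appears. The specific call name is an assumption used purely for illustration.

```python
import ast

def grade_code(output: str, expected_call: str = "stripe.Charge.create") -> float:
    """Hypothetical syntax-based grader: 0.0 if the snippet does not parse,
    0.5 if it parses, 1.0 if it parses and contains the expected API call."""
    try:
        ast.parse(output)          # reject syntactically invalid code outright
    except SyntaxError:
        return 0.0
    return 1.0 if expected_call in output else 0.5


if __name__ == "__main__":
    good = "import stripe\nstripe.Charge.create(amount=2000, currency='usd')"
    broken = "import stripe\nstripe.Charge.create(amount=2000, currency='usd'"
    print(grade_code(good), grade_code(broken))  # -> 1.0 0.0
```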

Getting Started with RFT on o4-mini

To use RFT, developers must:

  1. Design a Grading Function: Create a Python function scoring outputs between 0 and 1, reflecting task preferences like correctness or tone.
  2. Prepare a Dataset: Gather diverse, challenging prompts representative of the target domain.
  3. Launch Training: Use OpenAI’s fine-tuning API or dashboard to start RFT training with adjustable settings (a sketch follows this list).
  4. Evaluate and Iterate: Monitor reward trends and refine grading to optimize model performance.
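The sketch below shows roughly what steps 2 and 3 look like with the official openai Python SDK: upload a JSONL dataset, then create a fine-tuning job on o4-mini. The filename, grader configuration, and the exact shape of the "method" block are assumptions for illustration; consult OpenAI's RFT guide for the precise schema and field names.

```python
# Rough sketch of launching an RFT job with the openai Python SDK.
# The structure of the "method" block and grader config below is assumed
# for illustration -- see OpenAI's RFT guide for the exact schema.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 2: upload a JSONL dataset of prompts from the target domain.
train_file = client.files.create(
    file=open("rft_train.jsonl", "rb"),   # hypothetical dataset filename
    purpose="fine-tune",
)

# Step 3: start the reinforcement fine-tuning job on o4-mini.
job = client.fine_tuning.jobs.create(
    model="o4-mini",                      # base reasoning model
    training_file=train_file.id,
    method={                              # assumed RFT configuration shape
        "type": "reinforcement",
        "reinforcement": {
            "grader": {                   # grader definition; schema is assumed
                "type": "score_model",
                "model": "gpt-4o",        # hosted grader model (billed separately)
            },
        },
    },
)

# Step 4: poll the job and watch reward trends in the dashboard as it runs.
print(job.id, job.status)
```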

Comprehensive documentation and examples are available in OpenAI’s RFT guide.

Access and Pricing

RFT is accessible to verified organizations. Training costs are $100 per active training hour. If hosted OpenAI models (e.g., GPT-4o) are used for grading, token usage fees apply separately. Organizations sharing datasets for research benefit from a 50% discount on training costs.

A New Era for Custom AI Model Training

RFT marks a significant evolution in foundation model adaptation by enabling feedback-driven learning aligned with specific real-world goals. With its availability on the o4-mini model, developers gain unprecedented control over fine-tuning not just language but reasoning capabilities, paving the way for more reliable and efficient AI deployments.
