
Microsoft Launches OptiMind: AI Model for Optimization

Discover OptiMind, a revolutionary AI for converting natural language into optimization models.

What OptiMind Is And What It Outputs

OptiMind-SFT is a specialized 20B-parameter Mixture-of-Experts model in the GPT-OSS transformer family. Only about 3.6B parameters are active per token, which lowers inference cost while preserving capacity. It supports a 128,000-token context length, allowing long specifications and multi-step reasoning within a single request.

The model takes a natural language description of an optimization problem as its input. The output includes a mathematical formulation and executable Python code using GurobiPy that defines the decision variables, constraints, and objective, then runs the Gurobi solver and prints the optimal result.

OptiMind serves as a formulation layer between domain experts and standard MILP solvers, generating the MILP that the solver will optimize without replacing it.

Architecture, Training Setup, And Datasets

The base model is openai/gpt-oss-20b, fine-tuned into microsoft/OptiMind-SFT using cleaned optimization datasets. This Mixture of Experts transformer architecture activates a subset of experts per token and is released under the MIT license.

Fine-tuning runs on 8 NVIDIA B200 GPUs and reportedly takes about 8 hours; inference and evaluation use 8 NVIDIA H100 GPUs. For deployment, a minimum of 32 GB of GPU memory on A100, H100, or B200 hardware is recommended.

Supervised fine-tuning relies on cleaned versions of OR-Instruct and OptMATH-Train. Testing uses expert-validated, re-cleaned versions of IndustryOR, Mamo-Complex, and OptMATH, targeting hard formulation tasks where existing models often reach only 20-50% accuracy.

Class Based Error Analysis And Data Cleaning

A key concept in OptiMind is the integration of optimization expertise with LLM training. Problems from OR-Instruct and OptMATH are classified into 53 seed classes, such as set cover, flow shop scheduling, and the traveling salesman problem.

For each class, the training team samples outputs from the gpt-oss-20b base model and identifies instances where they contradict the ground truth. Experts analyze these failures, writing short error descriptions and suggestions for preventing them. These hints include correct constraints and modeling techniques, such as the proper Miller-Tucker-Zemlin subtour-elimination constraints for the TSP.
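For reference, the Miller-Tucker-Zemlin formulation mentioned above can be stated compactly. With binary edge variables x_ij (1 if the tour goes from city i to city j) and auxiliary position variables u_i, the subtour-elimination constraints for an n-city TSP are:

```latex
u_i - u_j + n\,x_{ij} \le n - 1, \qquad 2 \le i \ne j \le n,
\qquad 1 \le u_i \le n - 1 \quad \text{for } i = 2, \dots, n
```

Omitting or misstating these constraints is exactly the kind of class-specific error the expert hints are designed to prevent.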

A semi-automated pipeline regenerates solutions with a larger model using class-specific hints, applies majority voting for quality improvement, and removes inconsistent items. Ambiguous statements are addressed, yielding a cleaned training corpus aligned with accurate mathematical formulations.
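The consistency-filtering step can be sketched in a few lines. This is a hedged illustration, not the actual pipeline code: `filter_consistent`, the tolerance, and the majority rule are assumptions made for the example.

```python
# Keep only items where a majority of regenerated objective values
# agrees with the stored label, within a numerical tolerance.
def filter_consistent(items, regenerations, tol=1e-6):
    """items: list of (problem, label) pairs; regenerations: parallel
    list of lists of objective values from the larger model."""
    kept = []
    for (problem, label), answers in zip(items, regenerations):
        agree = sum(1 for a in answers if abs(a - label) <= tol)
        if agree > len(answers) / 2:      # majority agrees with label
            kept.append((problem, label))
    return kept

items = [("p1", 10.0), ("p2", 7.0)]
regens = [[10.0, 10.0, 9.0], [5.0, 5.0, 7.0]]
print(filter_consistent(items, regens))   # only p1 survives
```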

Inference Pipeline, Hints, And Test Time Scaling

At inference time, OptiMind operates as a multi-stage system rather than a single prompt. The pipeline first classifies each test instance into one of the 53 optimization classes, then augments the prompt with that class's error summaries and hints.
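The classify-then-augment step might look like the following sketch. The class names, hint texts, and keyword-based classifier are invented placeholders; the real system uses an LLM-based classifier over 53 expert-curated classes.

```python
# Map each problem class to its expert-written hint (illustrative only).
CLASS_HINTS = {
    "tsp": "Use Miller-Tucker-Zemlin constraints to eliminate subtours.",
    "set_cover": "Ensure every element is covered by at least one chosen set.",
}

def classify(problem_text: str) -> str:
    # Stand-in for the LLM classifier: a trivial keyword match.
    if "visit every city" in problem_text.lower():
        return "tsp"
    return "set_cover"

def build_prompt(problem_text: str) -> str:
    cls = classify(problem_text)
    return f"Problem class: {cls}\nHint: {CLASS_HINTS[cls]}\n\n{problem_text}"

prompt = build_prompt("A salesman must visit every city exactly once...")
print(prompt.splitlines()[0])   # Problem class: tsp
```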

The model generates a reasoning trace, mathematical formulation, and GurobiPy code. With additional compute, self-consistency methods can be applied using majority voting, generating multiple candidate scripts and selecting the most frequent solution within numerical tolerances.
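Voting "within numerical tolerances" means near-equal objective values should count as the same answer. A minimal sketch of that grouping, with the tolerance and grouping rule as assumptions:

```python
# Self-consistency voting over candidate objective values: values within
# tol of a group's representative join that group; the largest group wins.
from collections import defaultdict

def majority_vote(values, tol=1e-6):
    groups = defaultdict(list)
    for v in values:
        for rep in groups:
            if abs(v - rep) <= tol:
                groups[rep].append(v)
                break
        else:
            groups[v].append(v)   # start a new group
    return max(groups.items(), key=lambda kv: len(kv[1]))[0]

print(majority_vote([550.0, 550.0 + 1e-9, 549.0, 550.0]))   # 550.0
```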

Multi-turn correction can also be activated, allowing the system to run the generated code, capture logs or errors, and revise the formulation and code in iterative cycles, at the cost of higher latency.
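The run-and-capture half of that loop is straightforward to sketch; `run_candidate` below is a hypothetical helper, and the revision step (feeding stderr back to the model) is elided.

```python
# Execute a candidate solver script in a subprocess and capture its
# output and errors, so they can be fed back to the model for revision.
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str, timeout: int = 60):
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)
        return proc.returncode == 0, proc.stdout, proc.stderr
    finally:
        os.remove(path)

ok, out, err = run_candidate("print(2 + 2)")
print(ok, out.strip())   # True 4
```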

Quantitative Gains On Optimization Benchmarks

On cleaned benchmarks like IndustryOR, Mamo-Complex, and OptMATH, the OptiMind framework significantly enhances accuracy. The fine-tuned model's formulation accuracy improves by 20.7 percent, with additional enhancements from test time scaling techniques such as self-consistency and multi-turn feedback.

OptiMind surpasses the gpt-oss-20b base model and other open-source models, achieving performance competitive with proprietary models such as o4-mini and GPT-5.

These results stem from meticulous cleaning of both training and test data. Many apparent model errors traced back to missing data or ambiguous problem descriptions, and relabeling can raise apparent accuracy from about 40-60% to the 70-90% range on the adjusted datasets.

Key Takeaways

  1. OptiMind is a 20B parameter Mixture of Experts transformer that converts natural language optimization problems into mathematical formulations and executable GurobiPy code.
  2. The model is fine-tuned from openai/gpt-oss-20b on cleaned datasets like OR-Instruct, evaluated on expert-validated benchmarks including IndustryOR.
  3. It applies class-based error analysis and expert hints for 53 classes, reducing common errors in generated MILPs.
  4. The framework enhances formulation accuracy by 20.7 percent, achieving competitive performance through test time scaling methods.
  5. OptiMind-SFT is available on Hugging Face and Azure AI Foundry, enabling integration into decision support systems across various industries.