OpenAI Unveils GPT-5.1: Adaptive Reasoning, Personalized Tones, and Tougher Safety

OpenAI released GPT-5.1 featuring Instant and Thinking variants that adapt compute to prompt difficulty, add account-level personalization, and deliver improved safety metrics and jailbreak robustness.

OpenAI has rolled out GPT-5.1 as an incremental upgrade in the GPT-5 family, introducing two primary variants and several product-level changes focused on adaptive reasoning, clearer explanations, and improved tone control and safety.

Model lineup and how it's positioned

GPT-5.1 ships with two core variants: GPT-5.1 Instant and GPT-5.1 Thinking. GPT-5.1 Instant is the default conversational model in ChatGPT, described as the most-used option with a warmer default tone and better instruction following. GPT-5.1 Thinking is the advanced reasoning variant, exposing explicit thinking time and adapting that time more precisely to the complexity of the query. A routing layer, GPT-5.1 Auto, continues to direct traffic between variants so end users generally do not need to pick models manually.

GPT-5.1 Instant: adaptive reasoning for everyday chat

GPT-5.1 Instant preserves low latency for typical chat interactions while adding adaptive reasoning. For straightforward prompts the model performs a quick, shallow internal pass and returns a fast response. For harder prompts — for example multi-step math or coding tasks — it allocates more internal compute before answering. This adaptive behavior improves benchmark performance on tasks like AIME 2025 and Codeforces compared to earlier GPT-5 Instant versions, while remaining responsive for casual use.

Instruction following is emphasized: GPT-5.1 Instant is more reliable at obeying explicit constraints such as 'always respond with 6 words', and it keeps such constraints across turns. That reliability is useful for structured outputs, message templates, or chained tools that expect bounded-length responses.
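
A minimal sketch of how that behavior might be exercised over the API, assuming the official OpenAI Python SDK and the gpt-5.1-chat-latest model name listed later in this article; the prompts and the exact constraint are illustrative, not an official test.

```python
# Sketch: state a length constraint once and check that it persists
# across turns. Assumes the official OpenAI Python SDK and an
# OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

messages = [
    # The constraint is stated once, in the system message.
    {"role": "system", "content": "Always respond with exactly 6 words."},
    {"role": "user", "content": "Summarize what adaptive reasoning means."},
]

first = client.chat.completions.create(
    model="gpt-5.1-chat-latest",
    messages=messages,
)
print(first.choices[0].message.content)

# Follow-up turn: the constraint is not restated, so this checks whether
# the model keeps honoring it on later turns.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "Now describe GPT-5.1 Thinking."})

second = client.chat.completions.create(
    model="gpt-5.1-chat-latest",
    messages=messages,
)
print(second.choices[0].message.content)
```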

The combination of adaptive compute and stricter instruction adherence makes GPT-5.1 Instant a predictable front end for many agent workflows where most calls are simple but a tail of requests require deeper reasoning.
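
In ChatGPT, the GPT-5.1 Auto router makes that split automatically; API callers choose a model per request. The sketch below shows one hypothetical client-side wrapper that sends simple calls to the Instant model and the harder tail to the Thinking model; the keyword and length heuristic is purely illustrative and bears no relation to how OpenAI's router actually decides.

```python
# Hypothetical client-side routing between the two GPT-5.1 API models.
# The heuristic below is illustrative only.
from openai import OpenAI

client = OpenAI()

HARD_HINTS = ("prove", "optimize", "debug", "step by step", "plan")

def pick_model(prompt: str) -> str:
    # Long or reasoning-heavy prompts go to the Thinking model,
    # everything else to the low-latency Instant model.
    if len(prompt) > 800 or any(hint in prompt.lower() for hint in HARD_HINTS):
        return "gpt-5.1"             # GPT-5.1 Thinking
    return "gpt-5.1-chat-latest"     # GPT-5.1 Instant

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(prompt),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("What's a good name for a grey cat?"))
print(ask("Plan a zero-downtime migration of a 2 TB Postgres database."))
```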

GPT-5.1 Thinking: dynamic compute allocation

GPT-5.1 Thinking refines the GPT-5 Thinking approach by tightening how thinking time is allocated. It adapts internal thinking time to prompt complexity: on a representative distribution of ChatGPT tasks at Standard thinking time, GPT-5.1 Thinking is roughly 2x faster than GPT-5 Thinking on the quickest tasks, and roughly 2x slower on the slowest tasks.

This dynamic allocation is useful when a single model must handle both light and heavy queries: quick queries avoid paying for long chains of thought, while difficult reasoning and planning tasks receive more internal steps without introducing new API surface.

GPT-5.1 Thinking also produces responses with less jargon and fewer undefined terms than GPT-5 Thinking, reducing interpretation overhead and making it more suitable as an interactive tutor for topics like statistics, algorithms, or system design.

In the API, GPT-5.1 Instant is exposed as gpt-5.1-chat-latest and GPT-5.1 Thinking as gpt-5.1. Both variants include adaptive reasoning by default.
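
For orientation, here is a minimal sketch of both names in use, assuming the Responses interface of the official OpenAI Python SDK accepts them (the Chat Completions interface takes the same names); no reasoning-related parameters are passed, since adaptive reasoning is described as on by default.

```python
# Minimal sketch: call both GPT-5.1 API models with default settings.
from openai import OpenAI

client = OpenAI()

# GPT-5.1 Instant: low-latency conversational model.
instant = client.responses.create(
    model="gpt-5.1-chat-latest",
    input="Give a one-sentence summary of adaptive reasoning.",
)
print(instant.output_text)

# GPT-5.1 Thinking: allocates more internal thinking time on hard prompts.
thinking = client.responses.create(
    model="gpt-5.1",
    input="Determine whether 2^61 - 1 is prime and explain your reasoning.",
)
print(thinking.output_text)
```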

Personalization and tone control

ChatGPT gains a more explicit personalization layer. Users can pick a base style such as Default, Professional, Friendly, Candid, Quirky, Efficient, Nerdy, or Cynical; these presets apply across models including GPT-5.1. OpenAI is also experimenting with finer-grained sliders for conciseness, warmth, and scannability, plus controls for emoji frequency.

ChatGPT can suggest changes to these preferences inside conversations when it detects repeated tone requests. Preferences now apply immediately to both new and ongoing chats, a change from earlier behavior where updates only affected new conversations.

Safety metrics and preparedness classification

GPT-5.1 reuses the GPT-5 safety framework and provides updated baseline metrics. Both GPT-5.1 Instant and GPT-5.1 Thinking use the same class of mitigations described in the GPT-5 system card, including filters for disallowed content, routing for sensitive cases, and policy-aligned refusals.

On Production Benchmarks for disallowed content, gpt-5.1-instant improves over gpt-5-instant across the listed categories. Example scores: not_unsafe reaches 0.918 for illicit or violent content and 0.897 for hate content. On jailbreak robustness, measured with the StrongReject evaluation, gpt-5.1-instant achieves a not_unsafe score of 0.976, compared with 0.850 for gpt-5-instant and 0.683 for an earlier reference model. gpt-5.1-thinking scores 0.967, close to gpt-5-thinking at 0.974.

What this means for users and developers

GPT-5.1 is presented as an in-generation upgrade within the GPT-5 family rather than a new generation. Developers and product teams get a default, low-latency chat model that adapts compute to prompt difficulty and a thinking model that allocates time dynamically. Persistent personalization and finer tone controls shift some of the burden of prompt engineering into user settings. Finally, the release strengthens safety baselines and jailbreak robustness, while maintaining the existing GPT-5 mitigation strategies.
