
Inside GPT-5: Practical Guide to Verbosity, Function Calls, CFG and Minimal Reasoning

A compact developer guide to GPT-5 features including verbosity control, free-form function calling, CFG enforcement, and minimal reasoning, with illustrative code samples.

Installing the libraries

Start by installing the Python packages used in the examples and set your API key in the environment.

!pip install pandas openai
import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

Verbosity parameter

The verbosity parameter controls how long and detailed GPT-5's replies are without requiring any changes to the prompt.

  • low: short, concise responses with minimal extra text.
  • medium (default): balanced detail and clarity.
  • high: very detailed answers for explanations or teaching.

The example below runs the same prompt with three verbosity settings and collects token usage and outputs.

from openai import OpenAI
import pandas as pd
from IPython.display import display
 
client = OpenAI()
 
question = "Write a poem about a detective and his first solve"
 
data = []
 
for verbosity in ["low", "medium", "high"]:
    response = client.responses.create(
        model="gpt-5-mini",
        input=question,
        text={"verbosity": verbosity}
    )
 
    # Extract text
    output_text = ""
    for item in response.output:
        if hasattr(item, "content"):
            for content in item.content:
                if hasattr(content, "text"):
                    output_text += content.text
 
    usage = response.usage
    data.append({
        "Verbosity": verbosity,
        "Sample Output": output_text,
        "Output Tokens": usage.output_tokens
    })
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Display nicely with centered headers
pd.set_option('display.max_colwidth', None)
styled_df = df.style.set_table_styles(
    [
        {'selector': 'th', 'props': [('text-align', 'center')]},  # Center column headers
        {'selector': 'td', 'props': [('text-align', 'left')]}     # Left-align table cells
    ]
)
 
display(styled_df)

In the example the output tokens scale roughly linearly with verbosity: low (731) → medium (1017) → high (1263).

Free-form function calling

GPT-5 can emit raw text payloads intended to be executed directly by target tools or runtimes, such as Python scripts, SQL queries, or shell commands. Unlike earlier tool-call formats that wrapped arguments in structured JSON, GPT-5's free-form outputs make it easier to feed the model's result straight into an external runtime.

Use cases include:

  • Code sandboxes (Python, C++, Java, etc.)
  • SQL query generation for databases
  • Shell command generation (Bash)
  • Configuration or code generation tools

Example: GPT-5 generating a Python snippet to count vowels and compute a cube.

from openai import OpenAI
 
client = OpenAI()
 
response = client.responses.create(
    model="gpt-5-mini",
    input="Please use the code_exec tool to calculate the cube of the number of vowels in the word 'pineapple'",
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "code_exec",
            "description": "Executes arbitrary python code",
        }
    ]
)
 
print(response.output[1].input)

This output demonstrates GPT-5 producing an executable Python snippet that can be fed directly into a Python runtime without extra parsing.
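Actually feeding that string into a runtime is the caller's responsibility. A minimal local sketch of that step is shown below (this is not a real sandbox — `exec` must only be used with trusted code — and the sample snippet stands in for what the model might emit for the vowel-cube task):

```python
import io
import contextlib

def run_snippet(code: str) -> str:
    """Execute a Python snippet in an isolated namespace and capture stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # fresh globals dict so snippets don't leak state
    return buf.getvalue()

# Illustrative stand-in for a snippet returned via response.output[1].input
snippet = (
    "word = 'pineapple'\n"
    "vowels = sum(ch in 'aeiou' for ch in word)\n"
    "print(vowels ** 3)\n"
)

print(run_snippet(snippet))  # 'pineapple' has 4 vowels, so this prints 64
```

For untrusted model output, a subprocess with resource limits or a containerized sandbox is the safer equivalent of `run_snippet`.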

Context-Free Grammar (CFG)

A Context-Free Grammar (CFG) defines production rules for a language. Using a CFG you can strictly constrain model output to match a syntax (for example, a valid SQL statement, JSON document, or a specific code pattern). GPT-5 shows improved adherence to CFG constraints, producing outputs that exactly match the grammar specification when configured.

First, an unconstrained example (older model) that may emit extra prose around the answer:

from openai import OpenAI
import re
 
client = OpenAI()
 
email_regex = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"
 
prompt = "Give me a valid email address for John Doe. It can be a dummy email"
 
# No grammar constraints -- model might give prose or invalid format
response = client.responses.create(
    model="gpt-4o",  # or earlier
    input=prompt
)
 
output = response.output_text.strip()
print("GPT Output:", output)
print("Valid?", bool(re.match(email_regex, output)))

Now a grammar-constrained call to GPT-5 that uses a regex grammar to enforce a strict email format:

from openai import OpenAI
 
client = OpenAI()
 
email_regex = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"
 
prompt = "Give me a valid email address for John Doe. It can be a dummy email"
 
response = client.responses.create(
    model="gpt-5",  # grammar-constrained model
    input=prompt,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "email_grammar",
            "description": "Outputs a valid email address.",
            "format": {
                "type": "grammar",
                "syntax": "regex",
                "definition": email_regex
            }
        }
    ],
    parallel_tool_calls=False
)
 
print("GPT-5 Output:", response.output[1].input)

In practice, GPT-4-style responses sometimes include extra natural language that breaks strict syntactic validation (for instance, "Sure, here's a test email: johndoe@example.com"), while GPT-5 can produce the exact token sequence that matches the grammar (for example, john.doe@example.com).
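The anchored regex is what makes that distinction mechanically checkable; a quick local test shows why prose-wrapped output fails validation while the bare address passes:

```python
import re

# Same anchored pattern used in the API examples above
email_regex = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"

bare = "john.doe@example.com"
with_prose = "Sure, here's a test email: john.doe@example.com"

print(bool(re.match(email_regex, bare)))        # True
print(bool(re.match(email_regex, with_prose)))  # False: '^' rejects the prefix
```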

Minimal reasoning

Minimal reasoning reduces the model's internal reasoning tokens, lowering latency and speeding up time-to-first-token for deterministic or trivial tasks like classification, formatting, or extraction. The trade-off is that complex multi-step tasks may require a higher effort setting.

Example: a fast odd/even classifier with minimal reasoning.

import time
from openai import OpenAI
 
client = OpenAI()
 
prompt = "Classify the given number as odd or even. Return one word only."
 
start_time = time.time()  # Start timer
 
response = client.responses.create(
    model="gpt-5",
    input=[
        { "role": "developer", "content": prompt },
        { "role": "user", "content": "57" }
    ],
    reasoning={
        "effort": "minimal"  # Faster time-to-first-token
    },
)
 
latency = time.time() - start_time  # End timer
 
# Extract model's text output
output_text = ""
for item in response.output:
    if hasattr(item, "content"):
        for content in item.content:
            if hasattr(content, "text"):
                output_text += content.text
 
print("--------------------------------")
print("Output:", output_text)
print(f"Latency: {latency:.3f} seconds")

Minimal reasoning is well-suited for short rewrites, extraction, formatting, and quick classifications when you need low latency.
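One practical pattern is to pick the effort level from the task type before making the call, so only open-ended work pays the reasoning cost. A small sketch (the task-to-effort mapping here is illustrative, not an official recommendation):

```python
# Map task types to a reasoning effort: "minimal" for deterministic work,
# higher effort reserved for open-ended tasks. Categories are illustrative.
EFFORT_BY_TASK = {
    "classification": "minimal",
    "extraction": "minimal",
    "formatting": "minimal",
    "analysis": "high",
}

def reasoning_config(task_type: str) -> dict:
    """Build the `reasoning` argument for client.responses.create()."""
    return {"effort": EFFORT_BY_TASK.get(task_type, "medium")}

print(reasoning_config("classification"))  # {'effort': 'minimal'}
print(reasoning_config("summarization"))   # {'effort': 'medium'} (fallback)
```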

Next steps and resources

The examples above show how GPT-5 extends developer control via verbosity tuning, free-form function outputs, strict grammar enforcement and low-latency minimal reasoning. Explore the full code samples and repositories for deeper experimentation, and adapt the patterns to your toolchains and runtimes.
