How to Trace OpenAI Agent Interactions Seamlessly with MLflow

Managing OpenAI Agent Workflows with MLflow

MLflow is an open-source platform designed to manage and track machine learning experiments efficiently. When combined with the OpenAI Agents SDK, MLflow can automatically log every interaction and API call made by the agents. This includes capturing tool usage, input and output messages, and intermediate decision steps, which is invaluable for debugging, performance analysis, and ensuring reproducibility.

Benefits for Multi-Agent Systems

This integration is particularly useful when building multi-agent systems where agents collaborate or dynamically call functions. MLflow tracks runs end-to-end, making it easier to monitor complex interactions between agents.
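Conceptually, the triage pattern used in Example 1 below boils down to a router that inspects a request and forwards it to a specialist. Stripped of the SDK, the idea can be sketched in plain Python (this is an analogy only; in the real SDK the handoff decision is made by the LLM, not by keyword matching, and the function names here are illustrative):

```python
# Toy illustration of agent handoff: a triage function routes a query
# to one of several "expert" handlers. In the real Agents SDK the
# routing decision is made by the model, not by keyword matching.
def coding_expert(query: str) -> str:
    return f"[coding agent] answering: {query}"

def cooking_expert(query: str) -> str:
    return f"[cooking agent] answering: {query}"

def triage(query: str) -> str:
    # Crude stand-in for the triage agent's instructions
    if any(word in query.lower() for word in ("code", "python", "bug")):
        return coding_expert(query)
    return cooking_expert(query)

print(triage("How do I boil pasta al dente?"))
# → [cooking agent] answering: How do I boil pasta al dente?
```

MLflow's value in the real version is that each hop in this routing chain is captured as a span in the trace, so you can see exactly which agent handled the request and why.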

Setting Up Dependencies

To get started, install the required libraries:

pip install openai-agents mlflow pydantic python-dotenv

You need an OpenAI API key, which can be generated at https://platform.openai.com/settings/organization/api-keys. New users may need to add billing details and pay a minimum of $5 to activate API access. Store your API key in a .env file like this:

OPENAI_API_KEY=<YOUR_API_KEY>

Replace <YOUR_API_KEY> with your actual key.
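Before running any agent code, it helps to fail fast if the key is missing rather than hit a confusing authentication error mid-run. A minimal sketch (the require_api_key helper is a name chosen here for illustration, not part of any SDK):

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.getenv(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set - check that your .env file exists "
            "and that load_dotenv() was called before this check."
        )
    return key
```

Call it once right after load_dotenv() so a missing or misnamed key surfaces immediately.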

Example 1: Multi-Agent System for Coding and Cooking Queries

In the script multi_agent_demo.py, a simple multi-agent assistant routes user queries to either a coding expert or a cooking expert. Calling mlflow.openai.autolog() enables automatic tracing and logging of all agent interactions, including inputs, outputs, and handoffs.

The tracking URI is set to a local folder (./mlruns), and all logs are organized under the experiment named “Agent-Coding-Cooking”.

import asyncio
import mlflow
from agents import Agent, Runner
from dotenv import load_dotenv

load_dotenv()  # Loads OPENAI_API_KEY from .env

mlflow.openai.autolog()                           # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Coding-Cooking")

coding_agent = Agent(name="Coding agent",
                     instructions="You only answer coding questions.")

cooking_agent = Agent(name="Cooking agent",
                      instructions="You only answer cooking questions.")

triage_agent = Agent(
    name="Triage agent",
    instructions="If the request is about code, handoff to coding_agent; "
                 "if about cooking, handoff to cooking_agent.",
    handoffs=[coding_agent, cooking_agent],
)

async def main():
    res = await Runner.run(triage_agent,
                           input="How do I boil pasta al dente?")
    print(res.final_output)

if __name__ == "__main__":
    asyncio.run(main())

Viewing Logs in MLflow UI

Run the following command to start the MLflow UI server:

mlflow ui

It will typically be accessible at http://localhost:5000. The UI provides a detailed trace from the initial user input through agent routing to the final response, which helps in debugging and optimizing workflows.

Example 2: Guardrail-Protected Customer Support Agent

This example shows a customer support agent protected by guardrails to prevent answering medical-related questions. A separate guardrail agent detects medical symptom inquiries and blocks such requests.

import mlflow, asyncio
from pydantic import BaseModel
from agents import (
    Agent, Runner,
    GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    input_guardrail, RunContextWrapper)

from dotenv import load_dotenv
load_dotenv()

mlflow.openai.autolog()
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Guardrails")

class MedicalSymptoms(BaseModel):
    medical_symptoms: bool
    reasoning: str


guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking you about medical symptoms.",
    output_type=MedicalSymptoms,
)


@input_guardrail
async def medical_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.medical_symptoms,
    )


agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[medical_guardrail],
)


async def main():
    try:
        await Runner.run(agent, "Should I take aspirin if I'm having a headache?")
        print("Guardrail didn't trip - this is unexpected")

    except InputGuardrailTripwireTriggered:
        print("Medical guardrail tripped")


if __name__ == "__main__":
    asyncio.run(main())

MLflow automatically traces the entire process, including guardrail activation and reasoning, allowing full visibility into safety mechanisms.
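The tripwire mechanism itself is simple control flow: the guardrail runs first, and if it flags the input, the main agent never executes. A stripped-down sketch of that flow (the names and keyword check here are illustrative stand-ins, not the SDK's internals):

```python
class TripwireTriggered(Exception):
    """Raised when a guardrail flags the input.

    Mirrors the role of InputGuardrailTripwireTriggered in the Agents SDK.
    """

def run_with_guardrail(user_input: str) -> str:
    # Stand-in for the guardrail agent's classification step
    flagged = any(w in user_input.lower() for w in ("aspirin", "symptom"))
    if flagged:
        # The main agent is never invoked for a flagged input
        raise TripwireTriggered(f"blocked: {user_input!r}")
    # Stand-in for the customer support agent's answer
    return f"support reply to: {user_input}"
```

In the real integration, both the guardrail agent's run and the exception path appear as separate spans in the MLflow trace, which is what makes the blocked requests auditable.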

Viewing Guardrail Logs

Use the same mlflow ui command to inspect the guardrail interactions visually. You can see the flagged input and the guardrail agent’s reasoning for blocking the request.

This integration of OpenAI Agents SDK with MLflow provides powerful tools to monitor, debug, and ensure safe operation of intelligent multi-agent systems.