
SDialog: Build, Simulate and Inspect LLM Conversations End-to-End

SDialog is an open-source Python toolkit that standardizes dialog representation across the full conversational pipeline for LLM-based agents: persona and agent definitions, persona-driven simulation, orchestration, evaluation metrics, and mechanistic interpretability.

A unified Dialog schema and simple API

At the heart of SDialog is a consistent Dialog schema with JSON import and export. The library provides higher-level abstractions for personas, agents, orchestrators, generators, and datasets. With a few lines of configuration (for example via sdialog.config.llm), developers can declare personas, instantiate Agent objects, and call generators such as DialogGenerator or PersonaDialogGenerator to synthesize complete conversations ready for training or evaluation.
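
As a rough sketch of that flow (the class names come from this description, but the exact import paths, backend string, and file helpers below are assumptions rather than the documented API):

```python
# Hypothetical sketch; module paths, the backend spec string, and the
# to_file/from_file helpers are assumptions, not the confirmed API.
import sdialog
from sdialog import Dialog                        # unified dialog schema
from sdialog.generators import DialogGenerator    # assumed module path

sdialog.config.llm("openai:gpt-4o-mini")          # pick an LLM backend (assumed spec string)

# Generate a complete conversation from a plain-text task description.
gen = DialogGenerator("A short customer-support call about a lost package.")
dialog = gen()

dialog.to_file("support_call.json")               # JSON export (assumed helper)
restored = Dialog.from_file("support_call.json")  # JSON import (assumed helper)
```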

Persona-driven multi-agent simulation

Personas are first-class citizens in SDialog. They capture stable traits, goals, and speaking styles so you can simulate role-specific interactions like a doctor–patient consultation or scenario-driven flows where each participant follows constraints and objectives across many turns. PersonaDialogGenerator lets you create multi-turn, role-consistent dialogs suitable for both task-oriented datasets and more exploratory scenario simulations.
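
A doctor–patient sketch might look like this; the Persona fields and import paths are illustrative assumptions:

```python
# Illustrative sketch; Persona fields and import paths are assumptions.
from sdialog.personas import Persona                   # assumed path
from sdialog.generators import PersonaDialogGenerator  # assumed path

doctor = Persona(
    name="Dr. Alvarez",
    role="general practitioner",
    personality="calm, thorough, asks clarifying questions",
)
patient = Persona(
    name="Sam",
    role="patient with recurring headaches",
    personality="anxious, answers vaguely at first",
)

# Multi-turn, role-consistent dialog between the two personas.
dialog = PersonaDialogGenerator(doctor, patient)()
dialog.print()   # assumed pretty-printer for quick inspection
```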

Orchestration as a pipeline

Orchestrators sit between your Agent objects and the LLM backend to provide composable control. A concise pattern in the toolkit is agent = agent | orchestrator, turning orchestration into a pipeline. Components such as SimpleReflexOrchestrator can inspect each turn and inject policies, enforce constraints, or trigger tools based on the full dialog state. More advanced setups use persistent instructions and LLM judges to monitor safety, topic drift, or compliance and adapt future turns accordingly.
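
The pipe pattern itself comes straight from the toolkit; the constructor arguments below are assumptions about the interface:

```python
# The `agent | orchestrator` pattern is from the text; the Agent and
# SimpleReflexOrchestrator arguments shown here are assumptions.
from sdialog.personas import Persona                        # assumed path
from sdialog.agents import Agent                            # assumed path
from sdialog.orchestrators import SimpleReflexOrchestrator  # assumed path

agent = Agent(persona=Persona(name="Dr. Alvarez", role="general practitioner"))

# Fire a corrective instruction whenever a turn trips a condition.
stay_on_topic = SimpleReflexOrchestrator(
    condition=lambda utterance: "weather" in utterance.lower(),
    instruction="Politely steer the conversation back to the patient's symptoms.",
)

# Orchestration as a pipeline: composition with `|`.
agent = agent | stay_on_topic
```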

Rich evaluation stack

The sdialog.evaluation module exposes multiple evaluators and judge components like LLMJudgeRealDialog, LinguisticFeatureScore, FrequencyEvaluator, and MeanEvaluator. Evaluators plug into a DatasetComparator that accepts reference and candidate dialog sets, computes metrics, aggregates scores, and produces tables or plots. This makes it straightforward to compare prompts, backends, or orchestration strategies with consistent quantitative criteria rather than relying only on manual inspection.
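
In code, a comparison might be wired up like this; how the evaluators nest and how the comparator is invoked are assumptions based on the component names above:

```python
# Sketch only: evaluator composition and the comparator call form are
# assumptions; the class names come from sdialog.evaluation as described.
from sdialog.evaluation import (
    DatasetComparator,
    FrequencyEvaluator,
    LLMJudgeRealDialog,
    LinguisticFeatureScore,
    MeanEvaluator,
)

comparator = DatasetComparator(evaluators=[
    # Fraction of dialogs an LLM judge rates as realistic.
    FrequencyEvaluator(LLMJudgeRealDialog(), name="realistic-rate"),
    # Average of a linguistic feature score across each dialog set.
    MeanEvaluator(LinguisticFeatureScore(), name="mean-linguistic-score"),
])

# `reference_dialogs` and `candidate_dialogs` are lists of Dialog objects,
# e.g. loaded with Dialog.from_file or Dialog.from_huggingface.
results = comparator(reference_dialogs, candidate_dialogs)  # assumed call form
comparator.plot()                                           # assumed plotting helper
```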

Mechanistic interpretability and steering

SDialog includes tools for mechanistic inspection: the Inspector in sdialog.interpretability registers PyTorch forward hooks on internal modules (for example model.layers.15.post_attention_layernorm) and records per-token activations during generation. After a run you can index activation buffers, inspect shapes, and search for system instructions with utilities like find_instructs. The DirectionSteerer converts discovered directions into control signals, allowing targeted nudges of model behavior by modifying activations at specific token positions to reduce undesirable traits or push toward a desired style.
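
A sketch of that loop, with the hook path taken from the description and everything else (buffer indexing, the find_instructs signature, how a steering direction is obtained) assumed:

```python
# Sketch; only the class names and the hook path come from the text.
from sdialog.personas import Persona
from sdialog.agents import Agent
from sdialog.interpretability import Inspector, DirectionSteerer  # assumed path

agent = Agent(persona=Persona(name="assistant"))

# Record per-token activations at one internal module during generation.
inspector = Inspector(target="model.layers.15.post_attention_layernorm")
agent = agent | inspector

agent("How do I reset my password?")   # one turn; activations are buffered

acts = inspector[0]                    # index the activation buffer (assumed)
print(acts.shape)                      # e.g. (num_tokens, hidden_size)
hits = inspector.find_instructs()      # locate system-instruction tokens (assumed)

# Placeholder direction: in practice this would come from analysis of the
# recorded activations (e.g. a difference of means between two behaviors).
direction = acts.mean(dim=0)
agent = agent | DirectionSteerer(direction)   # steer future turns (assumed signature)
```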

Integrations and deployment

The toolkit supports multiple LLM backends (OpenAI, Hugging Face, Ollama, and AWS Bedrock) behind a unified configuration interface. Dialogs can be loaded from or exported to Hugging Face datasets using helpers such as Dialog.from_huggingface. The sdialog.server module can expose agents via an OpenAI-compatible REST API through Server.serve, enabling connections from tools like Open WebUI without custom protocol work.
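
For example (the dataset identifier below is hypothetical, and the keyword arguments are assumptions):

```python
# Dialog.from_huggingface and Server.serve are named in the text; the
# dataset id is hypothetical and the keyword arguments are assumptions.
from sdialog import Dialog
from sdialog.personas import Persona
from sdialog.agents import Agent
from sdialog.server import Server

dialogs = Dialog.from_huggingface("some-org/some-dialog-dataset")

# Expose an agent as an OpenAI-compatible endpoint, e.g. for Open WebUI.
agent = Agent(persona=Persona(name="support-bot"))
Server.serve(agent, host="0.0.0.0", port=8000)
```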

Audio rendering and multimodal testing

SDialog also offers sdialog.audio utilities with a to_audio pipeline that converts text turns into speech, manages pauses, and can simulate room acoustics. The same Dialog objects therefore drive text-based analysis, model training, and audio-based testing for speech systems, providing a single representation that spans text and audio workflows.
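
A minimal sketch, assuming to_audio accepts a Dialog directly; beyond the pipeline named above, the call form and any room-acoustics options are assumptions:

```python
# Minimal sketch; the to_audio call form and option names are assumptions.
from sdialog import Dialog
from sdialog.audio import to_audio   # assumed import path

dialog = Dialog.from_file("support_call.json")  # reuse a previously saved dialog

# Render turns to speech with inter-turn pauses; room-acoustics simulation
# would be controlled by options of this pipeline (names assumed).
audio = to_audio(dialog)
```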

SDialog aims to be a modular, extensible framework for persona-driven simulation, precise orchestration, quantitative evaluation, and mechanistic interpretability, all centered on a consistent Dialog schema. Check the project repository and documentation for tutorials, notebooks, and examples.
