Introducing LLMRouter: Optimize LLM Inference Smartly
Discover LLMRouter, an intelligent routing library for optimizing LLM inference with a unified API and extensive features.
The Challenge of Model Selection
LLMRouter is an open-source routing library from the U Lab at the University of Illinois Urbana-Champaign that treats model selection as a first-class system problem. Positioned between applications and a pool of LLMs, it dynamically selects a model for each query based on task complexity, quality targets, and cost, delivered through a unified Python API and CLI. The project includes over 16 routing models, a data generation pipeline across 11 benchmarks, and a plugin system for custom routers.
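To make the abstraction concrete, here is a minimal, self-contained sketch of what "routing as a prediction task" means: a function that maps each query to one of several candidate models under a quality/cost trade-off. The model names, costs, and the length-based heuristic are illustrative stand-ins, not LLMRouter's API or logic.

```python
# Conceptual sketch of routing as a prediction task: given a query and a pool
# of candidate models, return the one expected to give the best quality/cost
# trade-off. Names, costs, and the heuristic are illustrative, not LLMRouter's API.

CANDIDATES = {
    # model name -> rough relative cost per call (made-up numbers)
    "small-fast-model": 1.0,
    "large-strong-model": 20.0,
}

def route(query: str) -> str:
    """Pick a model for one query. A real router replaces this heuristic
    with a learned predictor (kNN, SVM, MLP, RL policy, ...)."""
    cheap = min(CANDIDATES, key=CANDIDATES.get)
    strong = max(CANDIDATES, key=CANDIDATES.get)
    # Toy complexity signal: long or math-flavored queries go to the larger model.
    looks_hard = len(query.split()) > 40 or any(
        tok in query.lower() for tok in ("prove", "integral", "optimize")
    )
    return strong if looks_hard else cheap

print(route("What is the capital of France?"))                    # -> small-fast-model
print(route("Prove that the sum of two even integers is even."))  # -> large-strong-model
```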
Router Families and Supported Models
LLMRouter organizes its routing algorithms into four families: Single-Round Routers, Multi-Round Routers, Personalized Routers, and Agentic Routers.
Single-round routers include models such as knnrouter, svmrouter, and mlprouter, which implement strategies like k-nearest neighbors, support vector machines, and automatic model mixing.
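As a concrete illustration of the single-round idea, the toy below routes by nearest neighbors from scratch: embed the query, find the most similar labeled training queries, and pick the model that handled them best. It is a simplified sketch over made-up data, not the knnrouter implementation.

```python
# Toy kNN-style router (from scratch, not the library's knnrouter):
# choose the model that performed best on the training queries most
# similar to the incoming one.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: hashed character trigram counts. A real setup
    would use a sentence-embedding model instead."""
    vec = np.zeros(256)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Routing dataset: (query embedding, model that answered that query well).
train = [
    (embed("2 + 2 * 3 = ?"), "small-fast-model"),
    (embed("Integrate x^2 * e^x dx"), "large-strong-model"),
    (embed("Capital of Japan?"), "small-fast-model"),
    (embed("Prove the triangle inequality in R^n"), "large-strong-model"),
]

def knn_route(query: str, k: int = 1) -> str:
    """Return the model that did best on the k most similar training queries."""
    q = embed(query)
    neighbors = sorted(train, key=lambda pair: -float(q @ pair[0]))[:k]
    labels = [model for _, model in neighbors]
    return max(set(labels), key=labels.count)   # majority vote (trivial for k=1)

print(knn_route("Integrate x * e^x dx"))   # -> large-strong-model
```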
Multi-round routing is handled by router_r1, a pre-trained router that frames multi-LLM routing as a sequential decision process, interleaving internal reasoning with model calls and training the policy with reinforcement learning.
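The skeleton below illustrates the sequential-decision framing in the simplest possible terms: at each step a policy decides whether to add a reasoning step, call a candidate model, or stop. The fixed-schedule policy and data structures are assumptions made for illustration; in router_r1 the policy is learned with reinforcement learning, with a reward that trades answer quality against the cost of each call.

```python
# Conceptual skeleton of multi-round routing as a sequential decision process
# (not router_r1's code): at every step a policy chooses to keep reasoning,
# call one of the candidate LLMs, or emit a final answer.
from dataclasses import dataclass, field

@dataclass
class RoutingState:
    query: str
    scratchpad: list = field(default_factory=list)   # interleaved reasoning and model outputs

def policy(state: RoutingState) -> str:
    """Stand-in for an RL-trained policy; here just a fixed schedule."""
    if not state.scratchpad:
        return "think"
    if len(state.scratchpad) == 1:
        return "call:large-strong-model"
    return "answer"

def call_model(model: str, state: RoutingState) -> str:
    return f"[{model} output for: {state.query}]"     # placeholder for a real API call

def run_episode(query: str) -> str:
    state = RoutingState(query)
    while True:
        action = policy(state)
        if action == "think":
            state.scratchpad.append("internal reasoning step")
        elif action.startswith("call:"):
            state.scratchpad.append(call_model(action.split(":", 1)[1], state))
        else:
            return state.scratchpad[-1]               # final answer = last model output

print(run_episode("Summarize the proof of Fermat's little theorem."))
```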
Personalized routing is managed by gmtrouter, which represents user interactions as a heterogeneous graph, learning user-specific preferences to enhance accuracy.
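The toy below shows what a heterogeneous interaction graph might capture: user, query, and model nodes linked by rated edges, aggregated into a per-user model preference. The data and the simple averaging are made up for illustration; gmtrouter learns these preferences with a graph model rather than the averaging shown here.

```python
# Toy heterogeneous interaction graph (illustrative, not gmtrouter):
# user, query, and model nodes connected by typed, rated edges such as
# (user)-asked->(query) and (query)-answered_by->(model).
from collections import defaultdict

edges = [
    # (user, query, model that answered, user rating of the answer)
    ("alice", "q1: explain transformers", "large-strong-model", 0.9),
    ("alice", "q2: regex for emails", "small-fast-model", 0.8),
    ("bob",   "q3: explain transformers", "small-fast-model", 0.4),
    ("bob",   "q4: prove cauchy-schwarz", "large-strong-model", 0.9),
]

def preferred_model(user: str) -> str:
    """Aggregate the user's past edge ratings per model; a graph neural
    network would learn this from message passing instead."""
    scores, counts = defaultdict(float), defaultdict(int)
    for u, _, model, rating in edges:
        if u == user:
            scores[model] += rating
            counts[model] += 1
    return max(scores, key=lambda m: scores[m] / counts[m])

print(preferred_model("bob"))   # -> large-strong-model
```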
Agentic routers extend routing to multi-step reasoning workflows. The knnmultiroundrouter and llmmultiroundrouter models enable reasoning over multi-turn traces without a separate training loop.
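One way to see why no separate training loop is needed: a multi-turn trace can be flattened into a single routing query and handed to any single-round router, such as the kNN sketch above. The helper below is an assumed illustration, not the knnmultiroundrouter code.

```python
# Conceptual illustration of training-free multi-turn routing (not the
# knnmultiroundrouter code): flatten the dialogue trace into one text and
# reuse a similarity-based lookup over previously routed traces.
def flatten_trace(trace: list[dict]) -> str:
    """Serialize a multi-turn conversation into a single routing query."""
    return "\n".join(f"{turn['role']}: {turn['content']}" for turn in trace)

trace = [
    {"role": "user", "content": "Plan a 3-step experiment to test the model."},
    {"role": "assistant", "content": "Step 1: define the metric..."},
    {"role": "user", "content": "Now derive the sample size needed for power 0.8."},
]

routing_query = flatten_trace(trace)
# The flattened trace can then be fed to any single-round router, so no
# extra training loop is required for the multi-turn case.
print(routing_query.splitlines()[0])   # "user: Plan a 3-step experiment to test the model."
```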
Data Generation Pipeline for Routing Datasets
LLMRouter provides a comprehensive data generation pipeline that converts standard benchmarks and LLM outputs into routing datasets. It processes 11 benchmarks, including MMLU and GSM8K, through three stages: extracting queries, building embeddings, and evaluating responses.
The pipeline outputs routing entries along with the associated configuration files, and every stage can be configured through YAML, simplifying dataset management.
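The sketch below mirrors the three stages in miniature: extract queries from a benchmark, embed them, and score each candidate model's responses to produce routing entries. All function names and stand-in components are assumptions for illustration, not the pipeline's actual interface.

```python
# Illustrative three-stage routing-dataset pipeline (not the real pipeline's code):
# 1) extract queries, 2) build embeddings, 3) evaluate candidate-model responses.
from typing import Callable

def extract_queries(benchmark: list[dict]) -> list[str]:
    """Stage 1: pull raw questions out of a benchmark's records."""
    return [record["question"] for record in benchmark]

def build_embeddings(queries: list[str], embed: Callable[[str], list[float]]) -> list[list[float]]:
    """Stage 2: turn each query into a vector for similarity-based routers."""
    return [embed(q) for q in queries]

def evaluate_responses(queries, answers, models, generate, score) -> list[dict]:
    """Stage 3: have each candidate model answer every query and record a score,
    yielding the (query, model, quality) entries routers are trained on."""
    entries = []
    for q, gold in zip(queries, answers):
        for m in models:
            entries.append({"query": q, "model": m, "score": score(generate(m, q), gold)})
    return entries

# Toy run with stand-in components.
benchmark = [{"question": "2+2?", "answer": "4"}, {"question": "Capital of Peru?", "answer": "Lima"}]
queries = extract_queries(benchmark)
vectors = build_embeddings(queries, embed=lambda q: [float(len(q))])
entries = evaluate_responses(
    queries,
    [r["answer"] for r in benchmark],
    models=["small-fast-model", "large-strong-model"],
    generate=lambda m, q: "4" if "2+2" in q else "Lima",
    score=lambda pred, gold: 1.0 if pred == gold else 0.0,
)
print(entries[0])   # {'query': '2+2?', 'model': 'small-fast-model', 'score': 1.0}
```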
Chat Interface and Plugin System
For interactive usage, llmrouter chat offers a Gradio-based chat frontend that can be customized to different users' needs. The plugin system lets developers add their own routers alongside the built-in ones, extending the library's flexibility.
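As a sketch of what a custom router might look like, the class below implements a simple cost-aware policy behind an assumed route(query) interface. The BaseRouter base class and method names are hypothetical; the library's actual plugin hooks may differ.

```python
# Hypothetical custom-router plugin. The BaseRouter interface and the `route`
# method are assumptions made for illustration, not LLMRouter's plugin API.
class BaseRouter:
    """Assumed interface a plugin implements."""
    def route(self, query: str) -> str:
        raise NotImplementedError

class LatencyBudgetRouter(BaseRouter):
    """Custom policy: spend the expensive model only on long queries."""
    def __init__(self, models: dict[str, float], max_words: int = 30):
        self.models = models            # model name -> relative cost
        self.max_words = max_words

    def route(self, query: str) -> str:
        cheap = min(self.models, key=self.models.get)
        strong = max(self.models, key=self.models.get)
        return strong if len(query.split()) > self.max_words else cheap

router = LatencyBudgetRouter({"small-fast-model": 1.0, "large-strong-model": 20.0})
print(router.route("Short question?"))   # -> small-fast-model
```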
Key Takeaways
- Routing as a First-Class Abstraction: Model selection is centralized as a cost- and quality-aware prediction task.
- Four Router Families: Standardizes 16+ routers into single-round, multi-round, personalized, and agentic categories.
- Multi-Round RL Routing: router_r1 optimizes performance using reinforcement learning.
- Graph-Based Personalization: gmtrouter achieves significant accuracy gains over non-personalized models.
- End-to-End Pipeline: Offers a complete pipeline, chat UI, and extensible plugin system for custom routers.
For more details, visit GitHub Repo and Technical Documentation.