
Building Production-Grade AutoML Pipelines with AutoGluon

Create efficient AutoML pipelines for tabular models using AutoGluon.

Overview

In this tutorial, we build a production-grade tabular machine learning pipeline using AutoGluon, processing a real-world mixed-type dataset from raw ingestion to deployment-ready artifacts.

Setting Up the Environment

We start by installing the required libraries:

!pip -q install -U "autogluon==1.5.0" "scikit-learn>=1.3" "pandas>=2.0" "numpy>=1.24"

And configuring necessary imports:

import os, time, json, warnings
warnings.filterwarnings("ignore")
 
import numpy as np
import pandas as pd
 
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, log_loss, accuracy_score, classification_report
from autogluon.tabular import TabularPredictor

Preparing the Dataset

Using fetch_openml(), we load the Titanic dataset (OpenML data_id 40945), a real-world mix of numeric, categorical, and text columns:

# Load the Titanic dataset as a pandas DataFrame
df = fetch_openml(data_id=40945, as_frame=True).frame

target = "survived"
df[target] = df[target].astype(int)

# Drop leakage-prone columns: "boat" and "body" directly encode survival
drop_cols = [c for c in ["boat", "body", "home.dest"] if c in df.columns]
df = df.drop(columns=drop_cols, errors="ignore")

We then perform a stratified train-test split so both splits preserve the class balance (a quick validation sketch follows the split):

train_df, test_df = train_test_split(
   df,
   test_size=0.2,
   random_state=42,
   stratify=df[target],
)
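
Before training, it is worth sanity-checking the split. The checks below are a minimal sketch of our own, not part of the original pipeline; AutoGluon handles missing values natively, so the NaN overview is informational:

print(f"Train shape: {train_df.shape}, test shape: {test_df.shape}")

# Stratification should keep the positive rate nearly identical across splits
print("Train positive rate:", round(train_df[target].mean(), 3))
print("Test positive rate: ", round(test_df[target].mean(), 3))

# Columns with the most missing values (AutoGluon imputes these internally)
print(train_df.isna().mean().sort_values(ascending=False).head())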

Model Initialization

We check for GPU availability to select the training preset:

def has_gpu():
   try:
       import torch
       return torch.cuda.is_available()
   except Exception:
       return False
 
presets = "extreme" if has_gpu() else "best_quality"
 
predictor = TabularPredictor(
   label=target,
   eval_metric="roc_auc",
   path="/content/autogluon_titanic_advanced",
   verbosity=2
)

Training the Model

Next, we fit the predictor:

start = time.time()
predictor.fit(
   train_data=train_df,
   presets=presets,
   time_limit=7 * 60,
   num_bag_folds=5,
   num_stack_levels=2,
   refit_full=False
)
train_time = time.time() - start
print(f"\nTraining done in {train_time:.1f}s with presets='{presets}'")
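
After the fit completes, we can inspect what was built. A short sketch using TabularPredictor's fit_summary() and model_names() methods; the verbosity level is our choice:

# Summarize the trained ensemble: model types, validation scores, fit times
summary = predictor.fit_summary(verbosity=1)

# List every model trained, including bagged folds and stack levels
print(predictor.model_names())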

Evaluating the Model

Using the test set, we evaluate our model and generate key metrics:

lb = predictor.leaderboard(test_df, silent=True)
print("\n=== Leaderboard (top 15) ===")
display(lb.head(15))
 
proba = predictor.predict_proba(test_df)
pred = predictor.predict(test_df)
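
To turn these predictions into the key metrics promised above, we can apply the sklearn functions imported earlier. A minimal sketch; selecting the positive-class column by label 1 (with a positional fallback) is our assumption about predict_proba's output for this binary target:

# Probability of the positive class (survived = 1)
pos_proba = proba[1] if 1 in proba.columns else proba.iloc[:, -1]
y_true = test_df[target]

print(f"ROC AUC : {roc_auc_score(y_true, pos_proba):.4f}")
print(f"Log loss: {log_loss(y_true, proba):.4f}")
print(f"Accuracy: {accuracy_score(y_true, pred):.4f}")
print(classification_report(y_true, pred))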

Analyzing Model Behavior

We can evaluate permutation-based feature importance, and optionally break performance down by group (a group-analysis sketch follows the importance code):

fi = predictor.feature_importance(test_df, silent=True)
print("\n=== Feature importance (top 20) ===")
display(fi.head(20))
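
For the group analysis, one option is to slice test accuracy by an interesting column. A sketch assuming the Titanic "sex" column; the grouping choice and formatting are ours:

# Group-wise test accuracy by passenger sex
grouped = test_df.assign(pred=pred.values)
for group, sub in grouped.groupby("sex"):
    acc = accuracy_score(sub[target], sub["pred"])
    print(f"{group:>8}: n={len(sub):4d}, accuracy={acc:.3f}")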

Optimizing for Inference

We can collapse the bagged ensembles into single models with refit_full() for faster inference:

t0 = time.time()
refit_map = predictor.refit_full()
t_refit = time.time() - t0
print(f"\nrefit_full completed in {t_refit:.1f}s")
 
lb_full = predictor.leaderboard(test_df, silent=True)
print("\n=== Leaderboard after refit_full (top 15) ===")
display(lb_full.head(15))
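
Finally, to produce the deployment-ready artifact mentioned in the overview, the predictor can be cloned into a slim, portable directory and reloaded elsewhere. A minimal sketch; the output path is illustrative:

# Clone only what inference needs into a compact directory (path is illustrative)
deploy_path = predictor.clone_for_deployment(path="/content/autogluon_titanic_deploy")

# Reload the slim predictor the way a downstream service would
deployed = TabularPredictor.load(deploy_path)
print(deployed.predict(test_df.drop(columns=[target]).head()))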

Conclusion

With a single fit call, AutoGluon carried raw mixed-type tabular data through bagged and stacked ensembling to an evaluated, inference-optimized predictor, leaving us with a compact artifact that is ready for production use.
