
Mastering Feature Interactions in ML Models with SHAP-IQ and Shapley Interaction Indices

Explore how SHAP-IQ leverages Shapley Interaction Indices to reveal complex feature interactions in machine learning models, enhancing interpretability beyond traditional Shapley values.

Understanding Feature Interactions with SHAP-IQ

The SHAP-IQ package enhances traditional Shapley values by enabling the analysis of feature interactions within machine learning models using Shapley Interaction Indices (SII). While classic Shapley values explain individual feature contributions, they overlook how features interact, such as how longitude and latitude together influence house prices. SHAP-IQ separates individual effects from interactions to offer deeper insights.
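As a concrete reference point, the pairwise (second-order) Shapley Interaction Index, in the standard formulation of Grabisch and Roubens, averages the joint effect of two features that remains after subtracting their individual contributions, over all coalitions of the remaining features:

\[
\phi^{\mathrm{SII}}_{\{i,j\}} = \sum_{S \subseteq N \setminus \{i,j\}} \frac{|S|!\,(n-|S|-2)!}{(n-1)!}\,\Delta_{\{i,j\}}(S),
\qquad
\Delta_{\{i,j\}}(S) = v(S \cup \{i,j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S),
\]

where N is the set of n features and v(S) is the model's payoff when only the features in coalition S are known. A positive value means the two features reinforce each other; a negative value means they partially substitute for one another.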

Installing Dependencies

To follow along, install necessary packages:

!pip install shapiq overrides scikit-learn pandas numpy

Loading and Preparing Data

This tutorial uses the Bike Sharing dataset from OpenML. After loading, the data is split into training and testing sets.

import shapiq
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np
 
# Load the Bike Sharing data as pandas first to keep the feature names, then convert to NumPy
X_df, y_df = shapiq.load_bike_sharing()
feature_names = list(X_df.columns)
X, y = X_df.to_numpy(), y_df.to_numpy()

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
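A quick sanity check on the loaded data can catch problems early. The snippet below only prints the column names (taken from the DataFrame loaded above) and the split shapes; the exact numbers depend on the OpenML version of the dataset.

# Inspect the split (exact shapes depend on the dataset version pulled from OpenML)
print(f"Features ({len(feature_names)}): {feature_names}")
print(f"Train shape: {X_train.shape}, Test shape: {X_test.shape}")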

Model Training and Evaluation

A RandomForestRegressor is trained and evaluated using standard metrics.

model = RandomForestRegressor()
model.fit(X_train, y_train)
 
y_pred = model.predict(X_test)
 
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
 
print(f"R² Score: {r2:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")

Setting Up the SHAP-IQ Explainer

The TabularExplainer is configured with index="k-SII" (the k-Shapley Interaction Index) and max_order=4, so it computes interaction values for groups of up to four features, revealing how features act together rather than only individually.

explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=4
)
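If four-way interactions are more detail than needed, the same constructor can be used with a lower order. The variant below, an illustrative sketch rather than part of the original walkthrough, restricts the analysis to pairwise interactions, which are cheaper to approximate and often easier to read.

# Pairwise-only variant: same explainer, lower interaction order
pairwise_explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=2
)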

Explaining a Single Instance

We select a test instance (index 100) to generate local explanations, showing true and predicted values, along with feature inputs.

instance_id = 100
x_explain = X_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")
for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")

Computing Interaction Values

Using the explainer, Shapley interaction values are computed for the selected instance, capturing individual and combined feature effects.

# Approximate interaction values for the selected test instance;
# budget caps the number of coalition evaluations used by the approximation
interaction_values = explainer.explain(x_explain, budget=256)
print(interaction_values)
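To surface the strongest effects, one option is to rank the returned terms by absolute value. The sketch below assumes the InteractionValues object exposes an interaction_lookup dict (mapping feature-index tuples to positions in its values array), as in recent shapiq releases; if your version differs, adapt the attribute names accordingly.

# Rank interaction terms by absolute contribution
# (interaction_lookup and values are assumed attribute names; adjust for your shapiq version)
named_scores = {
    tuple(feature_names[i] for i in coalition): interaction_values.values[pos]
    for coalition, pos in interaction_values.interaction_lookup.items()
    if len(coalition) > 0
}
top_terms = sorted(named_scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:10]
for coalition, score in top_terms:
    print(f"{' x '.join(coalition)}: {score:+.4f}")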

First-Order Shapley Values

To analyze individual feature contributions only, a TreeExplainer is used with max_order=1 and index="SV". This yields standard Shapley values without interaction terms, and it works directly with tree-based models such as the random forest trained above.

explainer = shapiq.TreeExplainer(model=model, max_order=1, index="SV")
si_order = explainer.explain(x=x_explain)
si_order
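A useful sanity check on first-order values is the efficiency property: the baseline value plus the sum of the per-feature Shapley values should approximately reconstruct the model's prediction for this instance. The sketch below assumes the returned object exposes baseline_value, values, and interaction_lookup attributes, as in recent shapiq releases.

# Efficiency check: baseline plus the sum of first-order values should be close to the prediction
# (baseline_value, values, and interaction_lookup are assumed attribute names)
first_order_sum = sum(
    si_order.values[pos]
    for coalition, pos in si_order.interaction_lookup.items()
    if len(coalition) == 1
)
reconstruction = si_order.baseline_value + first_order_sum
print(f"Prediction: {y_pred:.4f}, Baseline + sum of Shapley values: {reconstruction:.4f}")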

Visualizing with Waterfall Chart

The Waterfall chart breaks down the prediction into feature contributions, starting from a baseline. Positive contributions (e.g., Weather, Humidity) increase the prediction, while others (e.g., Temperature, Year) decrease it.

si_order.plot_waterfall(feature_names=feature_names, show=True)

The chart clarifies which features drive the model’s prediction and in what direction, offering valuable insight into model behavior.

For full code and additional tutorials, visit the project’s GitHub page and community channels.
