Implementing Privacy-Preserving Federated Fraud Detection
Simulate fraud detection using Federated Learning without heavy frameworks.
Overview
In this tutorial, we demonstrate how we simulate a privacy-preserving fraud detection system using Federated Learning without relying on heavyweight frameworks or complex infrastructure. We build a clean, CPU-friendly setup that mimics ten independent banks, each training a local fraud-detection model on its own highly imbalanced transaction data.
Coordination of Local Updates
We coordinate these local updates through a simple FedAvg aggregation loop, allowing us to improve a global model while ensuring that no raw transaction data ever leaves a client. Alongside this, we integrate OpenAI to support post-training analysis and risk-oriented reporting, demonstrating how federated learning outputs can be translated into decision-ready insights.
Setting Up the Environment
We set up the execution environment and import all required libraries for data generation, modeling, evaluation, and reporting. We also fix random seeds to ensure our federated simulation remains deterministic.
!pip -q install torch scikit-learn numpy openai
import time, random, json, os, getpass
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score
from openai import OpenAI
SEED = 7
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)
DEVICE = torch.device("cpu")
print("Device:", DEVICE) Data Generation and Splitting
We generate a highly imbalanced, credit-card-like fraud dataset and split it into training and test sets. We standardize the server-side data and prepare a global test loader that enables consistent evaluation of the aggregated model after each federated round.
X, y = make_classification(
n_samples=60000,
n_features=30,
n_informative=18,
n_redundant=8,
weights=[0.985, 0.015],
class_sep=1.5,
flip_y=0.01,
random_state=SEED
)
X = X.astype(np.float32)
y = y.astype(np.int64)
X_train_full, X_test, y_train_full, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=SEED
)
server_scaler = StandardScaler()
X_train_full_s = server_scaler.fit_transform(X_train_full).astype(np.float32)
X_test_s = server_scaler.transform(X_test).astype(np.float32)
test_loader = DataLoader(
TensorDataset(torch.from_numpy(X_test_s), torch.from_numpy(y_test)),
batch_size=1024,
shuffle=False
)Simulating Non-IID Behavior
We simulate realistic non-IID behavior by partitioning the training data across ten clients using a Dirichlet distribution. Each simulated bank operates on its own locally scaled data.
def dirichlet_partition(y, n_clients=10, alpha=0.35):
classes = np.unique(y)
idx_by_class = [np.where(y == c)[0] for c in classes]
client_idxs = [[] for _ in range(n_clients)]
for idxs in idx_by_class:
np.random.shuffle(idxs)
props = np.random.dirichlet(alpha * np.ones(n_clients))
cuts = (np.cumsum(props) * len(idxs)).astype(int)
prev = 0
for cid, cut in enumerate(cuts):
client_idxs[cid].extend(idxs[prev:cut].tolist())
prev = cut
return [np.array(ci, dtype=np.int64) for ci in client_idxs]Model Definition and Training
We define the neural network used for fraud detection along with utility functions for training and evaluation. The local optimization ensures efficient updates to client models.
class FraudNet(nn.Module):
def __init__(self, in_dim):
super().__init__()
self.net = nn.Sequential(
nn.Linear(in_dim, 64),
nn.ReLU(),
nn.Dropout(0.1),
nn.Linear(64, 32),
nn.ReLU(),
nn.Dropout(0.1),
nn.Linear(32, 1)
)
def forward(self, x):
return self.net(x).squeeze(-1)Federated Learning Process
We orchestrate the federated learning process by iteratively training local client models and aggregating their parameters using FedAvg, evaluating the global model after each round to monitor convergence.
for r in range(1, ROUNDS + 1):
client_weights, client_sizes = [], []
for cid in range(NUM_CLIENTS):
local = FraudNet(X_train_full.shape[1])
set_weights(local, global_weights)
train_local(local, client_loaders[cid][0], LR)
client_weights.append(get_weights(local))
client_sizes.append(len(client_loaders[cid][0].dataset))
global_weights = fedavg(client_weights, client_sizes)
set_weights(global_model, global_weights)
metrics = evaluate(global_model, test_loader)
print(f"Round {r}: {metrics}")Generating Risk Reports with OpenAI
We transform the results into concise analytical reports using OpenAI. The results summarize performance, risks, and recommended next steps to aid decision-making.
if OPENAI_API_KEY:
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
client = OpenAI()
resp = client.responses.create(model="gpt-5.2", input=prompt)
print(resp.output_text)Through this framework, we have shown a robust method for implementing federated fraud detection that is both private and applicable in real-world scenarios.
Сменить язык
Читать эту статью на русском