
Interactive E-commerce Analytics with PyGWalker: Build an End-to-End Dashboard

A step-by-step guide to generating a rich e-commerce dataset and using PyGWalker to build an interactive analytics dashboard that reveals trends, correlations, and segment insights.

Overview

This tutorial demonstrates how to build an end-to-end interactive analytics dashboard using PyGWalker with pandas. We generate a realistic e-commerce dataset with time, demographic, and marketing features, prepare multiple analytical views, and launch an interactive PyGWalker interface to explore patterns, correlations, and trends via drag-and-drop visualizations.

Environment and imports

Install the required packages and import the libraries used throughout the walkthrough.

!pip install pygwalker pandas numpy
 
 
import pandas as pd
import numpy as np
import pygwalker as pyg
from datetime import datetime, timedelta

Generating a rich e-commerce dataset

We define a function that simulates transactions across categories, products, customer segments, regions, and time. The synthetic data includes seasonal factors, discounts, marketing channels, and customer satisfaction to make analytical exploration informative.

def generate_advanced_dataset():
   np.random.seed(42)
   start_date = datetime(2022, 1, 1)
   dates = [start_date + timedelta(days=x) for x in range(730)]
   categories = ['Electronics', 'Clothing', 'Home & Garden', 'Sports', 'Books']
   products = {
       'Electronics': ['Laptop', 'Smartphone', 'Headphones', 'Tablet', 'Smartwatch'],
       'Clothing': ['T-Shirt', 'Jeans', 'Dress', 'Jacket', 'Sneakers'],
       'Home & Garden': ['Furniture', 'Lamp', 'Rug', 'Plant', 'Cookware'],
       'Sports': ['Yoga Mat', 'Dumbbell', 'Running Shoes', 'Bicycle', 'Tennis Racket'],
       'Books': ['Fiction', 'Non-Fiction', 'Biography', 'Science', 'History']
   }
   n_transactions = 5000
   data = []
   for _ in range(n_transactions):
       date = np.random.choice(dates)
       category = np.random.choice(categories)
       product = np.random.choice(products[category])
       base_prices = {
           'Electronics': (200, 1500),
           'Clothing': (20, 150),
           'Home & Garden': (30, 500),
           'Sports': (25, 300),
           'Books': (10, 50)
       }
       price = np.random.uniform(*base_prices[category])
       quantity = np.random.choice([1, 2, 3], p=[0.85, 0.13, 0.02])
       customer_segment = np.random.choice(['Premium', 'Standard', 'Budget'], p=[0.2, 0.5, 0.3])
       age_group = np.random.choice(['18-25', '26-35', '36-45', '46-55', '56+'])
       region = np.random.choice(['North', 'South', 'East', 'West', 'Central'])
       month = date.month
       seasonal_factor = 1.0
       if month in [11, 12]:
           seasonal_factor = 1.5
       elif month in [6, 7]:
           seasonal_factor = 1.2
       revenue = price * quantity * seasonal_factor
       discount = np.random.choice([0, 5, 10, 15, 20, 25], p=[0.4, 0.2, 0.15, 0.15, 0.07, 0.03])
       marketing_channel = np.random.choice(['Organic', 'Social Media', 'Email', 'Paid Ads'])
       base_satisfaction = 4.0
       if customer_segment == 'Premium':
           base_satisfaction += 0.5
       if discount > 15:
           base_satisfaction += 0.3
       satisfaction = np.clip(base_satisfaction + np.random.normal(0, 0.5), 1, 5)
       data.append({
           'Date': date, 'Category': category, 'Product': product, 'Price': round(price, 2),
           'Quantity': quantity, 'Revenue': round(revenue, 2), 'Customer_Segment': customer_segment,
           'Age_Group': age_group, 'Region': region, 'Discount_%': discount,
           'Marketing_Channel': marketing_channel, 'Customer_Satisfaction': round(satisfaction, 2),
           'Month': date.strftime('%B'), 'Year': date.year, 'Quarter': f'Q{(date.month-1)//3 + 1}'
       })
   df = pd.DataFrame(data)
   # Absolute profit per transaction, assuming a flat 30% margin on revenue after the discount
   df['Profit_Margin'] = (df['Revenue'] * (1 - df['Discount_%'] / 100) * 0.3).round(2)
   df['Days_Since_Start'] = (df['Date'] - df['Date'].min()).dt.days
   return df

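This function seeds NumPy's random generator for reproducibility and creates 5,000 transactions spread over 730 days (2022 and 2023). Revenue is price times quantity, scaled by 1.5 in November and December and by 1.2 in June and July. The Profit_Margin column is an absolute amount, computed as 30% of revenue after the discount, and satisfaction is a 1 to 5 score that rises for Premium customers and for discounts above 15%. Month, year, quarter, and the categorical columns support time-based and segment breakdowns.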
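The revenue and profit arithmetic inside the generator can be isolated as a small sketch. The helper names `seasonal_factor`, `transaction_revenue`, and `transaction_profit` are hypothetical and do not appear in the tutorial code; they only restate the formulas the function applies per transaction.

```python
def seasonal_factor(month: int) -> float:
    """Revenue multiplier by calendar month, mirroring the generator's rules."""
    if month in (11, 12):   # holiday season uplift
        return 1.5
    if month in (6, 7):     # summer uplift
        return 1.2
    return 1.0


def transaction_revenue(price: float, quantity: int, month: int) -> float:
    """Revenue = price * quantity * seasonal multiplier, rounded to cents."""
    return round(price * quantity * seasonal_factor(month), 2)


def transaction_profit(revenue: float, discount_pct: float) -> float:
    """Profit = 30% of revenue after the discount (the flat 30% is the tutorial's assumption)."""
    return round(revenue * (1 - discount_pct / 100) * 0.3, 2)


# Two units at 100.00 sold in December with a 10% discount
december_revenue = transaction_revenue(100.0, 2, 12)
december_profit = transaction_profit(december_revenue, 10)
```

Note that the discount lowers the profit figure but not the Revenue column itself, which is worth keeping in mind when comparing the two in charts.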

Generate and inspect the data

Create the dataset and print a brief overview to validate size, date range, total revenue, and columns.

print("Generating advanced e-commerce dataset...")
df = generate_advanced_dataset()
print("\nDataset Overview:")
print(f"Total Transactions: {len(df)}")
print(f"Date Range: {df['Date'].min()} to {df['Date'].max()}")
print(f"Total Revenue: ${df['Revenue'].sum():,.2f}")
print(f"\nColumns: {list(df.columns)}")
print("\nFirst few rows:")
print(df.head())

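The printed overview should report 5,000 transactions, a date range within 2022-01-01 to 2023-12-31, and the 17 columns created above, confirming that the dataset has the structure needed for visualization and segmentation.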
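Beyond reading the printed overview, a few programmatic checks can catch problems early. The following is a minimal sketch, assuming the column names used in this tutorial; `validate_transactions` is a hypothetical helper, and the small `sample` frame stands in for the generated dataset.

```python
import pandas as pd


def validate_transactions(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if df.isna().any().any():
        problems.append("missing values present")
    if not df["Customer_Satisfaction"].between(1, 5).all():
        problems.append("satisfaction outside 1-5")
    if (df["Revenue"] <= 0).any():
        problems.append("non-positive revenue")
    if not df["Discount_%"].isin([0, 5, 10, 15, 20, 25]).all():
        problems.append("unexpected discount level")
    return problems


# Tiny stand-in for the generated dataset
sample = pd.DataFrame({
    "Revenue": [120.0, 45.5],
    "Customer_Satisfaction": [4.2, 5.0],
    "Discount_%": [0, 10],
})
issues = validate_transactions(sample)
```

Calling `validate_transactions(df)` on the full dataset should likewise return an empty list if the generator behaves as described.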

Prepare aggregated views for analysis

Compute daily sales trends, category-level summaries, and segment-region metrics to feed into visual exploration.

daily_sales = df.groupby('Date').agg({
   'Revenue': 'sum', 'Quantity': 'sum', 'Customer_Satisfaction': 'mean'
}).reset_index()
 
 
category_analysis = df.groupby('Category').agg({
   'Revenue': ['sum', 'mean'], 'Quantity': 'sum', 'Customer_Satisfaction': 'mean', 'Profit_Margin': 'sum'
}).reset_index()
category_analysis.columns = ['Category', 'Total_Revenue', 'Avg_Order_Value',
                            'Total_Quantity', 'Avg_Satisfaction', 'Total_Profit']
 
 
segment_analysis = df.groupby(['Customer_Segment', 'Region']).agg({
   'Revenue': 'sum', 'Customer_Satisfaction': 'mean'
}).reset_index()
 
 
print("\n" + "="*50)
print("DATASET READY FOR PYGWALKER VISUALIZATION")
print("="*50)

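These aggregations produce three ready-to-visualize tables: a daily time series of revenue, quantity, and average satisfaction; category-level KPIs; and a segment-by-region cross-tab. The walkthrough passes the full df to PyGWalker, but any of these tables can be passed to pyg.walk in the same way.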
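As an alternative to renaming the multi-level columns by hand, pandas named aggregation sets the output column names directly in the `agg` call. This is a sketch on a tiny stand-in frame (`tiny` is illustrative data, not the tutorial's dataset), using the same output names as `category_analysis`.

```python
import pandas as pd

# Illustrative stand-in with the columns the category summary needs
tiny = pd.DataFrame({
    "Category": ["Books", "Books", "Sports"],
    "Revenue": [10.0, 30.0, 100.0],
    "Quantity": [1, 2, 1],
    "Customer_Satisfaction": [4.0, 5.0, 3.0],
    "Profit_Margin": [3.0, 9.0, 30.0],
})

# Each keyword is an output column; the tuple is (source column, aggregation)
category_kpis = tiny.groupby("Category", as_index=False).agg(
    Total_Revenue=("Revenue", "sum"),
    Avg_Order_Value=("Revenue", "mean"),
    Total_Quantity=("Quantity", "sum"),
    Avg_Satisfaction=("Customer_Satisfaction", "mean"),
    Total_Profit=("Profit_Margin", "sum"),
)
```

This form avoids relying on the positional order of the flattened columns, so adding or removing a metric cannot silently mislabel the others.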

Launch PyGWalker and explore interactively

Use PyGWalker to open an interactive canvas where you can drag fields onto axes, switch visual encodings, and filter subsets to uncover insights.

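print("\nLaunching PyGWalker Interactive Interface...")
walker = pyg.walk(
   df,
   spec="./pygwalker_config.json",  # local file used to persist the chart configuration
   use_kernel_calc=True,            # compute in the Python kernel; newer releases name this kernel_computation
   theme_key='g2'                   # chart theme
)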
 
 
print("\nPyGWalker is now running!")
print("Try creating these visualizations:")
print("   - Revenue trend over time (line chart)")
print("   - Category distribution (pie chart)")
print("   - Price vs Satisfaction scatter plot")
print("   - Regional sales heatmap")
print("   - Discount effectiveness analysis")

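Once the PyGWalker UI is running, combine dimensions and measures to spot the November to December and June to July revenue uplift, compare categories, and examine regional differences in revenue and satisfaction. A price versus satisfaction scatter plot should show little relationship, because in this synthetic data satisfaction depends on customer segment and discount rather than price.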
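Patterns seen in the UI can be cross-checked numerically with a correlation matrix. The sketch below builds a small stand-in frame (`demo`, generated here with an assumed seed and only the discount effect on satisfaction) and computes Pearson correlations with pandas; on the tutorial's dataset the same `corr` call would apply to the numeric columns of df.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for a repeatable illustration
n = 2000

# Stand-in data: price is independent of satisfaction, large discounts raise it
demo = pd.DataFrame({
    "Price": rng.uniform(10, 1500, size=n),
    "Discount_%": rng.choice([0, 5, 10, 15, 20, 25], size=n),
})
demo["Customer_Satisfaction"] = np.clip(
    4.0 + 0.3 * (demo["Discount_%"] > 15) + rng.normal(0, 0.5, size=n), 1, 5
)

# Pairwise Pearson correlations between the numeric columns
corr = demo.corr(method="pearson")
price_vs_satisfaction = corr.loc["Price", "Customer_Satisfaction"]
discount_vs_satisfaction = corr.loc["Discount_%", "Customer_Satisfaction"]
```

A value near zero for price versus satisfaction and a modest positive value for discount versus satisfaction would match what the scatter plots suggest.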

Exploration tips and next steps

  • Use time filters and rolling aggregations to reveal short-term promotions or seasonal spikes.
  • Segment by customer type to compare revenue per segment and satisfaction patterns.
  • Cross-reference marketing channel with conversion proxies like revenue and quantity to evaluate channel effectiveness.
  • Export charts or use snapshots to include visual findings in reports or presentations.
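The rolling aggregation mentioned in the first tip can also be precomputed in pandas before the data reaches PyGWalker. This is a minimal sketch on illustrative data (`daily` is a stand-in for a daily revenue table, and the 3-day window is an arbitrary choice); it first fills calendar gaps so that days without sales count as zero revenue.

```python
import pandas as pd

# Stand-in daily revenue table with one missing calendar day (2022-01-03)
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04", "2022-01-05"]),
    "Revenue": [100.0, 200.0, 300.0, 400.0],
})

# Reindex to a complete daily calendar; days without sales get zero revenue
full = daily.set_index("Date").asfreq("D", fill_value=0.0)

# Trailing 3-day average; min_periods=1 keeps the first rows instead of NaN
full["Revenue_3d_avg"] = full["Revenue"].rolling(window=3, min_periods=1).mean()

# Back to a flat table that can be handed to a charting tool
smoothed = full.reset_index()
```

A longer window, such as 7 or 30 days, smooths more aggressively and makes the seasonal uplift easier to see at the cost of hiding short promotions.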

This workflow highlights how PyGWalker simplifies interactive, visual data discovery with familiar pandas DataFrames and minimal setup.
