<RETURN_TO_BASE

Multi-Agent Multi-Omics Pipeline: From Synthetic Data to Pathway-Aware Insights

'A step-by-step tutorial describing a multi-agent pipeline that generates coherent synthetic multi-omics data, performs statistical and network analyses, and outputs pathway-aware interpretations and drug repurposing predictions.'

Building an integrated multi-agent pipeline for multi-omics interpretation

This tutorial walks through a modular system that integrates transcriptomics, proteomics, and metabolomics using a chain of specialist agents. Each agent focuses on a specific task: generating coherent synthetic data, running statistical tests, inferring networks and master regulators, performing pathway-aware enrichment, proposing drug repurposing candidates, and producing an interpretable report with AI-generated hypotheses.

Biological reference data and setup

The pipeline begins by defining pathway maps, gene interactions, and drug target mappings. These reference resources are used across agents for network traversal, pathway scoring, and drug scoring. The code below sets up these definitions and imports the required libraries.

import numpy as np
import pandas as pd
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List, Tuple
import warnings
warnings.filterwarnings('ignore')
 
 
PATHWAY_DB = {
   'Glycolysis': {'genes': ['HK2', 'PFKM', 'PKM', 'LDHA', 'GAPDH', 'ENO1'],
                  'metabolites': ['Glucose', 'G6P', 'F16BP', 'Pyruvate', 'Lactate'], 'score': 0},
   'TCA_Cycle': {'genes': ['CS', 'IDH1', 'IDH2', 'OGDH', 'SDHA', 'MDH2'],
                 'metabolites': ['Citrate', 'Isocitrate', 'α-KG', 'Succinate', 'Malate'], 'score': 0},
   'Oxidative_Phosphorylation': {'genes': ['NDUFA1', 'NDUFB5', 'COX5A', 'COX7A1', 'ATP5A1', 'ATP5B'],
                                  'metabolites': ['ATP', 'ADP', 'NAD+', 'NADH'], 'score': 0},
   'Fatty_Acid_Synthesis': {'genes': ['ACACA', 'FASN', 'SCD1', 'ACLY'],
                            'metabolites': ['Malonyl-CoA', 'Palmitate', 'Oleate'], 'score': 0},
   'Fatty_Acid_Oxidation': {'genes': ['CPT1A', 'ACOX1', 'HADHA', 'ACADM'],
                            'metabolites': ['Acyl-CoA', 'Acetyl-CoA'], 'score': 0},
   'Amino_Acid_Metabolism': {'genes': ['GOT1', 'GOT2', 'GLUD1', 'BCAT1', 'BCAT2'],
                             'metabolites': ['Glutamate', 'Glutamine', 'Alanine', 'Aspartate'], 'score': 0},
   'Pentose_Phosphate': {'genes': ['G6PD', 'PGD', 'TKTL1'],
                         'metabolites': ['R5P', 'NADPH'], 'score': 0},
   'Cell_Cycle_G1S': {'genes': ['CCND1', 'CDK4', 'CDK6', 'RB1', 'E2F1'], 'metabolites': [], 'score': 0},
   'Cell_Cycle_G2M': {'genes': ['CCNB1', 'CDK1', 'CDC25C', 'WEE1'], 'metabolites': [], 'score': 0},
   'Apoptosis': {'genes': ['BCL2', 'BAX', 'BID', 'CASP3', 'CASP8', 'CASP9'], 'metabolites': [], 'score': 0},
   'mTOR_Signaling': {'genes': ['MTOR', 'RPTOR', 'RICTOR', 'AKT1', 'TSC1', 'TSC2'],
                      'metabolites': ['Leucine', 'ATP'], 'score': 0},
   'HIF1_Signaling': {'genes': ['HIF1A', 'EPAS1', 'VEGFA', 'SLC2A1'], 'metabolites': ['Lactate'], 'score': 0},
   'p53_Signaling': {'genes': ['TP53', 'MDM2', 'CDKN1A', 'BAX'], 'metabolites': [], 'score': 0},
   'PI3K_AKT': {'genes': ['PIK3CA', 'AKT1', 'AKT2', 'PTEN', 'PDK1'], 'metabolites': [], 'score': 0},
}
 
 
GENE_INTERACTIONS = {
   'HK2': ['PFKM', 'HIF1A', 'MTOR'], 'PFKM': ['PKM', 'HK2'], 'PKM': ['LDHA', 'HIF1A'],
   'MTOR': ['AKT1', 'HIF1A', 'TSC2'], 'HIF1A': ['VEGFA', 'SLC2A1', 'PKM', 'LDHA'],
   'TP53': ['MDM2', 'CDKN1A', 'BAX', 'CASP3'], 'AKT1': ['MTOR', 'TSC2', 'MDM2'],
   'CCND1': ['CDK4', 'RB1'], 'CDK4': ['RB1'], 'RB1': ['E2F1'],
}
 
 
DRUG_TARGETS = {
   'Metformin': ['NDUFA1'], 'Rapamycin': ['MTOR'], '2-DG': ['HK2'],
   'Bevacizumab': ['VEGFA'], 'Palbociclib': ['CDK4', 'CDK6'], 'Nutlin-3': ['MDM2']
}
 
 
@dataclass
class OmicsProfile:
   transcriptomics: pd.DataFrame
   proteomics: pd.DataFrame
   metabolomics: pd.DataFrame
   metadata: Dict

Generating coherent synthetic multi-omics data

To validate the downstream logic and to demonstrate the pipeline end-to-end, the AdvancedOmicsGenerator creates synthetic transcriptomic, proteomic, and metabolomic matrices that follow biologically plausible trends across timepoints. The generator injects pathway-specific shifts and constructs protein abundance using transcript values plus noise.

class AdvancedOmicsGenerator:
   @staticmethod
   def generate_coherent_omics(n_samples=30, n_timepoints=4, noise=0.2):
       genes = list(set(g for p in PATHWAY_DB.values() for g in p['genes']))
       metabolites = list(set(m for p in PATHWAY_DB.values() for m in p['metabolites'] if m))
       proteins = [f"P_{g}" for g in genes]
       n_control = n_samples // 2
       samples_per_tp = n_samples // n_timepoints
       trans = np.random.randn(len(genes), n_samples) * noise + 10
       metab = np.random.randn(len(metabolites), n_samples) * noise + 5
       for tp in range(n_timepoints):
           start_idx = n_control + tp * samples_per_tp
           end_idx = start_idx + samples_per_tp
           progression = (tp + 1) / n_timepoints
           for i, gene in enumerate(genes):
               if gene in PATHWAY_DB['Glycolysis']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(1.5, 3.5) * progression
               elif gene in PATHWAY_DB['Oxidative_Phosphorylation']['genes']:
                   trans[i, start_idx:end_idx] -= np.random.uniform(1, 2.5) * progression
               elif gene in PATHWAY_DB['Cell_Cycle_G1S']['genes'] + PATHWAY_DB['Cell_Cycle_G2M']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(1, 2) * progression
               elif gene in PATHWAY_DB['HIF1_Signaling']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(2, 4) * progression
               elif gene in PATHWAY_DB['p53_Signaling']['genes']:
                   trans[i, start_idx:end_idx] -= np.random.uniform(0.5, 1.5) * progression
           for i, met in enumerate(metabolites):
               if met in ['Lactate', 'Pyruvate', 'G6P']:
                   metab[i, start_idx:end_idx] += np.random.uniform(1.5, 3) * progression
               elif met in ['ATP', 'Citrate', 'Malate']:
                   metab[i, start_idx:end_idx] -= np.random.uniform(1, 2) * progression
               elif met in ['NADPH']:
                   metab[i, start_idx:end_idx] += np.random.uniform(1, 2) * progression
       prot = trans * 0.8 + np.random.randn(*trans.shape) * (noise * 2)
       conditions = ['Control'] * n_control + [f'Disease_T{i//samples_per_tp}' for i in range(n_samples - n_control)]
       trans_df = pd.DataFrame(trans, index=genes, columns=[f"S{i}_{c}" for i, c in enumerate(conditions)])
       prot_df = pd.DataFrame(prot, index=proteins, columns=trans_df.columns)
       metab_df = pd.DataFrame(metab, index=metabolites, columns=trans_df.columns)
       metadata = {'conditions': conditions, 'n_timepoints': n_timepoints}
       return OmicsProfile(trans_df, prot_df, metab_df, metadata)

Statistical agent: differential and temporal analysis

The StatisticalAgent runs simple but robust comparisons across control and disease samples. It computes fold changes, t-statistics, approximate p-values and FDR, and marks significantly dysregulated features. It also fits low-degree polynomials to capture temporal trends across timepoints.

class StatisticalAgent:
   @staticmethod
   def differential_analysis(data_df, control_samples, disease_samples):
       control = data_df[control_samples]
       disease = data_df[disease_samples]
       fc = (disease.mean(axis=1) - control.mean(axis=1))
       pooled_std = np.sqrt((control.var(axis=1) + disease.var(axis=1)) / 2)
       t_stat = fc / (pooled_std + 1e-6)
       p_values = 2 * (1 - np.minimum(np.abs(t_stat) / (np.abs(t_stat).max() + 1e-6), 0.999))
       sorted_pvals = np.sort(p_values)
       ranks = np.searchsorted(sorted_pvals, p_values) + 1
       fdr = p_values * len(p_values) / ranks
       return pd.DataFrame({'log2FC': fc, 't_stat': t_stat, 'p_value': p_values,
           'FDR': np.minimum(fdr, 1.0), 'significant': (np.abs(fc) > 1.0) & (fdr < 0.05)}).sort_values('log2FC', ascending=False)
 
 
   @staticmethod
   def temporal_analysis(data_df, metadata):
       timepoints = metadata['n_timepoints']
       samples_per_tp = data_df.shape[1] // (timepoints + 1)
       trends = {}
       for gene in data_df.index:
           means = []
           for tp in range(timepoints):
               start = samples_per_tp + tp * samples_per_tp
               end = start + samples_per_tp
               means.append(data_df.iloc[:, start:end].loc[gene].mean())
           if len(means) > 1:
               x = np.arange(len(means))
               coeffs = np.polyfit(x, means, deg=min(2, len(means)-1))
               trends[gene] = {'slope': coeffs[0] if len(coeffs) > 1 else 0, 'trajectory': means}
       return trends

Network analysis and causal inference

NetworkAnalysisAgent discovers master regulators by traversing the gene interaction graph and scoring nodes by their ability to influence significantly dysregulated genes. It also attempts simple causal inference by checking coherence between transcript and protein fold changes and linking enzymes to metabolite changes within pathways.

class NetworkAnalysisAgent:
   def __init__(self, interactions):
       self.graph = interactions
   def find_master_regulators(self, diff_genes):
       sig_genes = diff_genes[diff_genes['significant']].index.tolist()
       impact_scores = {}
       for gene in sig_genes:
           if gene in self.graph:
               downstream = self._bfs_downstream(gene, max_depth=2)
               sig_downstream = [g for g in downstream if g in sig_genes]
               impact_scores[gene] = {
                   'downstream_count': len(downstream),
                   'sig_downstream': len(sig_downstream),
                   'score': len(sig_downstream) / (len(downstream) + 1),
                   'fc': diff_genes.loc[gene, 'log2FC']
               }
       return sorted(impact_scores.items(), key=lambda x: x[1]['score'], reverse=True)
   def _bfs_downstream(self, start, max_depth=2):
       visited, queue = set(), deque([(start, 0)])
       downstream = []
       while queue:
           node, depth = queue.popleft()
           if depth >= max_depth or node in visited:
               continue
           visited.add(node)
           if node in self.graph:
               for neighbor in self.graph[node]:
                   if neighbor not in visited:
                       downstream.append(neighbor)
                       queue.append((neighbor, depth + 1))
       return downstream
   def causal_inference(self, diff_trans, diff_prot, diff_metab):
       causal_links = []
       for gene in diff_trans[diff_trans['significant']].index:
           gene_fc = diff_trans.loc[gene, 'log2FC']
           protein = f"P_{gene}"
           if protein in diff_prot.index:
               prot_fc = diff_prot.loc[protein, 'log2FC']
               correlation = np.sign(gene_fc) == np.sign(prot_fc)
               if correlation and abs(prot_fc) > 0.5:
                   causal_links.append(('transcription', gene, protein, gene_fc, prot_fc))
           for pathway, content in PATHWAY_DB.items():
               if gene in content['genes']:
                   for metab in content['metabolites']:
                       if metab in diff_metab.index and diff_metab.loc[metab, 'significant']:
                           metab_fc = diff_metab.loc[metab, 'log2FC']
                           causal_links.append(('enzymatic', gene, metab, gene_fc, metab_fc))
       return causal_links

Topology-weighted pathway enrichment

The PathwayEnrichmentAgent evaluates pathways by combining the magnitude of dysregulation, the network centrality of dysregulated genes, and metabolite evidence. This yields pathway scores that favor coherent, central perturbations that connect across omics layers.

class PathwayEnrichmentAgent:
   def __init__(self, pathway_db, interactions):
       self.pathway_db = pathway_db
       self.interactions = interactions
   def topology_weighted_enrichment(self, diff_genes, diff_metab, network_agent):
       enriched = {}
       for pathway, content in self.pathway_db.items():
           sig_genes = [g for g in content['genes'] if g in diff_genes.index and diff_genes.loc[g, 'significant']]
           weighted_score = 0
           for gene in sig_genes:
               base_score = abs(diff_genes.loc[gene, 'log2FC'])
               downstream = network_agent._bfs_downstream(gene, max_depth=1)
               centrality = len(downstream) / 10
               weighted_score += base_score * (1 + centrality)
           sig_metabs = [m for m in content['metabolites'] if m in diff_metab.index and diff_metab.loc[m, 'significant']]
           metab_score = sum(abs(diff_metab.loc[m, 'log2FC']) for m in sig_metabs)
           total_score = (weighted_score + metab_score * 2) / max(len(content['genes']) + len(content['metabolites']), 1)
           if total_score > 0.5:
               enriched[pathway] = {'score': total_score, 'genes': sig_genes, 'metabolites': sig_metabs,
                   'gene_fc': {g: diff_genes.loc[g, 'log2FC'] for g in sig_genes},
                   'metab_fc': {m: diff_metab.loc[m, 'log2FC'] for m in sig_metabs},
                   'coherence': self._pathway_coherence(sig_genes, diff_genes)}
       return enriched
   def _pathway_coherence(self, genes, diff_genes):
       if len(genes) < 2:
           return 0
       fcs = [diff_genes.loc[g, 'log2FC'] for g in genes]
       same_direction = sum(1 for fc in fcs if np.sign(fc) == np.sign(fcs[0]))
       return same_direction / len(genes)

Drug repurposing and hypothesis generation

The DrugRepurposingAgent scores drugs based on whether their targets are significantly dysregulated and whether those targets overlap with identified master regulators. The AIHypothesisEngine synthesizes temporal trends, master regulators, enriched pathways, causal links and drug scores into a readable report and a short list of biological hypotheses.

class DrugRepurposingAgent:
   def __init__(self, drug_db):
       self.drug_db = drug_db
 
 
   def predict_drug_response(self, diff_genes, master_regulators):
       predictions = []
       for drug, targets in self.drug_db.items():
           score = 0
           affected_targets = []
           for target in targets:
               if target in diff_genes.index:
                   fc = diff_genes.loc[target, 'log2FC']
                   is_sig = diff_genes.loc[target, 'significant']
                   if is_sig:
                       drug_benefit = -fc if fc > 0 else 0
                       score += drug_benefit
                       affected_targets.append((target, fc))
                   if target in [mr[0] for mr in master_regulators[:5]]:
                       score += 2
           if score > 0:
               predictions.append({
                   'drug': drug,
                   'score': score,
                   'targets': affected_targets,
                   'mechanism': 'Inhibition of upregulated pathway'
               })
       return sorted(predictions, key=lambda x: x['score'], reverse=True)
 
 
class AIHypothesisEngine:
   def generate_comprehensive_report(self, omics_data, analysis_results):
       report = ["="*80, "ADVANCED MULTI-OMICS INTERPRETATION REPORT", "="*80, ""]
       trends = analysis_results['temporal']
       top_trends = sorted(trends.items(), key=lambda x: abs(x[1]['slope']), reverse=True)[:5]
       report.append("  TEMPORAL DYNAMICS ANALYSIS:")
       for gene, data in top_trends:
           direction = "↑ Increasing" if data['slope'] > 0 else "↓ Decreasing"
           report.append(f"  {gene}: {direction} (slope: {data['slope']:.3f})")
       report.append("\n  MASTER REGULATORS (Top 5):")
       for gene, data in analysis_results['master_regs'][:5]:
           report.append(f"  • {gene}: Controls {data['sig_downstream']} dysregulated genes (FC: {data['fc']:+.2f}, Impact: {data['score']:.3f})")
       report.append("\n ENRICHED PATHWAYS:")
       for pathway, data in sorted(analysis_results['pathways'].items(), key=lambda x: x[1]['score'], reverse=True):
           report.append(f"\n{pathway} (Score: {data['score']:.3f}, Coherence: {data['coherence']:.2f})")
           report.append(f"    Genes: {', '.join(data['genes'][:6])}")
           if data['metabolites']:
               report.append(f"    Metabolites: {', '.join(data['metabolites'][:4])}")
       report.append("\n CAUSAL RELATIONSHIPS (Top 10):")
       for link_type, source, target, fc1, fc2 in analysis_results['causal'][:10]:
           report.append(f"  {source} →[{link_type}]→ {target} (FC: {fc1:+.2f}{fc2:+.2f})")
       report.append("\n DRUG REPURPOSING PREDICTIONS:")
       for pred in analysis_results['drugs'][:5]:
           report.append(f"  • {pred['drug']} (Score: {pred['score']:.2f})")
           report.append(f"    Targets: {', '.join([f'{t[0]}({t[1]:+.1f})' for t in pred['targets']])}")
       report.append("\n AI-GENERATED BIOLOGICAL HYPOTHESES:\n")
       for i, hyp in enumerate(self._generate_advanced_hypotheses(analysis_results), 1):
           report.append(f"{i}. {hyp}\n")
       report.append("="*80)
       return "\n".join(report)
 
 
   def _generate_advanced_hypotheses(self, results):
       hypotheses = []
       pathways = results['pathways']
       if 'Glycolysis' in pathways and 'Oxidative_PhosphORYlation' in pathways:
           glyc = pathways['Glycolysis']['score']
           oxphos = pathways['Oxidative_PhosPHORYlation']['score']
           if glyc > oxphos * 1.5:
               hypotheses.append(
                   "WARBURG EFFECT DETECTED: Aerobic glycolysis upregulation with oxidative phosphorylation suppression suggests metabolic reprogramming driven by HIF1A."
               )
       if 'Cell_Cycle_G1S' in pathways and 'mTOR_Signaling' in pathways:
           hypotheses.append(
               "PROLIFERATIVE SIGNATURE: Cell-cycle activation with mTOR signaling indicates anabolic reprogramming; dual CDK4/6 and mTOR inhibition may be effective."
           )
       if results['master_regs']:
           top_mr = results['master_regs'][0]
           hypotheses.append(
               f"UPSTREAM REGULATOR: {top_mr[0]} controls {top_mr[1]['sig_downstream']} dysregulated genes; targeting this node can propagate network-wide correction."
           )
       trends = results['temporal']
       progressive = [g for g, d in trends.items() if abs(d['slope']) > 0.5]
       if len(progressive) > 5:
           hypotheses.append(
               f"PROGRESSIVE DYSREGULATION: {len(progressive)} genes show strong temporal shifts, indicating evolving pathology and benefit from early pathway intervention."
           )
       if 'HIF1_Signaling' in pathways:
           hypotheses.append(
               "HYPOXIA RESPONSE: HIF1 signaling suggests oxygen-poor microenvironment; anti-angiogenic strategies may normalize perfusion."
           )
       if 'p53_Signaling' in pathways:
           hypotheses.append(
               "TUMOR SUPPRESSOR LOSS: p53 pathway suppression suggests benefit from MDM2 inhibition if TP53 is wild-type."
           )
       return hypotheses if hypotheses else ["Complex multi-factorial dysregulation detected."]

End-to-end orchestration

Finally, the orchestrator runs the generator and all agents in sequence, collects results and uses the hypothesis engine to assemble a human-readable report. The orchestration demonstrates how modular components can be combined for practical analysis workflows.

def run_advanced_omics_interpretation():
   print(" Initializing Advanced Multi-Agent Omics System...\n")
   omics = AdvancedOmicsGenerator.generate_coherent_omics()
   print(" Generated multi-omics dataset")
   stat_agent = StatisticalAgent()
   control_samples = [c for c in omics.transcriptomics.columns if 'Control' in c]
   disease_samples = [c for c in omics.transcriptomics.columns if 'Disease' in c]
   diff_trans = stat_agent.differential_analysis(omics.transcriptomics, control_samples, disease_samples)
   diff_prot = stat_agent.differential_analysis(omics.proteomics, control_samples, disease_samples)
   diff_metab = stat_agent.differential_analysis(omics.metabolomics, control_samples, disease_samples)
   temporal = stat_agent.temporal_analysis(omics.transcriptomics, omics.metadata)
   network_agent = NetworkAnalysisAgent(GENE_INTERACTIONS)
   master_regs = network_agent.find_master_regulators(diff_trans)
   causal_links = network_agent.causal_inference(diff_trans, diff_prot, diff_metab)
   pathway_agent = PathwayEnrichmentAgent(PATHWAY_DB, GENE_INTERACTIONS)
   enriched = pathway_agent.topology_weighted_enrichment(diff_trans, diff_metab, network_agent)
   drug_agent = DrugRepurposingAgent(DRUG_TARGETS)
   drug_predictions = drug_agent.predict_drug_response(diff_trans, master_regs)
   results = {
       'temporal': temporal,
       'master_regs': master_regs,
       'causal': causal_links,
       'pathways': enriched,
       'drugs': drug_predictions
   }
   hypothesis_engine = AIHypothesisEngine()
   report = hypothesis_engine.generate_comprehensive_report(omics, results)
   print(report)
   return omics, results
 
 
if __name__ == "__main__":
   omics_data, analysis = run_advanced_omics_interpretation()

What this pipeline delivers

  • Reproducible synthetic multi-omics datasets that reflect pathway-specific dynamics across timepoints
  • Differential and temporal statistics at transcript, protein and metabolite level
  • Network-based master regulator prioritization and simple causal links across omics layers
  • Topology-aware pathway enrichment that weights gene dysregulation by network centrality and metabolite evidence
  • A drug repurposing scoring heuristic that prioritizes agents targeting upregulated and central nodes
  • An automatic hypothesis report that combines all evidence into actionable biological narratives

This modular design is adaptable for real datasets: swap the synthetic generator for experimental matrices, tune thresholds and pathway resources, and reuse the same agent chain to produce integrated, interpretable analysis outputs.

🇷🇺

Сменить язык

Читать эту статью на русском

Переключить на Русский