Harnessing GPT-4o-mini to Build a Medical Knowledge Graph from Unstructured Data

Leveraging LLMs for Knowledge Graph Construction

Large Language Models (LLMs) such as GPT-4o-mini can extract structured information from unstructured documents. Compared with traditional NLP techniques, LLMs are often better at using context, especially when the input is messy, free-form natural language. This tutorial demonstrates how to create a Knowledge Graph from a raw medical patient log using Python, Mirascope, and OpenAI's GPT-4o-mini.

Setting Up the Environment

First, install the necessary dependencies including Mirascope for interfacing with OpenAI, matplotlib for visualization, and networkx for graph handling:

!pip install "mirascope[openai]" matplotlib networkx

You also need an OpenAI API key to access GPT-4o-mini. Generate your key at the OpenAI platform and configure it in your environment:

import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass('Enter OpenAI API Key: ')

Defining the Knowledge Graph Schema

A structured schema is essential to represent entities and their relationships extracted from text. Using Pydantic models, define nodes representing entities (like "Doctor" or "Medication"), edges representing relationships, and the overall KnowledgeGraph container:

from pydantic import BaseModel
 
class Edge(BaseModel):
    source: str
    target: str
    relationship: str
 
class Node(BaseModel):
    id: str
    type: str
    properties: dict | None = None
 
class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    edges: list[Edge]

Sample Patient Log

The input data is an unstructured patient log describing events and symptoms related to a patient named Mary:

patient_log = """
Mary called for help at 3:45 AM, reporting that she had fallen while going to the bathroom. This marks the second fall incident within a week. She complained of dizziness before the fall.
 
Earlier in the day, Mary was observed wandering the hallway and appeared confused when asked basic questions. She was unable to recall the names of her medications and asked the same question multiple times.
 
Mary skipped both lunch and dinner, stating she didn't feel hungry. When the nurse checked her room in the evening, Mary was lying in bed with mild bruising on her left arm and complained of hip pain.
 
Vital signs taken at 9:00 PM showed slightly elevated blood pressure and a low-grade fever (99.8°F). Nurse also noted increased forgetfulness and possible signs of dehydration.
 
This behavior is similar to previous episodes reported last month.
"""

Extracting the Knowledge Graph Using GPT-4o-mini

The core of this method is a function decorated to call the GPT-4o-mini model, which processes the patient log and returns nodes and edges structured as a KnowledgeGraph instance. The system prompt instructs the model to identify entities, symptoms, events, and their relationships:

from mirascope.core import openai, prompt_template
 
@openai.call(model="gpt-4o-mini", response_model=KnowledgeGraph)
@prompt_template(
    """
    SYSTEM:
    Extract a knowledge graph from this patient log.
    Use Nodes to represent people, symptoms, events, and observations.
    Use Edges to represent relationships like "has symptom", "reported", "noted", etc.
 
    The log:
    {log_text}
 
    Example:
    Mary said help, I've fallen.
    Node(id="Mary", type="Patient", properties={{}})
    Node(id="Fall Incident 1", type="Event", properties={{"time": "3:45 AM"}})
    Edge(source="Mary", target="Fall Incident 1", relationship="reported")
    """
)
# Mirascope fills {log_text} in the template from the function argument,
# so the function body can stay empty.
def generate_kg(log_text: str): ...
 
kg = generate_kg(patient_log)
print(kg)

Querying the Knowledge Graph

Once the graph is constructed, a second LLM call can answer natural language questions about it. The function below passes the graph and the question to the model, which reasons over the nodes and edges to produce an answer:

@openai.call(model="gpt-4o-mini")
@prompt_template(
    """
    SYSTEM:
    Use the knowledge graph to answer the user's question.
 
    Graph:
    {knowledge_graph}
 
    USER:
    {question}
    """
)
def run(question: str, knowledge_graph: KnowledgeGraph): ...
 
question = "What health risks or concerns does Mary exhibit based on her recent behavior and vitals?"
print(run(question, kg))

Visualizing the Knowledge Graph

Visualization makes the relationships in the patient data easier to follow. Using networkx to build a directed graph and matplotlib to draw it, the function below renders each node with its id and each edge with its relationship label:

import matplotlib.pyplot as plt
import networkx as nx
 
def render_graph(kg: KnowledgeGraph):
    G = nx.DiGraph()
 
    for node in kg.nodes:
        # Set "label" last so a "label" key inside properties cannot clash with it
        G.add_node(node.id, **{**(node.properties or {}), "label": node.type})
 
    for edge in kg.edges:
        G.add_edge(edge.source, edge.target, label=edge.relationship)
 
    plt.figure(figsize=(15, 10))
    pos = nx.spring_layout(G)
    nx.draw_networkx_nodes(G, pos, node_size=2000, node_color="lightgreen")
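    # Pass the same node_size so arrowheads stop at the node border instead of hiding under it
    nx.draw_networkx_edges(G, pos, node_size=2000, arrowstyle="->", arrowsize=20)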
    nx.draw_networkx_labels(G, pos, font_size=12, font_weight="bold")
    edge_labels = nx.get_edge_attributes(G, "label")
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_color="blue")
    plt.title("Healthcare Knowledge Graph", fontsize=15)
    plt.show()
 
render_graph(kg)
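
The rendering above draws every node in the same color. A small, optional refinement is to color nodes by their type so that patients, events, and symptoms are easy to tell apart. The sketch below uses only the standard library; the helper name colors_by_type and the palette are assumptions chosen for illustration.

```python
from itertools import cycle

# Hypothetical palette; any matplotlib color names would work here.
PALETTE = ["lightgreen", "lightblue", "salmon", "khaki", "plum"]


def colors_by_type(node_types: list[str]) -> list[str]:
    """Assign one color per distinct type, in order of first appearance."""
    palette = cycle(PALETTE)  # wraps around if there are more types than colors
    assigned: dict[str, str] = {}
    for node_type in node_types:
        if node_type not in assigned:
            assigned[node_type] = next(palette)
    return [assigned[node_type] for node_type in node_types]


# Illustrative node types, in the order the nodes would be drawn.
types = ["Patient", "Event", "Symptom", "Symptom", "Event"]
colors = colors_by_type(types)
print(colors)
```

Inside render_graph, the type list would need to follow the order of G.nodes, for example [G.nodes[n].get("label", "Unknown") for n in G.nodes], and the resulting list can then be passed as node_color to nx.draw_networkx_nodes.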

This approach shows how an LLM can turn unstructured medical notes into a structured graph that can be queried and visualized, which supports data analysis and decision-making in healthcare.