Building an AI Agent for Live Python Execution with Automated Result Validation
This tutorial details building an AI agent that writes, executes, and validates Python code live using LangChain and the Anthropic Claude API, enabling robust computational workflows with automated correctness checks.
Harnessing AI for Dynamic Python Code Execution and Validation
This tutorial demonstrates how to create an advanced AI agent that writes, executes, and validates Python code live. By integrating LangChain's ReAct agent framework with Anthropic's Claude API, the system generates Python code, executes it in real time, captures outputs, maintains execution state, and automatically validates results against expected criteria or test cases. This "write → run → validate" cycle enables reliable computational workflows for analysis, algorithms, and machine learning pipelines.
Installing Required Libraries
```
!pip install langchain langchain-anthropic langchain-core anthropic
```

The setup includes the LangChain framework, Anthropic API integration, and core utilities for agent orchestration and Claude API communication.
Core Components: Python REPL Tool
The PythonREPLTool class provides a stateful Python REPL environment. It executes arbitrary code snippets, captures stdout/stderr outputs, stores execution history, and returns detailed feedback including code, output, errors, and return values.
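A minimal sketch of such a tool is shown below. The class name matches the article, but the internals (an `exec`-based run loop over a shared namespace dictionary) are illustrative assumptions, not the article's exact implementation:

```python
import io
import traceback
from contextlib import redirect_stdout, redirect_stderr

class PythonREPLTool:
    """Stateful Python REPL: runs code snippets in a persistent namespace."""

    def __init__(self):
        self.namespace = {}          # variables persist across executions
        self.execution_history = []  # detailed feedback for every run

    def run(self, code: str) -> str:
        stdout, stderr = io.StringIO(), io.StringIO()
        record = {"code": code, "output": "", "errors": "", "success": True}
        try:
            with redirect_stdout(stdout), redirect_stderr(stderr):
                exec(code, self.namespace)  # state lands in self.namespace
        except Exception:
            record["success"] = False
            record["errors"] = traceback.format_exc()
        record["output"] = stdout.getvalue()
        record["errors"] += stderr.getvalue()
        self.execution_history.append(record)
        if not record["success"]:
            return f"Error:\n{record['errors']}"
        return record["output"] or "Code executed successfully (no output)."
```

Because the same namespace dictionary is reused across calls, variables and functions defined in one snippet remain available to later snippets and to the validator.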
Automated Result Validation
The ResultValidator class leverages the Python REPL to generate and run custom validation routines. It supports:
- Mathematical result validation (checking numeric properties such as count, min/max values, sums)
- Data analysis validation (verifying variable existence, types, and structure)
- Algorithm correctness validation using test cases
These validations produce pass/fail summaries, helping ensure the reliability of each computation.
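Internally, these checks can be simple assertions against variables living in the REPL's namespace. The sketch below is illustrative: the method names, the expected-criteria dictionary, and the test-case format are assumptions rather than the article's exact API:

```python
class ResultValidator:
    """Runs validation checks against variables in the REPL's namespace."""

    def __init__(self, repl: PythonREPLTool):
        self.repl = repl

    def validate_mathematical_result(self, var_name: str, expected: dict) -> str:
        values = self.repl.namespace.get(var_name)
        if values is None:
            return f"FAIL: variable '{var_name}' not found"
        checks = {
            "count": lambda v: len(v) == expected["count"],
            "min":   lambda v: min(v) == expected["min"],
            "max":   lambda v: max(v) == expected["max"],
            "sum":   lambda v: sum(v) == expected["sum"],
        }
        return "; ".join(
            f"{name}: {'PASS' if check(values) else 'FAIL'}"
            for name, check in checks.items() if name in expected
        )

    def validate_algorithm(self, func_name: str, test_cases: list) -> str:
        func = self.repl.namespace.get(func_name)
        if not callable(func):
            return f"FAIL: function '{func_name}' not found"
        passed = sum(1 for args, want in test_cases if func(*args) == want)
        return f"{passed}/{len(test_cases)} test cases passed"
```

For example, after the agent computes the primes below 100, `validate_mathematical_result('primes', {'count': 25, 'min': 2, 'max': 97})` would report three PASS checks.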
Integrating with LangChain Tools
The REPL and validator are wrapped as LangChain Tool objects, named python_repl and result_validator. The AI agent can invoke these tools to execute code and validate results automatically within its reasoning loop.
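A sketch of that wiring, using LangChain's `Tool` class; the simplified `result_validator` shown here just evaluates a boolean expression inside the shared REPL namespace, a stand-in for the richer ResultValidator interface described above:

```python
from langchain_core.tools import Tool

repl = PythonREPLTool()
validator = ResultValidator(repl)

tools = [
    Tool(
        name="python_repl",
        description="Execute Python code in a stateful REPL and return its output or errors.",
        func=repl.run,
    ),
    Tool(
        name="result_validator",
        description=(
            "Validate a prior result. Input: a Python expression that "
            "evaluates to True when the result is correct."
        ),
        # Simplified dispatch: evaluate the check inside the shared namespace.
        func=lambda expr: repl.run(f"print(bool({expr}))"),
    ),
]
```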
Prompt for Agent Behavior
A custom prompt template directs the Claude AI to:
- Analyze the question
- Choose actions (running code, validating results)
- Observe and iterate
- Provide a fully validated final answer
This enforces a disciplined approach to problem solving with live code execution and validation.
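The template below is a condensed example of the standard ReAct prompt format that `create_react_agent` expects, where `{tools}`, `{tool_names}`, `{input}`, and `{agent_scratchpad}` are required variables; the article's exact wording may differ:

```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("""You are an expert Python programmer.
Answer the question by writing and executing code, then validating the result.

You have access to the following tools:
{tools}

Use this format:
Question: the input question
Thought: reason about what to do next
Action: one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (Thought/Action/Action Input/Observation can repeat)
Thought: the result has been validated
Final Answer: the validated answer to the original question

Question: {input}
Thought: {agent_scratchpad}""")
```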
AdvancedClaudeCodeAgent Class
This class encapsulates the entire system, initializing the Claude LLM, setting up the agent with tools and prompt, and providing methods:
- run(query): submits a natural language query and returns a validated answer
- validate_last_result(): a manual validation hook for the most recent execution
- get_execution_summary(): a summary of all code executions
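Putting the pieces together, a minimal sketch of the class might look like this, reusing the `repl`, `tools`, and `prompt` objects defined above; the model name and constructor details are assumptions:

```python
from langchain.agents import AgentExecutor, create_react_agent
from langchain_anthropic import ChatAnthropic

class AdvancedClaudeCodeAgent:
    def __init__(self, anthropic_api_key: str):
        self.repl = repl  # the shared REPL defined above
        # Model name is an assumption; any Claude chat model works here.
        llm = ChatAnthropic(
            model="claude-3-5-sonnet-20241022",
            api_key=anthropic_api_key,
            temperature=0,
        )
        agent = create_react_agent(llm, tools, prompt)
        self.executor = AgentExecutor(
            agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
        )

    def run(self, query: str) -> str:
        return self.executor.invoke({"input": query})["output"]

    def validate_last_result(self, expression: str) -> str:
        # Manual hook: evaluate a boolean check in the REPL namespace.
        return self.repl.run(f"print(bool({expression}))")

    def get_execution_summary(self) -> dict:
        history = self.repl.execution_history
        return {
            "total_executions": len(history),
            "successful": sum(r["success"] for r in history),
            "failed": sum(not r["success"] for r in history),
            "errors": [r["errors"] for r in history if not r["success"]],
        }
```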
Example Usage
The agent is demonstrated with examples including prime number analysis, sales data analytics, algorithm implementation with tests, and a machine learning pipeline with cross-validation. Each example runs code live, validates results, and prints detailed output.
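For instance, the prime-number example could be invoked like so (the query phrasing is illustrative, not the article's verbatim prompt):

```python
agent = AdvancedClaudeCodeAgent(anthropic_api_key="sk-ant-...")  # your key here

answer = agent.run(
    "Find all prime numbers below 100, store them in a list named 'primes', "
    "then validate that there are 25 of them with min 2 and max 97."
)
print(answer)
```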
Execution Summary
After running queries, the agent provides a summary of total executions, successful runs, failures, and error details, illustrating the robust "write → run → validate" workflow.
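Assuming the get_execution_summary() method sketched above, inspecting the workflow after a run might look like this:

```python
summary = agent.get_execution_summary()
print(f"Executions: {summary['total_executions']} "
      f"({summary['successful']} succeeded, {summary['failed']} failed)")
for err in summary["errors"]:
    print("Error detail:\n", err)
```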
This advanced agent combines generative AI with precise computational control, enabling trustworthy, reproducible data analysis and algorithm development.
Check out the Notebook on GitHub for the full implementation and follow the community for updates.