Introducing Confucius Code Agent: Scalable Software Engineering AI

The Evolution of AI in Software Engineering

How far can a mid-sized language model go if the real innovation moves from the backbone into the agent scaffold and tool stack? Meta and Harvard researchers have released the Confucius Code Agent, an open-source AI software engineer built on the Confucius SDK and designed for industrial-scale software repositories and long-running sessions. The system targets real GitHub projects, complex test toolchains at evaluation time, and reproducible results on benchmarks like SWE Bench Pro and SWE Bench Verified, while exposing the full scaffold to developers.

Confucius SDK: Targeting Developer Experience

The Confucius SDK is an agent development platform that treats scaffolding as a central design challenge. It is organized around three axes: Agent Experience, User Experience, and Developer Experience.

Agent Experience: Controls what the model perceives, including context layout, working memory, and tool results.
User Experience: Focuses on producing readable traces, code diffs, and safeguards to aid human engineers.
Developer Experience: Concentrates on observability, configuration, and debugging of the agent itself.

The SDK incorporates three core mechanisms: a unified orchestrator with hierarchical working memory, a persistent note-taking system, and a modular extension interface for tools. A meta agent automates the synthesis and refinement of agent configurations through a build, test, improve loop.

Hierarchical Working Memory: Enhancing Coding Tasks

Real software tasks on SWE Bench Pro often require reasoning over numerous files and interactions. The orchestrator in the Confucius SDK maintains hierarchical working memory, summarizing past steps and preserving essential context for future interactions. This design ensures the model operates within context limits while keeping vital artifacts like patches and error logs accessible.
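
The idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SDK's implementation: the class name `WorkingMemory`, the `max_recent` budget, and the truncating `summarize` helper (a stand-in for an LLM summarizer) are all hypothetical.

```python
from dataclasses import dataclass, field


def summarize(step: str, limit: int = 40) -> str:
    """Placeholder for an LLM summarizer: keep only the first `limit` characters."""
    return step if len(step) <= limit else step[:limit] + "..."


@dataclass
class WorkingMemory:
    """Toy two-level memory: verbatim recent steps plus summaries of older ones."""
    max_recent: int = 3  # hypothetical budget, counted in steps rather than tokens
    recent: list[str] = field(default_factory=list)
    summaries: list[str] = field(default_factory=list)
    pinned: dict[str, str] = field(default_factory=dict)  # e.g. patches, error logs

    def pin(self, name: str, content: str) -> None:
        """Keep an artifact verbatim, regardless of how old it is."""
        self.pinned[name] = content

    def add_step(self, step: str) -> None:
        """Record a step; compress the oldest steps once the budget is exceeded."""
        self.recent.append(step)
        while len(self.recent) > self.max_recent:
            self.summaries.append(summarize(self.recent.pop(0)))

    def render(self) -> str:
        """Lay out the context: summaries first, then recent steps, then artifacts."""
        parts = [f"[summary] {s}" for s in self.summaries]
        parts += [f"[step] {s}" for s in self.recent]
        parts += [f"[artifact:{k}] {v}" for k, v in self.pinned.items()]
        return "\n".join(parts)


memory = WorkingMemory(max_recent=2)
memory.pin("error_log", "AssertionError in test_parse")
for i in range(4):
    memory.add_step(f"step {i}: edited file_{i}.py")
print(memory.render())
```

In this sketch the two oldest steps are compressed into summaries while the pinned error log stays verbatim, which mirrors the trade-off described above: bounded context, with key artifacts still reachable.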

Learning Through Persistent Notes

The note-taking system employs a dedicated agent to generate structured Markdown notes from execution traces. The notes capture task-specific strategies and repository conventions, creating reusable long-term memory across sessions. The researchers report that using notes reduced the number of turns and improved Resolve@1 from 53.0 to 54.4.
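
As a rough illustration of what turning a trace into structured Markdown notes might look like, here is a minimal sketch. It assumes trace events have already been labeled; in the real system that judgment is made by a dedicated agent. The `TraceEvent` type, its `kind` labels, and `render_note` are hypothetical names, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    kind: str  # hypothetical labels: "strategy", "convention", or anything else
    text: str


def render_note(task_id: str, events: list[TraceEvent]) -> str:
    """Group labeled trace events into a Markdown note with one section per kind."""
    sections = {"strategy": "Strategies", "convention": "Repository conventions"}
    lines = [f"# Notes for {task_id}"]
    for kind, title in sections.items():
        items = [e.text for e in events if e.kind == kind]
        if items:  # skip empty sections so the note stays short
            lines.append("")
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines) + "\n"


trace = [
    TraceEvent("strategy", "Run the failing test first"),
    TraceEvent("other", "ls"),
    TraceEvent("convention", "Tests live in tests/unit"),
]
print(render_note("task-1", trace))
```

The resulting string could be written to a file and loaded into the prompt at the start of a later session, which is the sense in which notes act as cross-session memory.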

Modular Extensions for Enhanced Tool Use

The Confucius SDK exposes tools as extensions, each of which can manage its own state and control how it is wired into the prompt. Experiments with more advanced tool configurations showed that better tool handling increases Resolve@1, which points to the importance of how tools are selected and sequenced.
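
A minimal sketch of an extension-style tool interface follows. It is an assumption about the general shape of such a design, not the SDK's actual API: `Extension`, `FileViewExtension`, and `ToyOrchestrator` are hypothetical names, and the file system is replaced by an in-memory dictionary so the example stays self-contained.

```python
from abc import ABC, abstractmethod


class Extension(ABC):
    """A tool that owns its state and contributes its own prompt text."""
    name: str

    @abstractmethod
    def prompt_fragment(self) -> str: ...

    @abstractmethod
    def run(self, argument: str) -> str: ...


class FileViewExtension(Extension):
    name = "file_view"

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files            # in-memory stand-in for a repository
        self.opened: list[str] = []   # per-extension state: which paths were viewed

    def prompt_fragment(self) -> str:
        return "file_view <path>: show the contents of a file"

    def run(self, argument: str) -> str:
        self.opened.append(argument)
        return self.files.get(argument, f"error: {argument} not found")


class ToyOrchestrator:
    """Builds the tool section of the prompt and dispatches tool calls by name."""

    def __init__(self, extensions: list[Extension]) -> None:
        self.extensions = {ext.name: ext for ext in extensions}

    def system_prompt(self) -> str:
        return "\n".join(ext.prompt_fragment() for ext in self.extensions.values())

    def call(self, tool: str, argument: str) -> str:
        if tool not in self.extensions:
            return f"error: unknown tool {tool}"
        return self.extensions[tool].run(argument)


viewer = FileViewExtension({"README.md": "hello"})
agent = ToyOrchestrator([viewer])
print(agent.system_prompt())
print(agent.call("file_view", "README.md"))
```

Because each extension carries its own state and prompt text, swapping or reconfiguring a tool does not require touching the orchestrator, which is the property that makes tool configurations easy to compare.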

Meta-Agent: Revolutionizing Agent Architecture

The SDK's meta agent proposes agent configurations from natural language specifications and refines them iteratively through a feedback cycle. This turns agent engineering into a guided optimization problem driven by LLMs.
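
The build, test, improve cycle can be sketched as a simple search loop. This is an illustrative outline only: in the real system an LLM proposes configurations and the evaluation runs actual tasks, whereas here `propose` and `evaluate` are caller-supplied functions and the `max_turns` setting and toy scoring rule are hypothetical.

```python
from typing import Callable

Config = dict[str, int]


def build_test_improve(
    initial: Config,
    propose: Callable[[Config, float], Config],
    evaluate: Callable[[Config], float],
    rounds: int = 5,
) -> tuple[Config, float]:
    """Keep the best-scoring configuration seen across a fixed number of rounds."""
    best, best_score = initial, evaluate(initial)
    for _ in range(rounds):
        candidate = propose(best, best_score)  # build: derive a new configuration
        score = evaluate(candidate)            # test: score it
        if score > best_score:                 # improve: keep it only if it is better
            best, best_score = candidate, score
    return best, best_score


def toy_evaluate(config: Config) -> float:
    # Toy score that peaks when max_turns is 30 (a made-up optimum).
    return -abs(config["max_turns"] - 30)


def toy_propose(config: Config, score: float) -> Config:
    # Stand-in for an LLM suggestion: always try a larger turn budget.
    return {"max_turns": config["max_turns"] + 10}


best, score = build_test_improve({"max_turns": 10}, toy_propose, toy_evaluate)
print(best, score)
```

With these toy functions the loop climbs from 10 to 30 turns and then rejects further increases, since they score worse than the current best.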

Performance Metrics on SWE Bench Pro

Evaluation on SWE Bench Pro requires agents to modify real repositories to resolve 731 GitHub issues. Reported Resolve@1 scores include:

Claude 4.5 Sonnet with Confucius Code Agent: 52.7
Claude 4.5 Sonnet with SWE Agent: 43.6

These outcomes indicate that the scaffold matters as much as the backbone: a strong scaffold can lift a model past higher-tier models that are paired with weaker scaffolds.
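
To put the two quoted scores in concrete terms, the short calculation below converts them into a percentage-point gap and an approximate number of resolved issues. It assumes that Resolve@1 is the percentage of tasks resolved on the first attempt and that both configurations were scored on all 731 issues; the variable names are illustrative.

```python
# Scores as quoted in the list above; the task count comes from the text.
cca_score = 52.7         # Resolve@1 with Confucius Code Agent
swe_agent_score = 43.6   # Resolve@1 with SWE Agent
num_tasks = 731

# Absolute gap in percentage points (assumes both scores are percentages).
gap_points = round(cca_score - swe_agent_score, 1)

# Approximate number of resolved issues implied by each score.
cca_resolved = round(cca_score / 100 * num_tasks)
swe_agent_resolved = round(swe_agent_score / 100 * num_tasks)

print(gap_points, cca_resolved, swe_agent_resolved)
```

Under those assumptions the gap is 9.1 percentage points, or roughly 385 versus 319 resolved issues.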

Key Takeaways

Scaffolding vs. Model Size: With an effective scaffold, a model like Claude 4.5 Sonnet scores well above what it achieves with a weaker one.
Memory Architecture: Hierarchical working memory is vital for tasks that span multiple files.
Persistence in Note-Taking: Structured notes function as effective cross-session memory.
Impact of Tool Configuration: How tools are sequenced significantly influences success rates.
Automated Agent Design: The meta-agent offers a streamlined approach to agent configuration and refinement.

Conclusion

The Confucius Code Agent shows how much of a coding agent's performance can come from the scaffold rather than the backbone model. By combining hierarchical working memory, persistent notes, modular tool extensions, and a meta agent, it gives developers an open scaffold to build on for future work in AI-driven software engineering.