Efficient Agents: How to Slash AI Agent Costs Without Losing Performance
OPPO AI Agent Team's Efficient Agents framework demonstrates major cost savings for production AI agents by balancing model choice, planning, tool use, and memory.
The rising cost of AI agents
AI agents that solve multi-step tasks rely on large language models (LLMs) and tool use, but their running costs have ballooned. Many state-of-the-art systems require hundreds of API calls per task, making large-scale deployment expensive for businesses and researchers alike. The OPPO AI Agent Team set out to quantify where those costs come from and propose a practical, lower-cost alternative.
Measuring what matters: cost-of-pass
The study centers on a clear metric, cost-of-pass: the expected cost of obtaining a correct answer for a task. It combines per-attempt token costs (what you pay for input and output) with the model's success rate, exposing the trade-off between raw accuracy and economic efficiency: a model that is slightly less accurate but far cheaper per successful pass can be the better choice at scale.
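Read literally, the metric prices in failed retries: the cost of one attempt divided by the probability that the attempt succeeds. Here is a minimal sketch of that reading, with illustrative numbers rather than figures from the paper:

```python
def cost_of_pass(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to obtain one correct answer.

    If each attempt costs `cost_per_attempt` dollars and succeeds with
    probability `success_rate`, then on average 1 / success_rate attempts
    are needed, so the expected cost per correct answer is
    cost_per_attempt / success_rate.
    """
    if success_rate <= 0:
        return float("inf")  # the model never solves the task
    return cost_per_attempt / success_rate


# Illustrative numbers only: a cheaper model with lower accuracy can still
# win on cost-of-pass.
print(cost_of_pass(cost_per_attempt=0.50, success_rate=0.53))  # ~0.94
print(cost_of_pass(cost_per_attempt=2.20, success_rate=0.62))  # ~3.55
```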
What drives agent costs
- Backbone model choice
High-performing models such as Claude 3.7 Sonnet score very well on accuracy (61.82% on a tough benchmark) but come at a premium of about $3.54 per successful task. GPT-4.1 reaches 53.33% accuracy at only $0.98 per successful task. For simpler needs, models such as Qwen3-30B-A3B cut costs dramatically, down to roughly $0.13 on basic tasks (the first sketch after this list works through the comparison).
- Planning and scaling strategies
More internal planning steps or brute-force scaling tricks (for example, Best-of-N sampling) quickly increase compute and API usage. These approaches often yield diminishing returns: they burn a lot of resources for modest improvements in accuracy (the second sketch after this list illustrates the effect).
- Tool usage patterns
Agents that invoke browsers, search engines, and other tools can access up-to-date information, but fancy browser maneuvers (page-up/page-down, deep navigation) add cost with little benefit. Broad, simple searching across a handful of high-quality sources tends to be more cost-effective.
- Memory design
Keeping memory lean—tracking only essential actions and observations—proved to be the best trade-off. Adding complex memory modules made agents slower and more expensive without notable gains in effectiveness.
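To make the backbone trade-off concrete, the sketch below picks the cheapest model by cost-of-pass subject to an accuracy floor, using the figures quoted above; the selection helper itself is hypothetical and not part of the paper's code:

```python
# Figures reported in the article: benchmark accuracy and cost-of-pass
# (dollars per successful task) for each backbone.
BACKBONES = {
    "Claude 3.7 Sonnet": {"accuracy": 0.6182, "cost_of_pass": 3.54},
    "GPT-4.1":           {"accuracy": 0.5333, "cost_of_pass": 0.98},
    # Cost measured on basic tasks only; accuracy not directly comparable.
    "Qwen3-30B-A3B":     {"accuracy": None,   "cost_of_pass": 0.13},
}


def pick_backbone(min_accuracy: float) -> str:
    """Cheapest backbone (by cost-of-pass) that meets the accuracy floor."""
    eligible = {
        name: stats
        for name, stats in BACKBONES.items()
        if stats["accuracy"] is not None and stats["accuracy"] >= min_accuracy
    }
    return min(eligible, key=lambda name: eligible[name]["cost_of_pass"])


print(pick_backbone(min_accuracy=0.50))  # GPT-4.1: ~3.6x cheaper per successful task
print(pick_backbone(min_accuracy=0.60))  # Claude 3.7 Sonnet: pay more for accuracy
```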
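The diminishing returns of Best-of-N scaling can also be expressed in cost-of-pass terms. The sketch below assumes, purely for illustration, independent attempts and a selector that always recognizes a correct candidate, with made-up single-attempt numbers; the point is the shape of the curve, not the exact values:

```python
def best_of_n_tradeoff(p_single: float, cost_single: float, max_n: int = 8) -> None:
    """Print success rate and cost-of-pass as N grows, under idealized assumptions."""
    for n in range(1, max_n + 1):
        success = 1 - (1 - p_single) ** n  # chance that at least one attempt succeeds
        cost = n * cost_single             # every attempt is paid for, successful or not
        print(f"N={n}: success={success:.2f}, cost-of-pass=${cost / success:.2f}")


# Hypothetical single-attempt numbers: 53% success, $0.50 per attempt.
# Accuracy saturates while cost grows linearly, so cost-of-pass keeps rising.
best_of_n_tradeoff(p_single=0.53, cost_single=0.50)
```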
The Efficient Agents blueprint
OPPO's Efficient Agents framework prescribes a pragmatic combination of choices:
- Use a capable but not overly expensive backbone (for example, GPT-4.1).
- Limit planning steps to avoid unnecessary computation.
- Search broadly but keep browser/tool actions simple.
- Keep memory modules minimal and focused on recent actions and observations.
Applied together, these decisions produced agents that retain about 96.7% of the performance of leading open-source competitors such as OWL while cutting costs by roughly 28.4%, spending less than three-quarters as much per task.
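To see how these choices might fit together, here is an illustrative configuration and a lean memory in the spirit of the blueprint; the class names, fields, and default values are assumptions for this sketch, not the framework's actual API:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EfficientAgentConfig:
    """Illustrative settings following the blueprint above; names and values are hypothetical."""
    backbone: str = "gpt-4.1"            # capable but not premium-priced
    max_planning_steps: int = 1          # cap planning depth to limit extra calls
    best_of_n: int = 1                   # no brute-force candidate sampling
    browser_actions: tuple = ("open", "find", "search")  # simple tool actions only
    memory_window: int = 5               # keep only recent actions and observations


class SimpleMemory:
    """Lean memory: a bounded window of (action, observation) pairs."""

    def __init__(self, window: int) -> None:
        self.events = deque(maxlen=window)

    def record(self, action: str, observation: str) -> None:
        self.events.append((action, observation))

    def as_context(self) -> str:
        # Flatten recent history into a prompt-ready string.
        return "\n".join(f"{action} -> {observation}" for action, observation in self.events)
```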
Practical implications
This work reframes the discussion about AI agents: efficiency and practicality matter as much as raw capability. For teams deploying agents at scale, the study suggests measuring cost-of-pass and tuning the model, planning depth, tool use, and memory design to optimize that metric. Because the Efficient Agents framework is open-source, organizations can experiment with the approach immediately and adapt it to their workloads.
The takeaway is straightforward: building smarter agents doesn't have to mean paying much more. Thoughtful design choices can produce near-top-tier results at a fraction of the cost, making agent technology feasible for broader, real-world adoption.