Gemini 2.5 'Computer Use' Preview: A Browser-Focused Model That Executes UI Actions

October 8, 2025 · 3 min

Gemini 2.5 ‘Computer Use’ is a new Google AI model variant designed to plan and execute real user interface actions inside a live browser. Intended for web automation and UI testing, it is available in public preview through Google AI Studio and Vertex AI and exposes a constrained action API that loops with a client-side executor.

What the model does

The model generates function-style action commands such as click_at, type_text_at, and drag_and_drop. A developer-provided executor (for example Playwright or Browserbase) runs those actions, captures updated screenshots or URLs, and feeds the results back to the model. Execution continues until the task completes or a safety rule intervenes.

Action space and extensibility

Out of the box, the model supports 13 predefined UI actions:

open_web_browser
wait_5_seconds
go_back
go_forward
search
navigate
click_at
hover_at
type_text_at
key_combination
scroll_document
scroll_at
drag_and_drop

Developers can add custom functions for non-browser environments or mobile interactions, such as open_app, long_press_at, or go_home, while retaining the same looped control flow.

How integration works

Clients call a new computer_use tool that returns structured function calls. The client executes the suggested action, captures the resulting state (screenshot, URL), and provides that observation back to the model. This loop repeats step-by-step, allowing the agent to plan multi-step workflows while the executor enforces real-world constraints and environment fidelity.

Performance and measured results

Google reports human-judged gains on standard web and mobile control benchmarks. Key measurements include:

Online-Mind2Web (official): 69.0% pass@1 by majority-vote human judgments, validated by benchmark organizers.
Browserbase matched harness: Google lists 65.7% (OM2W) and 79.9% (WebVoyager) in Browserbase runs, claiming leading accuracy and latency under identical constraints.
Latency/quality trade-off: approximately 70%+ accuracy at around 225 seconds median latency on the Browserbase OM2W harness, per Google figures.
AndroidWorld mobile generalization: 69.7% measured by Google using the same API loop with custom mobile actions.

These numbers are reported by Google and include human evaluation components; treat them as vendor-reported benchmarks.

Safety, scope and constraints

The model is optimized for web browsers and is not yet tuned for desktop OS-level control. Mobile scenarios are supported by swapping in custom actions. A built-in safety monitor can block prohibited actions or ask for user confirmation before high-risk steps such as payments, sending messages, or accessing sensitive records.

Early production signals

Early adopters report practical benefits in testing and automation workflows. Google says its payments team used the model to repair more than 60% of previously failing automated UI tests, and an external tester reported workflows often about 50% faster compared with their next-best alternative. These are early signals and should be cited as preliminary or third-party reports.

Practical implications

Gemini 2.5 ‘Computer Use’ is a browser-first control model intended for developers building agents that interact with web UIs. It offers a constrained, auditable action space and a client-side execution loop that enables realistic automation while applying safety checks per step. For teams focused on web automation, UI testing, or developer tooling for agents, it provides an accessible public-preview path through Google AI Studio and Vertex AI.

For technical details, benchmarks and integration examples, refer to Google’s blog post and model card on the official blog and repository pages.