Gemini-Powered SIMA 2 Learns to Play and Improve Inside Goat Simulator 3 and Other Worlds

DeepMind's new SIMA 2, powered by Gemini, learns to navigate and solve tasks across games like Goat Simulator 3 and procedurally generated worlds, improving through generated tips and retries.

SIMA 2: a Gemini-powered multiworld agent

Google DeepMind's SIMA 2 is a new agent that combines the scalable instructable multiworld agent concept with the Gemini large language model. Built as the successor to the original SIMA, SIMA 2 can perceive game pixels frame by frame, interpret instructions, and act in diverse 3D virtual environments. DeepMind presents it as a step toward more general-purpose agents and future robot applications.

Training from human gameplay

Like its predecessor, SIMA 2 was trained on footage of humans playing eight commercial games, including No Man's Sky and Goat Simulator 3, plus three custom virtual worlds created by DeepMind. The training matches keyboard and mouse inputs to in-game actions so the agent learns how human controls map to virtual behavior.
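The general recipe described above is behavioral cloning: pairing what the player saw with the inputs they pressed, then learning to imitate that mapping. The sketch below is a deliberately tiny, tabular stand-in for that idea; the observation labels, action names, and most-frequent-action policy are illustrative assumptions, not SIMA 2's actual pixel-based architecture.

```python
from collections import Counter, defaultdict

# Toy behavioral-cloning setup: each demonstration pairs an observation
# (a labeled stand-in for a game frame) with the human's keyboard/mouse input.
demos = [
    ("cliff_ahead", "press_space"),   # human jumps at a cliff edge
    ("cliff_ahead", "press_space"),
    ("item_nearby", "press_e"),       # human picks up a nearby item
    ("open_field",  "hold_w"),        # human walks forward
]

def fit_policy(demos):
    """Imitate humans by choosing the most common action per observation."""
    counts = defaultdict(Counter)
    for obs, action in demos:
        counts[obs][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}

policy = fit_policy(demos)
print(policy["cliff_ahead"])  # → press_space
```

In the real system the lookup table is replaced by a neural network over raw pixels, but the supervision signal is the same: human controls paired with what was on screen.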

Gemini's role in instruction and reasoning

Hooked up to Gemini, SIMA 2 gains stronger instruction following and better problem solving. Gemini helps the agent ask clarifying questions, provide status updates while attempting tasks, and generate hints when the agent struggles. That integration enables the system to plan multi-step sequences more effectively than before.

Generalization to new environments

DeepMind tested SIMA 2 in environments it had never seen. In one experiment the team used Genie 3, their world model, to procedurally generate entirely new worlds and then dropped SIMA 2 into them. The agent was able to navigate and carry out instructions in many of these novel settings, demonstrating better generalization than earlier systems.

Learning by trial and error

When SIMA 2 fails at a task, Gemini can produce tips and suggestions. The agent can retry tasks multiple times, incorporate Gemini's guidance, and often improve through repeated attempts. This loop of attempting, receiving generated advice, and retrying lets SIMA 2 learn to solve tougher problems without direct human reprogramming.
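The attempt-advise-retry loop can be sketched as a small control structure. Everything here is an assumption for illustration: `attempt_fn` and `tip_fn` stand in for the agent's task execution and Gemini's hint generation, neither of which is publicly specified.

```python
# Hypothetical self-improvement loop: try a task, and on failure feed a
# generated tip into the next attempt. Helper functions are stand-ins.

def solve_with_retries(task, attempt_fn, tip_fn, max_retries=3):
    """Retry `task`, accumulating tips from each failure for the next try."""
    tips = []
    for trial in range(max_retries):
        success, feedback = attempt_fn(task, tips)
        if success:
            return trial + 1                  # attempts used
        tips.append(tip_fn(task, feedback))   # e.g. a Gemini-generated hint
    return None                               # not solved within the budget

# Toy stand-ins: this agent succeeds once it has received at least one tip.
def toy_attempt(task, tips):
    return (len(tips) >= 1, "stuck at the door")

def toy_tip(task, feedback):
    return f"try interacting with the object where you are {feedback}"

print(solve_with_retries("open the gate", toy_attempt, toy_tip))  # → 2
```

The key design point is that the tips persist across attempts, so each retry is conditioned on advice derived from earlier failures rather than starting from scratch.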

How humans interact with SIMA 2

Users can control SIMA 2 via text chat, voice, or by drawing on the game screen. The agent interprets those instructions alongside raw pixel input and decides what in-game keyboard and mouse actions to perform.

Limitations and remaining gaps

SIMA 2 is still experimental. It struggles with very long or highly complex multi-step tasks, and its long-term memory is deliberately limited as a trade-off to keep the agent responsive. It also remains significantly worse than humans at fine-grained mouse and keyboard control in some scenarios.

Community reactions and realism for robotics

Researchers are split on how far SIMA 2 takes the field. Some praise the result as progress on multi-game agents that learn from pixels and instructions. Others note that many commercial games share similar input schemes, which could make transfer across titles easier than it appears. Critics also point out that real-world robotics poses harder visual and physical challenges than games, whose visuals are designed to be legible to human players and whose rules differ from world to world.

Next steps: endless virtual training dojos

DeepMind plans to continue combining Genie 3 and Gemini to create a sort of endless training environment that generates new worlds and tasks. The goal is to let agents like SIMA 2 improve through ongoing trial and error, gradually building the skills needed for more complex environments and, ultimately, for controlling real-world robots.