Google DeepMind Launches Gemini Robotics On-Device for Real-Time, Cloud-Free Robot Control
Google DeepMind released Gemini Robotics On-Device, a local AI model that enables robots to perform complex tasks in real time without cloud connectivity, opening new possibilities for edge robotics.
Local AI Brings Advanced Robotics to the Edge
Google DeepMind has introduced Gemini Robotics On-Device, a compact, local implementation of its powerful vision-language-action (VLA) model. This innovation enables sophisticated robotic intelligence to operate directly on robots without relying on cloud connectivity, preserving the flexibility and precision inherent to the Gemini model family.
Overcoming Cloud Dependency
Historically, high-capacity VLA models depended on cloud-based processing because of their heavy compute and memory demands. Gemini Robotics On-Device removes this dependency by running entirely on a robot's embedded GPU. This approach suits environments with strict latency requirements and limited bandwidth, such as homes, hospitals, and manufacturing facilities.
Key Strengths of the On-Device Model
The model retains Gemini Robotics' core capabilities: following human instructions, interpreting multimodal (visual and textual) inputs, and generating precise motor actions in real time. It also adapts efficiently, needing only 50 to 100 demonstrations to learn a new task, which makes it practical for diverse real-world applications.
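To make that few-shot adaptation concrete, the sketch below shows plain behavior cloning on a small demonstration set in PyTorch. It is a minimal illustration of the general technique, not the Gemini Robotics fine-tuning interface; the dataset, network, and dimensions are hypothetical stand-ins.

```python
# Hypothetical sketch: adapting a policy to a new task from ~50
# demonstrations via behavior cloning. This is NOT the Gemini Robotics
# SDK; names, shapes, and data are illustrative stand-ins.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, NUM_DEMOS, STEPS_PER_DEMO = 64, 14, 50, 100

# Synthetic "demonstrations": (observation, action) pairs per timestep.
obs = torch.randn(NUM_DEMOS * STEPS_PER_DEMO, OBS_DIM)
actions = torch.randn(NUM_DEMOS * STEPS_PER_DEMO, ACT_DIM)

# A small policy head standing in for task-specific adapter layers.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

for epoch in range(10):
    pred = policy(obs)
    loss = nn.functional.mse_loss(pred, actions)  # imitate demonstrated actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```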
Core Features
- Fully Local Execution: Operates on the robot's onboard GPU for closed-loop control without internet dependence (see the control-loop sketch after this list).
- Two-Handed Dexterity: Executes complex bimanual tasks, building on ALOHA training data and task-specific fine-tuning.
- Multi-Embodiment Compatibility: Generalizes across various robot platforms, including humanoid and industrial dual-arm manipulators.
- Few-Shot Adaptation: Quickly learns new tasks from a small number of demonstrations, reducing development time.
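The fully local, closed-loop operation in the first bullet can be pictured as a simple sense-infer-act cycle that never leaves the robot. The skeleton below is a hypothetical sketch; `read_sensors`, `policy`, and `apply_action` are stand-in functions, since the actual on-device runtime is not public.

```python
# Hypothetical skeleton of fully local closed-loop control: sense,
# infer on the onboard GPU, act -- no network round trip. All helpers
# are illustrative stand-ins, not DeepMind APIs.
import time

CONTROL_HZ = 30  # assumed control frequency

def read_sensors():
    """Stand-in for camera + proprioception capture."""
    return {"image": None, "joint_angles": [0.0] * 14}

def policy(observation, instruction):
    """Stand-in for on-device VLA inference."""
    return [0.0] * 14  # target joint command

def apply_action(action):
    """Stand-in for sending commands to the motor controllers."""
    pass

instruction = "fold the towel"
period = 1.0 / CONTROL_HZ
for _ in range(CONTROL_HZ * 10):  # run for ~10 seconds
    start = time.monotonic()
    obs = read_sensors()               # perceive
    action = policy(obs, instruction)  # reason locally, no cloud call
    apply_action(action)               # act
    # Sleep out the remainder of the control period to hold the loop rate.
    time.sleep(max(0.0, period - (time.monotonic() - start)))
```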
Real-World Applications
Gemini Robotics On-Device enables delicate manipulation tasks like folding clothes, assembling parts, and opening jars by providing fine motor control and immediate feedback. Because inference runs locally, there is no network round trip, which keeps the control loop responsive; this matters for edge deployments where connectivity is unreliable or privacy is paramount. Potential applications include:
- Home assistance robots performing daily chores
- Healthcare robots aiding rehabilitation and eldercare
- Industrial automation with adaptable assembly line workers
Developer Tools and Simulation Support
DeepMind has also released a Gemini Robotics SDK to facilitate testing, fine-tuning, and integration of the on-device model. The SDK supports task-specific training pipelines and works with various robot types and camera setups. Additionally, evaluation can be performed using the open-sourced MuJoCo physics simulator, which now includes benchmarks tailored to bimanual dexterity tasks.
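MuJoCo itself is open source and scriptable from Python, so a minimal rollout, the building block of any simulated evaluation, looks like the following. The scene XML is a toy placeholder rather than one of the released bimanual-dexterity benchmarks.

```python
# Minimal MuJoCo rollout: load a scene, step physics, read state.
# The XML is a toy placeholder, not a released benchmark scene.
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body name="box" pos="0 0 0.5">
      <freejoint/>
      <geom type="box" size="0.05 0.05 0.05"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Step the simulation for one second of sim time and watch the box fall.
steps = int(1.0 / model.opt.timestep)
for _ in range(steps):
    mujoco.mj_step(model, data)

print("box height after 1s:", data.qpos[2])
```

A real benchmark run would load one of the released task scenes and score the policy's actions rather than letting a free body fall.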
Advancing On-Device Embodied AI
This release represents a significant step toward unifying perception, reasoning, and action in physical environments, bridging foundational AI research and practical, autonomous robotic systems. By optimizing compute graphs, compressing the model, and tailoring the architecture for embedded GPUs, Gemini Robotics On-Device sidesteps the latency and cloud-dependency limitations of its larger, cloud-hosted Gemini Robotics counterpart.
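DeepMind has not published its exact compression recipe, but post-training quantization is one standard technique in this family. The PyTorch example below applies dynamic int8 quantization to a toy network; it illustrates the general idea only and is not Gemini-specific.

```python
# Generic post-training dynamic quantization in PyTorch -- one common
# way to shrink a model for embedded hardware. The toy network below
# is illustrative; DeepMind's actual compression pipeline is not public.
import io
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 64),
)

# Convert Linear weights to int8; activations are quantized on the fly
# at inference time, trading a little accuracy for size and speed.
quantized = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m):
    """Serialized state-dict size in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32: {size_mb(model):.2f} MB -> int8: {size_mb(quantized):.2f} MB")
```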
Impact on Robotics and AI Deployment
Decoupling AI models from the cloud enables scalable, privacy-preserving robotics aligned with the edge AI trend—shifting computation closer to data sources. This improves safety, responsiveness, and operational reliability in latency-sensitive and privacy-critical environments. DeepMind's broader efforts to open simulation platforms and release benchmarks further empower researchers globally to innovate and build dependable, real-time robotic systems.
For more details, refer to the official paper and technical documentation.