Google DeepMind Unveils On-Device Gemini Robotics, Pushing AI Closer to Autonomous Dexterity
By Humanoids Daily (@humanoidsdaily)
Mountain View, CA — Google DeepMind has announced Gemini Robotics On-Device, its latest advancement in robotics AI, designed to bring sophisticated vision-language-action (VLA) models directly onto robotic hardware. This new iteration of the Gemini Robotics model, an evolution of the Gemini 2.0 framework, emphasizes efficiency, low-latency inference, and robust operation in environments with limited or no network connectivity.
The move to on-device processing is a significant step, addressing critical challenges for real-world robotic deployment, particularly in applications requiring immediate responses or operating in remote locations. By running the VLA model locally, DeepMind aims to enable robots to make decisions and execute actions without relying on constant cloud communication, enhancing both responsiveness and reliability.
Core Capabilities and Performance Benchmarks
Gemini Robotics On-Device is engineered as a foundation model for bi-arm robots that requires minimal computational resources. It builds on the task generalization and dexterous manipulation capabilities previously demonstrated by the flagship Gemini Robotics model. According to DeepMind's internal evaluations, the on-device model exhibits strong visual, semantic, and behavioral generalization across diverse scenarios. It can interpret natural language instructions and carry out complex dexterous tasks such as unzipping bags, folding clothes, or pouring salad dressing, all while running autonomously on the robot itself.
The model is also designed for rapid adaptation. DeepMind states that developers can fine-tune it for new tasks with as few as 50 to 100 demonstrations, a sign of how readily its foundational knowledge transfers to novel applications. In DeepMind's comparisons, Gemini Robotics On-Device outperforms other on-device alternatives on both general instruction following and more challenging out-of-distribution tasks, including those that require fine-tuning to reach optimal performance.
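DeepMind has not published the fine-tuning interface itself, but the workflow it describes, adapting a pretrained policy from a small set of teleoperated demonstrations, is essentially behavior cloning. A minimal PyTorch sketch of that idea, with entirely hypothetical model and data structures, might look like this:

```python
# Hypothetical sketch: adapting a pretrained VLA policy to a new task by
# behavior cloning on ~50-100 demonstrations. The policy interface and
# dataset fields are illustrative, not DeepMind's actual API.
import torch
from torch.utils.data import DataLoader, Dataset

class DemoDataset(Dataset):
    """Flattens demos, each a list of (image, instruction, action) steps."""
    def __init__(self, demos):
        self.steps = [step for demo in demos for step in demo]
    def __len__(self):
        return len(self.steps)
    def __getitem__(self, i):
        s = self.steps[i]
        return s["image"], s["instruction_tokens"], s["action"]

def finetune(policy, demos, epochs=10, lr=1e-5):
    loader = DataLoader(DemoDataset(demos), batch_size=32, shuffle=True)
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    policy.train()
    for _ in range(epochs):
        for image, tokens, action in loader:
            pred = policy(image, tokens)   # predicted action (or action chunk)
            loss = torch.nn.functional.mse_loss(pred, action)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```

The small dataset is the point: adaptation is a light pass over an already capable model's weights, not training from scratch.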
Adaptability Across Embodiments
A notable aspect of Gemini Robotics On-Device is its demonstrated adaptability across different robot embodiments. While initially trained on ALOHA robots, DeepMind has successfully adapted the model to control a bi-arm Franka FR3 robot and even the Apptronik Apollo humanoid robot. On the Franka, the model executed general instruction-following, including handling previously unseen objects and performing precision-demanding industrial tasks like belt assembly. Its adaptation to the Apollo humanoid, a significantly different form factor, further highlights the model's generalist capabilities in understanding and manipulating objects based on natural language instructions.
Competing Visions: On-Device VLAs from DeepMind, Figure, and 1X
The push for efficient, on-device VLA models is a rapidly accelerating trend in robotics, with several key players advancing their own solutions. Google DeepMind's Gemini Robotics On-Device joins a competitive landscape that includes Figure's Helix and 1X's Redwood, each offering a distinct approach to bringing advanced AI capabilities to embodied agents.

Figure AI's Helix is a VLA model designed for its humanoid robots, emphasizing full upper-body dexterity, multi-robot collaboration, and versatile grasping capabilities. Helix employs a dual-system architecture, with a slower, high-level VLM (System 2) for scene understanding and language comprehension, and a faster reactive visuomotor policy (System 1) for precise continuous actions. It also runs entirely on onboard GPUs and has been trained on comparatively smaller datasets of human demonstrations, augmented with auto-labeling.
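Figure has not released Helix's code, but the dual-system pattern itself is easy to sketch: a slow planner periodically refreshes a semantic goal while a fast policy consumes the most recent one on every control tick. The rates and interfaces below are illustrative assumptions, not Figure's implementation:

```python
# Illustrative dual-rate "System 2 / System 1" control loop. All rates
# and interfaces here are assumptions for the sake of the sketch.
import time

CONTROL_HZ = 200   # fast reactive visuomotor policy (System 1)
PLAN_EVERY = 25    # refresh the slow VLM (System 2) every 25 ticks (~8 Hz)

def run(system2, system1, robot, instruction):
    period = 1.0 / CONTROL_HZ
    latent = None
    tick = 0
    while True:
        start = time.monotonic()
        if tick % PLAN_EVERY == 0:
            # Slow path: scene understanding + language -> semantic goal latent.
            latent = system2.plan(robot.observe(), instruction)
        # Fast path: continuous low-level action conditioned on the newest latent.
        robot.apply(system1.act(robot.observe_fast(), latent))
        tick += 1
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```

A production system would run the two components asynchronously so the slow model never stalls the fast loop; this sketch serializes them only for readability.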

1X's Redwood, the unified AI model for its NEO humanoid robots, stands out for its "whole-body control," integrating locomotion and manipulation into a single network. Redwood also runs fully onboard and operates at approximately 5 Hz. A key distinction for 1X is its emphasis on learning from both successful and failed tasks in real-world home environments, which the company believes is crucial for robust behavior.
While all three models prioritize on-device operation for low-latency, robust performance, their architectural choices, training methodologies, and demonstrated capabilities reflect differing strategies for achieving general-purpose robotics. DeepMind's focus on rapid adaptation from relatively few demonstrations, together with the model's demonstrated portability across robot embodiments, positions Gemini Robotics On-Device as a versatile tool for the wider robotics community.
Responsible Development and Access
Google DeepMind emphasizes that all Gemini Robotics models are developed in adherence to its AI Principles, incorporating a comprehensive safety approach that spans both semantic and physical safety. This includes using the Live API for semantic and content safety, interfacing models with low-level safety-critical controllers, and conducting red-teaming exercises to identify vulnerabilities. The Responsible Development & Innovation (ReDI) team and the Responsibility & Safety Council (RSC) are involved in assessing and mitigating real-world impacts.
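Neither the safety controllers nor their interfaces are public. As a loose illustration of what interfacing a model with a low-level safety-critical controller can mean in practice, one common pattern is a filter that clamps every proposed joint target to position and velocity limits before it reaches the motors (the limits below are invented):

```python
# Illustrative low-level safety filter; the limits are made-up examples,
# not values from any DeepMind controller.
import numpy as np

JOINT_LIMITS = (np.deg2rad(-170.0), np.deg2rad(170.0))  # position bounds, rad
MAX_JOINT_SPEED = 1.5                                    # velocity bound, rad/s

def safety_filter(target, current, dt):
    """Clamp a proposed joint target to position and velocity limits."""
    step = np.clip(target - current, -MAX_JOINT_SPEED * dt, MAX_JOINT_SPEED * dt)
    return np.clip(current + step, *JOINT_LIMITS)
```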
To gather feedback and better understand the model's usage and safety profile, DeepMind is initially releasing Gemini Robotics On-Device to a select group of developers through its trusted tester program. It has also made available a Gemini Robotics SDK to facilitate evaluation in both physical and simulated environments (via the MuJoCo physics simulator), allowing developers to experiment with and adapt the model for their specific needs.
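The SDK's interface has not been detailed publicly, but MuJoCo itself is open source, and a simulated evaluation loop built on its official Python bindings looks roughly like the following (the policy loader and scene file are placeholders, not actual SDK calls):

```python
# Rough shape of a simulated evaluation loop using MuJoCo's Python
# bindings. load_policy() and the scene file stand in for the Gemini
# Robotics SDK; they are placeholders, not real API.
import mujoco

model = mujoco.MjModel.from_xml_path("bi_arm_scene.xml")  # hypothetical scene
data = mujoco.MjData(model)
renderer = mujoco.Renderer(model, height=240, width=320)
policy = load_policy("gemini-robotics-on-device")         # placeholder

for _ in range(2000):                     # a few seconds of simulated time
    renderer.update_scene(data)           # default free camera
    rgb = renderer.render()               # image observation for the policy
    action = policy.act(rgb, "fold the towel")
    data.ctrl[:] = action[: model.nu]     # map the action to the actuators
    mujoco.mj_step(model, data)
```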
This on-device solution is poised to help the robotics community overcome latency and connectivity hurdles, potentially accelerating innovation in autonomous systems and bringing more sophisticated AI capabilities directly into the physical world.