Google DeepMind Gives Robots a 'Thinking' Brain with Agentic Gemini Robotics 1.5 Models
- Authors
  - Humanoids Daily (@humanoidsdaily)

MOUNTAIN VIEW, CA — Google DeepMind has introduced a significant evolution in its robotics AI, unveiling two new models—Gemini Robotics 1.5 and Gemini Robotics-ER 1.5—that aim to shift robots from reactive machines to proactive "physical agents." The new framework, announced September 25, builds on the company's earlier work this year to bring Gemini models on-device for faster, local processing, and now endows robots with the ability to reason about complex tasks, plan multi-step actions, and perhaps most importantly, generalize learned skills across entirely different robotic bodies.
The announcement marks a deliberate move away from models that simply translate a command into an action, toward systems that can independently reason about a goal and orchestrate the steps to achieve it.
A Two-Part Brain for Physical Tasks
At the core of the new system is a two-model architecture that functions like a high-level brain and a specialized motor cortex.
- Gemini Robotics-ER 1.5 (Embodied Reasoning): This model acts as the strategic planner or "high-level brain." It's a vision-language model (VLM) fine-tuned for understanding the physical world. It can interpret complex, open-ended commands, break them down into a logical sequence of steps, and even use tools like Google Search to find necessary information—for example, looking up local recycling rules to sort trash correctly.
- Gemini Robotics 1.5 (Vision-Language-Action): This is the execution-focused model. It takes the step-by-step instructions from the reasoning model and translates them into the robot's actual motor commands. It's a vision-language-action (VLA) model, designed to turn perception and language into physical motion.
This division of labor allows robots to tackle long-horizon tasks that have traditionally been intractable, such as tidying a room or preparing a meal based on vague instructions.
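To make this division of labor concrete, the sketch below shows one way a developer might wire the two models together: the reasoning model produces an ordered plan, and each step is handed off for execution. The model ID, prompt wording, and the execute_step stub are illustrative assumptions rather than DeepMind's published interface, and since the VLA layer (Gemini Robotics 1.5) is not publicly available, it is represented by a placeholder.

```python
# Illustrative orchestration sketch: a "high-level brain" (Gemini Robotics-ER 1.5)
# plans, and a VLA executor carries out each step. The model ID, prompts, and the
# execute_step() hook are assumptions for illustration only.
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment


def plan_task(goal: str, scene_description: str) -> list[str]:
    """Ask the embodied-reasoning model to break a goal into ordered steps."""
    prompt = (
        f"Goal: {goal}\n"
        f"Scene: {scene_description}\n"
        "Return a numbered list of short, concrete steps a robot can execute."
    )
    response = client.models.generate_content(
        model="gemini-robotics-er-1.5-preview",  # assumed preview model ID
        contents=prompt,
    )
    # Keep only the numbered lines as individual steps.
    return [line.strip() for line in response.text.splitlines() if line.strip()[:1].isdigit()]


def execute_step(step: str) -> None:
    """Placeholder for the action model (Gemini Robotics 1.5), which is not publicly available."""
    print(f"[robot] executing: {step}")


if __name__ == "__main__":
    steps = plan_task(
        goal="Sort the items on the table into recycling and trash",
        scene_description="A table with a soda can, a banana peel, and a cardboard box",
    )
    for step in steps:
        execute_step(step)
```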
"Thinks Before Acting"
A key advancement highlighted by DeepMind is the VLA model's ability to "think before acting." Instead of directly outputting motor commands, Gemini Robotics 1.5 can generate an internal monologue of its reasoning process in natural language. For a command like "sort the laundry," the model might first reason, "Okay, sorting by color means white clothes go in the white bin, and colored clothes go in the black bin," before proceeding with the physical actions.
This internal reasoning makes the robot's decision-making process more transparent and robust, allowing it to handle more semantically complex tasks that require contextual understanding.
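One way to picture the pattern is to request a short natural-language thought alongside each action, as in the minimal sketch below. The JSON schema, prompt, and model ID are assumptions used to demonstrate the idea with the publicly accessible reasoning model; they are not DeepMind's internal interface for the VLA.

```python
# Illustrative "think before acting" pattern: ask for a one-sentence thought
# before each action. Schema, prompt, and model ID are assumptions.
import json

from google import genai

client = genai.Client()


def think_then_act(instruction: str, observation: str) -> dict:
    prompt = (
        f"Instruction: {instruction}\n"
        f"Current observation: {observation}\n"
        'Reply as JSON: {"thought": "<one-sentence reasoning>", '
        '"next_action": "<single concrete robot action>"}'
    )
    response = client.models.generate_content(
        model="gemini-robotics-er-1.5-preview",  # assumed preview model ID
        contents=prompt,
        config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)


step = think_then_act(
    instruction="Sort the laundry by color",
    observation="A white shirt is on top of the pile; a white bin and a black bin are nearby",
)
print(step["thought"])      # e.g. "White clothes belong in the white bin."
print(step["next_action"])  # e.g. "Pick up the white shirt and place it in the white bin."
```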
The Holy Grail: Learning Across Embodiments
Perhaps the most significant breakthrough is the model's capacity for cross-embodiment learning. Historically, a major challenge in robotics has been that skills learned on one robot do not transfer well to another with a different size, shape, or set of joints.
DeepMind reports that Gemini Robotics 1.5 overcomes this hurdle. Skills trained exclusively on one type of robot, such as the dual-arm ALOHA 2, were successfully performed by entirely different platforms—including a Franka industrial arm and Apptronik's Apollo humanoid—without any specific fine-tuning for the new hardware. This ability to generalize learned behaviors is a critical step toward creating truly general-purpose robots that don't require bespoke training for every new skill or platform.
Apptronik Collaboration Signals Real-World Application
The news was amplified by robotics firm Apptronik, a key Google partner, which is integrating the models into its Apollo humanoid. In a statement on social media, Apptronik called the development a "major milestone on the road to embodied intelligence" and noted the move toward "a new era of robotic autonomy."
The collaboration demonstrates that these advancements are not just theoretical. Apptronik stated it is "gearing up to deploy Gemini-powered Apollo humanoid robots in additional customer facilities," suggesting that this more advanced agentic AI is on a clear path from the research lab to real-world industrial environments.
Safety and Availability
DeepMind emphasized a multi-layered approach to safety, including alignment with existing Gemini safety policies and the ability for the AI to reason about physical constraints and trigger on-board safety systems. The company also released an updated version of its ASIMOV benchmark for evaluating the semantic and physical safety of robotic systems.
Gemini Robotics-ER 1.5, the reasoning model, is being made available to developers immediately via the Gemini API in Google AI Studio. The action-focused VLA model, Gemini Robotics 1.5, is currently limited to select partners, including hardware collaborators such as Apptronik.
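For developers who want to try the reasoning model, a minimal call through the Gemini API might look like the sketch below, here with the Google Search tool enabled so the model can look up the kind of external information (such as local recycling rules) mentioned above. The model ID is the preview name assumed at the time of writing and may change.

```python
# Minimal sketch: querying Gemini Robotics-ER 1.5 via the Gemini API with the
# Google Search tool enabled. Model ID and prompt are assumptions that may change.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model ID
    contents=(
        "I need to sort a glass bottle, a pizza box, and a battery. "
        "Look up the recycling rules for San Francisco and tell me, step by step, "
        "which bin each item should go into."
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```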