Google DeepMind Unveils Gemini Robotics-ER 1.6: A Leap in Spatial Reasoning and Industrial Utility

Written by P.A.

AGIBOT AI Week: Solving the Physical AI Bottleneck

April 7–14 | A new technical reveal every weekday. From foundational datasets to integrated hardware, go inside the stack built for real-world impact.

In collaboration with AGIBOT

MOUNTAIN VIEW, CA — On April 14, 2026, Google DeepMind announced the release of Gemini Robotics-ER 1.6, a significant upgrade to its specialized "Embodied Reasoning" framework. The new model, which follows the two-part brain architecture established in late 2025, introduces enhanced spatial reasoning, improved multi-view success detection, and a new "agentic vision" capability designed for high-precision industrial tasks.

Starting today, the model is available to developers via the Gemini API and Google AI Studio, signaling a rapid iteration cycle from the Gemini Robotics 1.5 framework released just months prior.
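For developers, access follows the familiar Gemini pattern. Below is a minimal sketch of a call, assuming the google-genai Python SDK and a hypothetical "gemini-robotics-er-1.6" model identifier (the exact ID exposed in the Gemini API and AI Studio may differ):

```python
# Minimal sketch: querying the ER model through the Gemini API.
# Assumes the google-genai Python SDK and a GEMINI_API_KEY in the environment;
# the model ID "gemini-robotics-er-1.6" is inferred from this announcement and may differ.
from google import genai

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",
    contents="Plan the steps needed to clear the workbench and sort the loose parts by size.",
)
print(response.text)
```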

The Evolution of the "Strategic Planner"

In DeepMind’s robotics stack, the "ER" (Embodied Reasoning) model acts as the high-level strategic planner. While the Vision-Language-Action (VLA) model serves as the motor cortex, the ER model interprets complex, open-ended goals.

According to DeepMind researchers Laura Graesser and Peng Xu, Gemini Robotics-ER 1.6 shows substantial improvements over its predecessor, ER 1.5, and the base Gemini 3.0 Flash model. The model specializes in "bridging the gap between digital intelligence and physical action" by natively calling tools like Google Search or specialized VLAs to solve tasks.
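In the Gemini API, that kind of tool use is wired through the generation config. A hedged sketch of handing the planner Google Search as a grounding tool (the model ID is assumed, as above; dispatching to a downstream VLA would instead go through a custom function declaration):

```python
# Sketch: letting the ER planner ground a decision with Google Search.
# The built-in Google Search tool is a standard Gemini API feature;
# the model ID is an assumption based on this announcement.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",
    contents="Look up the torque spec for an M6 stainless bolt, then plan the tightening sequence for this flange.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```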

[Figure: bar chart comparing success rates of Gemini Robotics-ER 1.5, Gemini 3.0 Flash, and Gemini Robotics-ER 1.6 across four categories: Pointing & Counting, Single View Success Detection, Multiview Success Detection, and Instrument Reading with agentic vision.]
Setting a new benchmark: Gemini Robotics-ER 1.6 shows significant performance gains over previous models, particularly in instrument reading and spatial reasoning tasks.

Precision Pointing and Spatial Logic

A core focus of the 1.6 update is "pointing"—a fundamental skill for spatial reasoning. The model uses points to identify objects, map trajectories, and define "from-to" relationships. DeepMind claims the new model can now handle more complex constraints, such as identifying every object in a scene small enough to fit inside a specific container, with much higher accuracy than earlier versions.
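A pointing query is an ordinary image-plus-text request. A hedged sketch, assuming the point convention documented for ER 1.5 (a JSON list of {"point": [y, x], "label": ...} pairs with coordinates normalized to 0-1000) carries over to 1.6, and using placeholder file and model names:

```python
# Sketch: constrained pointing. The output convention (JSON points normalized to 0-1000)
# is the one documented for ER 1.5 and assumed here to carry over to 1.6;
# the model ID and image path are placeholders.
import json
from google import genai
from PIL import Image

client = genai.Client()
scene = Image.open("workbench.jpg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",
    contents=[
        scene,
        "Point to every object small enough to fit inside the blue bin. "
        'Answer with a JSON list of {"point": [y, x], "label": <name>}, '
        "with coordinates normalized to 0-1000.",
    ],
)
# In practice the reply may arrive wrapped in a markdown fence and need stripping first.
for item in json.loads(response.text):
    print(item["label"], item["point"])
```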

Solving the "Last Millimeter" with Agentic Vision

Perhaps the most notable addition is the ability to perform instrument reading. Developed in close collaboration with Boston Dynamics, this feature allows robots like the all-electric Atlas or the quadruped Spot to interpret analog pressure gauges, thermometers, and digital readouts during facility inspections.

This is achieved through Agentic Vision, a process where the model:

  • Zooms into high-resolution details of a gauge.
  • Estimates proportions and intervals using code execution (a simple worked sketch follows this list).
  • Interprets context using world knowledge to determine if a reading indicates a safety hazard.
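Once the needle and the scale endpoints have been located in the zoomed crop, the proportion step is plain geometry. A minimal sketch of the kind of calculation the model could run through code execution, with made-up angles and scale range for illustration:

```python
# Sketch: turning a needle angle into a gauge reading by linear interpolation.
# All numbers are illustrative; the model would derive them from the zoomed image.
def gauge_reading(needle_deg, min_deg, max_deg, min_value, max_value):
    """Map the needle's angle onto the gauge's printed value range."""
    fraction = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_value + fraction * (max_value - min_value)

# Example: a 0-120 degree-C gauge whose scale sweeps from -135 deg to +135 deg,
# with the needle observed at roughly +45 deg.
print(round(gauge_reading(45, -135, 135, 0, 120), 1))  # -> 80.0
```

The world-knowledge step then decides whether a value like 80 degrees C is routine or a hazard for that particular line.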

"Capabilities like instrument reading... will enable Spot to see, understand, and react to real-world challenges completely autonomously," said Marco da Silva, VP and GM of Spot at Boston Dynamics.

[Image: a yellow Boston Dynamics Spot robot in a dimly lit industrial facility with complex piping; an overlay shows a zoomed-in view of a circular temperature gauge and a reasoning block noting that the needle is pointing near the 80 mark.]
Real-world reasoning: Using Gemini Robotics-ER 1.6, Boston Dynamics’ Spot can autonomously navigate industrial facilities to locate and interpret complex instruments like this temperature gauge.

Bridging the Success Detection Gap

One of the persistent hurdles in Physical AI is "success detection": a robot's ability to know when a task is actually finished.

Gemini Robotics-ER 1.6 introduces advanced multi-view reasoning, allowing it to synthesize data from multiple camera streams, such as an overhead view and a wrist-mounted feed. This enables the model to confirm task completion even in occluded or poorly lit environments, a critical requirement for moving beyond scripted lab demos and into the "messy" reality of factories.
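From the developer's side, a multi-view check is simply a request carrying more than one frame. A hedged sketch with placeholder camera frames and the assumed model ID:

```python
# Sketch: multi-view success detection by sending two camera frames in one request.
# Image paths and the model ID are placeholders for illustration.
from google import genai
from PIL import Image

client = genai.Client()
overhead = Image.open("overhead_cam.jpg")
wrist = Image.open("wrist_cam.jpg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",
    contents=[
        "Image 1 is the overhead camera; image 2 is the wrist camera.",
        overhead,
        wrist,
        "Task: place the red connector in the left tray. "
        "Has the task been completed? Answer yes or no, then justify briefly "
        "using evidence from both views.",
    ],
)
print(response.text)
```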

Safety and Physical Constraints

DeepMind characterizes 1.6 as its "safest robotics model yet." It reportedly shows a significantly improved capacity to adhere to physical safety constraints, such as refusing to pick up objects that exceed a gripper's weight limit or avoiding hazardous materials like liquids.

In tests based on real-life injury reports, the Robotics-ER models improved by 10% in identifying safety hazards in video scenarios compared to the standard Gemini 3.0 Flash. This focus on "alignment for embodied intelligence" mirrors recent efforts by competitors like Generalist AI to ensure that autonomous improvisations remain safe on the factory floor.

The "Android of Robotics" Strategy Continues

The launch of Gemini Robotics-ER 1.6 reinforces DeepMind's ambition to build a universal operating system for robots. By making the reasoning model available via API, DeepMind is positioning its software to be the "brain" for a diverse ecosystem of hardware, including the Agile ONE humanoid and the Apptronik Apollo.

As 2026 progresses, the industry’s focus is clearly shifting from simple motor skills to the high-level reasoning required for robots to navigate the "long tail" of real-world industrial problems.


