Google DeepMind Robotics Director: We Need "One More Big Breakthrough" to Solve General-Purpose Robots

In the race to build general-purpose robots, companies often project an image of imminent victory. However, in a candid new video podcast released by Google DeepMind, the lab’s leadership offered a refreshing dose of realism alongside their latest demonstrations.
While showcasing robots that can "think" before they act and sort trash based on vague instructions, Kanishka Rao, DeepMind’s Director of Robotics, admitted that the industry hasn't quite cracked the code yet.
"I think we need at least one more big breakthrough," Rao told mathematician and host Hannah Fry during a tour of the company's California facilities.
The "Inner Monologue" of a Robot
The episode highlights the capabilities of Gemini Robotics 1.5, a framework that splits the robotic "brain" into two distinct parts: an Embodied Reasoning (ER) model that plans high-level strategy, and a Vision-Language-Action (VLA) model that executes movements.
The video demonstrates this "orchestration" in real-time. In one demo, Fry tells a robot, "I'm in San Francisco and I don't know the rules about sorting trash. Can you look it up for me and then tidy up?"
The robot doesn't just move; it first accesses Google Search to learn local recycling laws (compost vs. recycling vs. landfill), creates a plan, and then executes the sort.
Crucially, the system displays an "internal monologue"—a stream of text reasoning—before it moves. "Reds are all in the black box," the robot "thinks" while sorting laundry in a separate demo, confirming its semantic understanding of the scene before committing to a physical action.
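To make that two-model division of labor concrete, here is a minimal sketch of how such a plan-then-act loop could be wired together. Every name in it (`Step`, `EmbodiedReasoner`, `VisionLanguageActor`) is a hypothetical stand-in; this mirrors the behavior shown in the demo, not the actual Gemini Robotics API.

```python
# Minimal, illustrative sketch of the ER/VLA split described above.
# All class and method names are hypothetical stand-ins, not the
# actual Gemini Robotics API.
from dataclasses import dataclass

@dataclass
class Step:
    thought: str   # the text reasoning surfaced before acting
    action: str    # a high-level action for the VLA model to carry out

class EmbodiedReasoner:
    """Stand-in for the ER model: plans high-level strategy."""
    def plan(self, instruction: str) -> list[Step]:
        # A real ER model reasons over camera frames (and, per the demo,
        # web search results). Here we return a canned plan.
        return [
            Step("Banana peel is organic waste.", "move banana peel to compost"),
            Step("Soda can is recyclable.", "move soda can to recycling"),
        ]

class VisionLanguageActor:
    """Stand-in for the VLA model: turns one step into motor commands."""
    def execute(self, step: Step) -> None:
        # The real model emits low-level joint and gripper commands.
        print(f"executing: {step.action}")

def run(instruction: str) -> None:
    planner, actor = EmbodiedReasoner(), VisionLanguageActor()
    for step in planner.plan(instruction):
        print(f"thinking: {step.thought}")  # the visible inner monologue
        actor.execute(step)

run("Sort this trash according to San Francisco rules.")
```

Printing each thought before executing its action mirrors the on-screen reasoning stream; in the real system, both components are large neural networks rather than hand-written classes.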

The Data Bottleneck: Teleop vs. The World
Despite these advances, the conversation highlighted the industry's most persistent bottleneck: data.
When asked if the current architecture is sufficient to "pack it up" and declare robotics solved, Rao was skeptical. He noted that while large language models (LLMs) had the entire internet to learn from, robotics suffers from a scarcity of "physical interaction data."
"It's not as big as the internet," Rao explained. "We have a breakthrough where they can learn more efficiently... but the core of the problem is still the robot data."
This highlights a diverging philosophy in Silicon Valley regarding how to get that data.
- DeepMind's Approach: As shown in the footage, the lab uses specialized mechanical "leader arms" (physical rigs that operators manipulate directly) rather than VR headsets to teleoperate its robots. This 1:1 "puppet" matching ensures high-quality manipulation data, and DeepMind emphasizes that the resulting skills are transferable: policies learned on these training rigs can be deployed onto entirely different robot bodies, such as the Apptronik Apollo, validating the lab's "cross-embodiment" strategy (see the sketch below).
- Sunday Robotics' Approach: Interestingly, Sunday Robotics—founded by former DeepMind researcher Tony Zhao—is explicitly trying to bypass this bottleneck. By distributing "Skill Capture Gloves" to users in homes, Sunday aims to collect millions of trajectories without needing a robot present at all.
While DeepMind’s teleoperation allows for high precision—demonstrated in the video by a robot packing a sandwich into a Ziploc bag with millimeter accuracy—it remains labor-intensive.
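One way to picture why leader-arm demonstrations can transfer across bodies is to log trajectories against embodiment-agnostic quantities such as end-effector poses, tagged with the embodiment they came from. DeepMind has not published its data format, so the schema below is a purely hypothetical illustration.

```python
# Hypothetical schema for a logged teleoperation trajectory. DeepMind's
# actual data format is not public; this is illustrative only.
from dataclasses import dataclass, field

@dataclass
class Frame:
    timestamp: float                 # seconds since episode start
    rgb_path: str                    # camera image stored on disk
    eef_pose: tuple[float, ...]      # end-effector pose: x, y, z + quaternion
    gripper: float                   # 0.0 fully open .. 1.0 fully closed

@dataclass
class Trajectory:
    task: str                        # e.g. "pack sandwich into Ziploc bag"
    embodiment: str                  # e.g. "leader_arm_rig" or "apptronik_apollo"
    frames: list[Frame] = field(default_factory=list)

demo = Trajectory(task="pack sandwich into Ziploc bag", embodiment="leader_arm_rig")
demo.frames.append(Frame(0.0, "frames/000.png", (0.4, 0.1, 0.3, 0.0, 0.0, 0.0, 1.0), 0.0))
```

Keying actions to end-effector poses rather than robot-specific joint angles is one common way practitioners make demonstrations retargetable to a body with different kinematics; whether this matches DeepMind's exact approach is not stated in the video.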

Learning from YouTube
If teleoperation is too slow and simulation is too "clean," where will the data come from? Keerthana Gopalakrishnan, a Research Scientist at DeepMind, pointed to a massive, untapped resource: video.
"There is a lot of manipulation data that is collected by humans posting videos about how to do anything," Gopalakrishnan said, referencing platforms like YouTube. "We should be able to learn from that at some point."
This aligns with DeepMind's broader "hardware-agnostic" strategy. The video features the software running not just on stationary arms but also on the Apptronik Apollo humanoid, underscoring the company's goal of building the "Android operating system" for robotics rather than the hardware itself.
Apptronik celebrated the feature, noting on social media that Apollo was "responding to complex instructions and adapting to changing contexts."
"A Long Tail of Problems"
The video serves as a progress report for DeepMind's "physical AI" ambitions. Visual generalization, the ability of robots to ignore changes in lighting or background, is "much more solved" than it was four years ago, according to Rao.
However, the "final picture" of general-purpose robotics still requires bridging the gap between seeing the world and physically handling it with the ease of a human.
"There’s one hypothesis that [data] is all you need," Rao concluded. "If you can collect that much robot data, then we're done... but there is still a long tail of problems to solve."