Boston Dynamics' Atlas Tackles Complex Warehouse Tasks with Advanced Perception and Adaptability
By Humanoids Daily (@humanoidsdaily)

Boston Dynamics' Atlas Demonstrates Advanced Perception and Dexterity in New Warehouse Task Video
Boston Dynamics released a new video on May 28th, showcasing its all-electric Atlas humanoid robot performing a complex sequencing task: autonomously picking, carrying, and placing automotive components. The demonstration highlights significant advancements in the robot's perception system and its ability to operate dynamically in environments that are not perfectly predictable.
The Challenge: Real-World Dexterity
The video, accompanied by commentary from the Atlas team, explains that the chosen task—moving parts from a staging area to a fixture—is a good fit for a humanoid robot. Gregory Izatt, Senior Research Scientist at Boston Dynamics, noted that the task has "the right blend of being just unstructured enough that you need the freedom and the power of a humanoid form factor... while at the same time, it is a pretty dull and pretty repetitive task that's both physically strenuous to do day in and day out."
This kind of work underscores what Jan Czarnowski, Perception Lead for Atlas, refers to as Moravec's paradox: tasks that come easily to humans, such as motor skills and perception, are hard for robots, while complex calculation is comparatively easy. "Atlas's perception system has to be dynamic simply because we cannot predict the state of the world and how the world reacts to what we do with it," Czarnowski explained.
Moravec's paradox is the observation in the fields of artificial intelligence and robotics that, contrary to traditional assumptions, reasoning requires very little computation, but sensorimotor and perception skills require enormous computational resources. The principle was articulated in the 1980s by Hans Moravec, Rodney Brooks, Marvin Minsky, Allen Newell, and others. Source: Wikipedia
Seeing and Adapting in Real-Time
The team emphasized that these are not pre-recorded trajectories. "Small imperfections and small errors accumulate very quickly to make what we think the state of the world is diverge from reality," Czarnowski stated. Atlas uses camera sensors to build a 3D map of its environment, identify objects, and spot obstacles, employing a combination of AI and classical systems.
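The drift problem Czarnowski describes can be made concrete with a toy model. The sketch below is purely illustrative (it is not Boston Dynamics' software, and the function name and numbers are invented): an open-loop position estimate that assumes every commanded motion executes perfectly accumulates a small per-step error, while periodically folding in an idealized camera observation keeps the estimate close to reality.

```python
def track_error(steps, bias=0.02, correct_every=None):
    """Toy illustration of open-loop drift vs. perception-corrected tracking.

    Each step the robot commands a 0.1 m move, but the world responds with a
    small (2%) systematic error. The open-loop estimate ignores that error;
    an optional idealized 'camera observation' resets it every few steps.
    Returns the worst estimate-vs-reality error seen over the run.
    """
    true_pos = estimate = 0.0
    worst = 0.0
    for step in range(1, steps + 1):
        commanded = 0.1
        true_pos += commanded * (1.0 + bias)  # the world is imperfect
        estimate += commanded                 # belief assumes perfect execution
        worst = max(worst, abs(true_pos - estimate))
        if correct_every and step % correct_every == 0:
            estimate = true_pos               # idealized perception correction
    return worst

# Without correction the error grows without bound; with frequent
# observations it stays bounded by what can accumulate between updates.
drift_only = track_error(1000)                   # ~2.0 m after 1000 steps
with_perception = track_error(1000, correct_every=5)  # ~0.01 m
```

The point of the toy is the qualitative behavior, not the numbers: even a 2% execution error compounds into meters of divergence over a long task, which is why real-time perception rather than pre-recorded trajectories is required.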
Precision is critical. Sudharshan Suresh, Senior Research Scientist, pointed out that for stowing components, "margins for success are very slim... about 5 centimeters." Real-time perception is essential to adjust for variables like objects slipping in hand or imperfect grasps. "It's essentially impossible unless you have real-time perception running," Suresh added.
Tristan Laidlow, another Senior Research Scientist, detailed the challenges of perception, noting that objects are "often shoved into dark cubbies... with only a small sliver of the object actually visible." Atlas may even obscure its own view with its arm, requiring sophisticated solutions. Suresh mentioned that Atlas might sometimes shift an object in its hand "as if to get a better glance at it."
Handling Imperfection and Failure
The demonstration also shows Atlas adapting to changes, such as a dolly holding fixtures being moved. "Atlas needs to constantly update its belief about where those fixtures are in the environment," Laidlow said. This adaptability extends to recovering from errors. Izatt described how Atlas can pick up an object dropped on the floor—a scenario indicating something has already gone wrong. "The instructions are literally just put your hands roughly here relative to the object, push them into the floor, curl the fingers and push them together," Izatt explained, highlighting the reliance on the robot's control stack and perception to execute such a recovery.
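"Constantly updating a belief" about where a fixture sits is the classic state-estimation problem, often handled with a Bayesian filter. The one-dimensional Kalman-style update below is a hypothetical sketch for intuition only (the article does not say what estimator Atlas uses, and `kalman_update` is an invented name): when the dolly is moved and the camera reports a new position, the belief shifts toward the measurement by an amount set by the relative uncertainties.

```python
def kalman_update(mean, var, measurement, meas_var):
    """One Bayesian update of a 1-D Gaussian belief about a fixture's
    position. Illustrative only -- not Atlas's actual estimator."""
    k = var / (var + meas_var)              # Kalman gain: trust in measurement
    new_mean = mean + k * (measurement - mean)
    new_var = (1.0 - k) * var               # uncertainty shrinks after observing
    return new_mean, new_var

# Prior belief: fixture at 2.0 m, high uncertainty (the dolly may have moved).
# The camera measures 2.5 m with low noise, so the belief moves most of the
# way toward the measurement and becomes much more certain.
mean, var = kalman_update(2.0, 1.0, 2.5, 0.1)
```

A confident prior with a noisy measurement would instead barely move the mean, which is the behavior that lets such an estimator absorb both routine sensor noise and genuine changes like a relocated dolly.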
Towards 'Physical Intelligence'
Czarnowski framed these developments as part of a broader shift in robotics and AI. "Currently, the biggest challenge for Atlas and other humanoids on the market is adaptability," he said. The goal is to move towards systems that can learn more fundamental truths about the world, leveraging large foundational models trained on multi-modal data like video, images, and language.
"We're going past just perception and understanding images and more towards controlling the whole robot based on language and video inputs," Czarnowski concluded. "This shift is basically a shift from spatial AI to physical intelligence."
The new video provides a compelling glimpse into the ongoing efforts to make humanoid robots more capable and adaptable for practical, real-world applications, moving beyond controlled demonstrations to tasks requiring robust interaction with a dynamic environment.