Beyond Imitation: MotionDisco Uses LLMs to Discover Whole-Body Humanoid Skills From Scratch

Researchers from the Technical University of Munich, NYU, and CMU have introduced MotionDisco, a framework that autonomously discovers contact-rich, long-horizon humanoid motions from scratch.
By bypassing traditional teleoperation and motion retargeting, the system removes the bottleneck of collecting human demonstration data.
The framework pairs an LLM-guided evolutionary search over discrete contact plans with a physics-informed trajectory optimizer that provides text-based failure feedback.
Deployed zero-shot on the Unitree G1 humanoid, the framework successfully executed diverse whole-body behaviors in simulation and the real world, including climbing tables and clearing cluttered corridors.
The project highlights a growing geopolitical paradox: while the software is a Western academic collaboration, it relies on Chinese hardware currently threatened by the U.S. GUARD Act of 2026.

The prevailing paradigm in humanoid robotics relies heavily on a simple premise: if you want a robot to move like a human, you must first show it how a human moves. Whether through tedious VR teleoperation or mapping massive motion-capture datasets onto robotic frames, cutting-edge labs have largely focused on narrowing the "embodiment gap" via imitation.

However, a collaborative team of researchers from the Technical University of Munich (TUM, Germany), New York University (USA), and Carnegie Mellon University (USA) is challenging this approach. In a new paper, the team introduced MotionDisco, a framework that enables humanoid robots to autonomously discover contact-rich, long-horizon loco-manipulation skills from scratch, without relying on any human data or teleoperation.

[Image: A Unitree G1 humanoid robot in a motion-capture laboratory stands balanced on a blue crate placed on a white table, reaching one arm upward toward a yellow banana hanging from the ceiling, while a researcher in the background holds a safety tether.]
Reaching New Heights: For the "Banana" task, MotionDisco discovered a complex sequence requiring the robot to stack an object onto a table and climb atop it to grasp an out-of-reach target.

Breaking the Demonstration Bottleneck

While imitation learning has driven impressive breakthroughs—such as Amazon’s recent OmniRetarget and ResMimic frameworks—it suffers from a fundamental scalability bottleneck. Collecting high-quality human demonstrations for every possible real-world permutation is logistically impossible. Furthermore, bounding a robot to human kinematics prevents the machine from leveraging its own unique physical capabilities, such as non-human joint limits or structural support configurations.

MotionDisco moves beyond pure imitation by shifting the problem into an automated discovery loop. The framework tackles the combinatorial explosion of long-horizon tasks—where a robot must figure out exactly when and where to place its hands and feet to manipulate objects and navigate terrain—by separating high-level strategic reasoning from low-level physics optimization.

The Evolutionary Code Loop

At the core of MotionDisco is a "yin and yang" relationship between a Large Language Model (LLM)—specifically Anthropic's Claude Opus 4.7—and a rigid kinodynamic trajectory optimizer.

The process operates as an evolutionary tree search over programs:

Program Mutation: The LLM acts as a code generator, writing and mutating Python programs that output candidate contact-mode sequences (discrete plans of which limbs touch what surfaces) based on a textual description of the scene geometry and the task goal.
Kinematic Filtering: A fast sequential kinematic optimizer acts as a geometric gatekeeper, quickly rejecting any contact plans that break configuration limits or cause self-collisions.
Trajectory Optimization (TO): Plans that survive the kinematic filter are passed to a full trajectory optimizer, which solves for the robot's physical states, controls, and forces while enforcing real-world dynamics.
Structured Feedback: Crucially, if a plan fails at any stage, the system doesn't simply discard it. It isolates the exact contact mode where the plan became infeasible and generates structured text feedback (e.g., "IK infeasible at mode 5"), which is sent back to the LLM.

This closed-loop text feedback allows the LLM to act like an engineering intern, iteratively debugging and refining its contact plans rather than flailing blindly. According to the study, this iterative search framework found valid, low-cost solutions within a few minutes across all scenarios, vastly outperforming single-prompt LLM generation.
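The propose, filter, optimize, and feed-back cycle described above can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the authors' code: the names `Result`, `kinematic_filter`, `trajectory_optimize`, `mutate`, and `search` are invented for illustration, the two checks are trivial rules instead of real optimizers, and a hand-written repair rule takes the place of the LLM. The real system also mutates programs that generate contact plans, whereas this sketch edits a plan directly.

```python
from dataclasses import dataclass


@dataclass
class Result:
    ok: bool
    cost: float = float("inf")
    feedback: str = ""


def kinematic_filter(plan):
    """Toy stand-in for the kinematic check: reject modes with more than 4 contacts."""
    for i, mode in enumerate(plan):
        if len(mode) > 4:
            return Result(False, feedback=f"IK infeasible at mode {i}")
    return Result(True)


def trajectory_optimize(plan):
    """Toy stand-in for trajectory optimization: a mode needs at least one contact."""
    for i, mode in enumerate(plan):
        if len(mode) == 0:
            return Result(False, feedback=f"TO infeasible at mode {i}")
    return Result(True, cost=float(len(plan)))  # toy cost: number of modes


def evaluate(plan):
    """Run the cheap filter first; only survivors reach the expensive stage."""
    kin = kinematic_filter(plan)
    return kin if not kin.ok else trajectory_optimize(plan)


def mutate(plan, feedback):
    """Stand-in for the LLM: repair only the mode named in the feedback text."""
    new_plan = [list(mode) for mode in plan]  # copy so the parent is untouched
    idx = int(feedback.rsplit(" ", 1)[1])     # parse the failing mode index
    if feedback.startswith("IK"):
        new_plan[idx].pop()                   # too many contacts: drop one
    else:
        new_plan[idx].append("left_foot")     # no support: add a contact
    return new_plan


def search(initial_plan, max_iters=20):
    """Iterate until a plan passes both stages, logging each failure message."""
    plan, log = initial_plan, []
    for _ in range(max_iters):
        result = evaluate(plan)
        if result.ok:
            break
        log.append(result.feedback)
        plan = mutate(plan, result.feedback)
    return plan, evaluate(plan), log
```

Starting from a three-mode plan whose last mode has five contacts and whose middle mode has none, this toy loop receives one kinematic failure message and one trajectory-optimization failure message, repairs each named mode in turn, and then returns a plan that passes both stages. The point of the sketch is the information flow: a failure message that names a specific mode lets the next proposal be a targeted edit rather than a fresh guess.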

Complex Tasks and Hardware Deployment

The researchers evaluated MotionDisco across eight highly demanding, long-horizon tasks. These ranged from a "Banana" task—where the robot had to stack an object onto a table and climb on top of it to reach a high-hanging target—to navigating through cluttered corridors and climbing over tables while carrying cargo.

Because the evolutionary algorithm maintains multiple "islands" of populations, a single search run naturally yielded highly diverse solutions to the same problem. For instance, in parkour tasks, the system discovered multiple valid footwork patterns and alternative hand-support configurations without requiring human intervention or re-prompting.
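For readers unfamiliar with island-model evolution, the following is a minimal, generic sketch of the idea, not the MotionDisco implementation. The objective, the mutation rule, the ring-migration scheme, and all parameter names are assumptions chosen for illustration: several populations evolve separately, and only occasionally exchange their best candidate, which tends to let each island settle on a different solution.

```python
import random


def cost(plan):
    """Toy objective: any plan whose entries sum to 10 is a zero-cost solution."""
    return abs(sum(plan) - 10)


def mutate(plan, rng):
    """Nudge one randomly chosen entry up or down by 1, keeping it non-negative."""
    child = list(plan)
    i = rng.randrange(len(child))
    child[i] = max(0, child[i] + rng.choice((-1, 1)))
    return child


def worst_index(pop):
    return max(range(len(pop)), key=lambda k: cost(pop[k]))


def evolve_islands(n_islands=4, pop_size=6, plan_len=4,
                   generations=200, migrate_every=25, seed=0):
    rng = random.Random(seed)
    # Each island starts from its own random population.
    islands = [[[rng.randint(0, 5) for _ in range(plan_len)]
                for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for pop in islands:
            # Two-way tournament selects a parent within the island only.
            parent = min(rng.sample(pop, 2), key=cost)
            child = mutate(parent, rng)
            w = worst_index(pop)
            if cost(child) <= cost(pop[w]):
                pop[w] = child
        if gen % migrate_every == 0:
            # Ring migration: each island receives a copy of its neighbor's best.
            bests = [min(pop, key=cost) for pop in islands]
            for j, pop in enumerate(islands):
                pop[worst_index(pop)] = list(bests[j - 1])
    # Return the best candidate from every island, not a single global winner.
    return [min(pop, key=cost) for pop in islands]
```

Because the toy objective has many zero-cost solutions, the per-island winners returned by `evolve_islands()` may differ from one another even though they score the same, which mirrors the article's point about one search run yielding several distinct footwork or hand-support patterns.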

To show the discovered motions were not just simulation artifacts, the team trained reinforcement learning tracking policies on the optimized trajectories and deployed them zero-shot on a physical Unitree G1 humanoid. On real hardware, the robot successfully reproduced the discovered whole-body maneuvers, safely adjusting its posture to reach into low-clearance spaces and using its arms to stabilize itself while climbing.
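The article does not describe how the tracking policies are rewarded. As a rough illustration of what "tracking" usually means in this setting, the sketch below uses a common exponential form that scores how closely the robot's joint configuration matches a reference frame. The function name `tracking_reward`, the choice of joint positions as the tracked quantity, and the `sigma` scale are all assumptions, not details from the paper.

```python
import math


def tracking_reward(q, q_ref, sigma=0.5):
    """Return a reward in (0, 1]: 1.0 for a perfect match, decaying with error.

    q and q_ref are equal-length sequences of joint positions (radians);
    sigma sets how quickly the reward falls off (hypothetical default).
    """
    if len(q) != len(q_ref):
        raise ValueError("q and q_ref must have the same length")
    squared_error = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    return math.exp(-squared_error / sigma ** 2)
```

A policy trained against a reward of this shape is pushed to follow the reference trajectory frame by frame, while the learning process is what gives it robustness to disturbances the optimizer never saw.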

Beyond the technical results, the project highlights a glaring irony in the current robotics ecosystem. The software framework is a purely Western academic effort, bridging institutions in Germany and the United States. Yet its validation relies entirely on the Unitree G1, a highly capable, $16,000 Chinese-manufactured platform.

As American lawmakers aggressively advance the bipartisan GUARD Act of 2026 to ban Chinese robotics over national security concerns, software breakthroughs like MotionDisco face an uncertain future. If the legislation passes, U.S.-based institutions like NYU and CMU may find themselves legally severed from the very hardware baseline that makes their algorithmic discoveries reproducible. For now, MotionDisco proves that humanoids can think outside the box of human demonstration—even if the physical box they inhabit remains a geopolitical lightning rod.