Beyond the Gala: OmniXtreme Breaks the Generality Barrier for Humanoid Agility

Humanoids Daily
Written by Humanoids Daily
Breaking the generality barrier: A Unitree G1 executing alternating pistol squats, a high-difficulty behavior managed by the unified OmniXtreme policy. This "extreme balance" demonstration highlights the framework's ability to maintain high-fidelity whole-body coordination during challenging single-support maneuvers.

The flashy martial arts routine that captivated audiences at the 2026 Spring Festival Gala may have been a milestone for public interest, but for the researchers behind the hardware, it highlighted a persistent technical limitation. As Siyuan Huang, a research scientist at the Beijing Institute for General Artificial Intelligence (BIGAI), noted on X, many of the high-dynamic motions seen on stage are the result of "overfitted tracking policies"—controllers fine-tuned for a single specific sequence rather than a general capability.

To address this, BIGAI and Unitree Robotics have introduced OmniXtreme, a scalable framework that allows a single, unified policy to execute a diverse library of extreme behaviors. The system enables a Unitree G1 robot to seamlessly transition between backflips, acrobatic rolls, and complex breakdancing maneuvers without the performance collapse typically seen when scaling motion libraries.

Cracking the "Generality Barrier"

In humanoid robotics, there is a long-standing trade-off between fidelity and scalability. As the library of reference motions grows more diverse, traditional reinforcement learning (RL) policies often suffer from "gradient interference," leading to averaged, overly conservative movements that fail during high-dynamic execution.

OmniXtreme overcomes this by identifying and decoupling two distinct bottlenecks: the learning bottleneck in simulation and the physical executability bottleneck on hardware.

Stage 1: Scalable Flow-based Pre-training

The first stage replaces standard MLP-based RL with a high-capacity flow-matching policy. Through a specialist-to-unified distillation process, the researchers trained the unified policy to imitate a collection of individual motion experts. This generative approach allows the model to capture a vast range of heterogeneous behaviors without the optimization interference that plagues from-scratch RL.
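The paper's exact architecture isn't reproduced in this article, but the core mechanics of flow-matching distillation can be sketched: the unified policy learns a velocity field that transports Gaussian noise toward the action a specialist demonstrated for a given observation. The following minimal numpy illustration is an assumption-laden sketch — the toy linear model, dimensions, and function names are all hypothetical stand-ins for the real network:

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 4

# Toy "unified policy": a random linear velocity field v(x_t, t, obs).
# In practice this would be a high-capacity network; a linear map stands
# in here so the loss computation is concrete and runnable.
W = rng.normal(scale=0.1, size=(OBS_DIM + ACT_DIM + 1, ACT_DIM))

def velocity_field(x_t, t, obs):
    feat = np.concatenate([x_t, [t], obs])
    return feat @ W

def flow_matching_loss(obs, expert_action):
    """Conditional flow-matching objective on a straight path from noise
    to the specialist's action: the field should predict (action - noise)."""
    x0 = rng.normal(size=ACT_DIM)            # noise sample
    t = rng.uniform()                        # random time along the path
    x_t = (1 - t) * x0 + t * expert_action   # interpolated point
    target_v = expert_action - x0            # constant velocity of the path
    pred_v = velocity_field(x_t, t, obs)
    return float(np.mean((pred_v - target_v) ** 2))

# One distillation step's loss against a specialist's demonstrated action.
obs = rng.normal(size=OBS_DIM)
expert_action = rng.normal(size=ACT_DIM)
loss = flow_matching_loss(obs, expert_action)
print(loss)
```

Because the regression target is a simple velocity rather than a reward signal, losses from many heterogeneous specialists can be averaged without the gradient interference that destabilizes from-scratch RL.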

Stage 2: Actuation-Aware Post-training

While the flow-based policy provides the "brain" for movement, real-world physics remains the ultimate "litmus test." High-dynamic motions often push actuators to their limits, triggering unmodeled nonlinearities such as torque-speed losses or overcurrent protection.

OmniXtreme introduces a residual reinforcement learning phase. Instead of relearning the motion, a lightweight MLP policy is trained to produce corrective actions that account for realistic hardware constraints, including torque-speed limits and power-safety regularization. This refinement ensures that a backflip that works in simulation remains physically executable on a real G1 robot.
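The article doesn't publish the controller internals, but the residual scheme it describes can be sketched: a small learned correction is added to the frozen base policy's command, and the result is clipped to a speed-dependent torque envelope. The linear envelope shape and all numeric limits below are illustrative assumptions, not the G1's real motor parameters:

```python
import numpy as np

TAU_STALL = 90.0   # illustrative stall torque (N·m)
OMEGA_MAX = 25.0   # illustrative no-load speed (rad/s)

def torque_speed_limit(omega):
    """Linear torque-speed envelope (assumed motor model): available
    torque shrinks with joint speed, reaching zero at no-load speed."""
    return np.clip(TAU_STALL * (1.0 - np.abs(omega) / OMEGA_MAX),
                   0.0, TAU_STALL)

def apply_residual(base_action, residual, omega, scale=0.1):
    """Add a small corrective action to the frozen base command, then
    clip to what the actuator can physically deliver at this speed."""
    tau_cmd = base_action + scale * residual
    limit = torque_speed_limit(omega)
    return np.clip(tau_cmd, -limit, limit)

omega = np.array([5.0, 24.0])   # one slow joint, one near no-load speed
base = np.array([60.0, 60.0])   # base policy torque command (N·m)
res = np.array([10.0, 10.0])    # residual correction
tau = apply_residual(base, res, omega)
print(tau)
```

Note how the fast joint's command is cut to a few newton-metres: exposing the policy to this envelope during post-training is what keeps a simulation-validated backflip within the hardware's real delivery capability.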

Performance and Real-World Limits

The results of this two-stage approach are strikingly consistent. In 157 real-world trials, OmniXtreme achieved high success rates across a variety of skill categories:

Skill Category    Number of Motions    Success Rate (%)
Flip                      7                 96.36
Martial Arts              3                 93.33
Handspring                5                 88.57
Breakdance                5                 86.36
Acrobatics                4                 80.00

The team reported that enforcing actuator torque-speed constraints was sufficient for impulsive motions like flips. However, more complex "contact-rich" skills, such as breakdancing, required aggressive domain randomization and power-safety penalties to survive the high braking loads and energy absorption during impact.
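The article doesn't give the penalty's exact form, but a power-safety regularizer of the kind described can be sketched as a penalty on total mechanical power drawn beyond a budget, discouraging simultaneous high torque and high speed during braking and impact absorption. The budget value and the quadratic overshoot form are assumptions for illustration:

```python
import numpy as np

POWER_BUDGET = 800.0  # illustrative whole-body power budget (W)

def power_safety_penalty(tau, omega):
    """Penalize total mechanical power |tau * omega| beyond a budget.
    The quadratic overshoot keeps the reward smooth for gradient-based RL."""
    power = np.sum(np.abs(tau * omega))
    overshoot = max(power - POWER_BUDGET, 0.0)
    return overshoot ** 2

# A hard landing: large braking torques against fast-moving joints.
tau = np.array([70.0, 50.0, 40.0])      # joint torques (N·m)
omega = np.array([8.0, 10.0, 6.0])      # joint speeds (rad/s)
penalty = power_safety_penalty(tau, omega)
print(penalty)  # total power is 1300 W, 500 W over budget -> 250000.0
```

A term like this, combined with aggressive domain randomization, pushes the policy toward landings that spread energy absorption over time rather than spiking the actuators and battery all at once.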

The Road to Universal Control

OmniXtreme enters an increasingly crowded field of large-scale humanoid control research. It follows NVIDIA's SONIC, which also aims for universal whole-body tracking, and Amazon's OmniRetarget and PHP frameworks, which focus on physical interaction and agile traversal.

Despite the success, the researchers acknowledge that the "reality gap" is not fully closed. Failures still occur during highly impulsive landings where transient loads trigger battery undervoltage or motor overcurrent events—factors often missing from current simulators.

As China moves toward a national standard system for humanoid robotics, the "showmanship" of backflips is becoming a proving ground for the stability required for future industrial labor. By releasing the model checkpoints, the BIGAI and Unitree team is inviting the community to push these "robot gymnasts" even further.
