HumanX: The AI Framework Turning Unitree’s G1 into a Basketball Pro

[Figure: Three panels of the Unitree G1 on the court: a mid-air jumpshot, a leap to contest a human player, and a layup at the hoop. Using the HumanX framework, the G1 acquires these agile basketball skills, including jumpshots and reactive defense, directly from human video demonstrations without task-specific reward engineering.]

If you have been on social media lately, you may have seen a small, agile humanoid executing a jumpshot with surprising grace. That robot is the Unitree G1, and its suddenly improved "hoop dreams" are the result of HumanX, a new full-stack framework designed to translate raw human video into complex humanoid interaction skills.

Developed by researchers from The Hong Kong University of Science and Technology (HKUST) and the Shanghai AI Laboratory, HumanX aims to solve one of the most stubborn bottlenecks in robotics: the "data scarcity" problem. Traditionally, teaching a robot to play basketball required either thousands of hours of manual teleoperation or months of "reward engineering," where human experts meticulously code mathematical scores for every tiny movement. HumanX throws that playbook out, learning directly from monocular video of humans in action.

The Secret Sauce: XGen and XMimic

The framework operates through two synergistic components that bridge the gap between human pixels and robotic motors.

XGen: Synthesizing the Data

The first stage, XGen, acts as a data factory. It takes a single video of a human—say, someone kicking a football or shooting a layup—and extracts the motion. However, instead of just copying the human's "pose," it uses physics-based priors to synthesize what that interaction should look like for a robot's specific body.
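The paper's retargeting machinery isn't reproduced here, but the core constraint is easy to sketch: human motion has to be squeezed into the robot's feasible joint ranges and speeds before a policy can track it. Everything below, from the joint mapping to the limit values, is an illustrative assumption rather than XGen's actual code:

```python
# A toy retargeting pass: clamp human joint angles into the robot's
# feasible range and rate-limit them so the motion is trackable.
# The joint mapping and limit values are placeholder assumptions.
import numpy as np

HUMAN_TO_ROBOT_JOINTS = {"left_knee": 4, "right_knee": 9, "left_elbow": 13}
JOINT_LIMITS = {          # (min, max) in radians, placeholder values
    "left_knee": (-0.1, 2.3),
    "right_knee": (-0.1, 2.3),
    "left_elbow": (-1.6, 1.6),
}
MAX_JOINT_VEL = 12.0      # rad/s, placeholder actuator limit

def retarget(human_angles: np.ndarray, dt: float) -> dict:
    """Map per-frame human joint angles onto robot joints, respecting
    the robot's position and velocity limits."""
    traj = {name: [] for name in HUMAN_TO_ROBOT_JOINTS}
    for frame in human_angles:                       # one row per video frame
        for name, idx in HUMAN_TO_ROBOT_JOINTS.items():
            lo, hi = JOINT_LIMITS[name]
            q = float(np.clip(frame[idx], lo, hi))   # position limit
            if traj[name]:                           # velocity limit vs. previous frame
                prev = traj[name][-1]
                q = prev + float(np.clip(q - prev, -MAX_JOINT_VEL * dt,
                                         MAX_JOINT_VEL * dt))
            traj[name].append(q)
    return traj
```

Calling `retarget(pose_sequence, dt=1/30)` on a 30 fps pose estimate would yield per-joint targets that a simulator can actually track, which is the prerequisite for everything that follows.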

Crucially, XGen supports "interaction augmentation." From one single video of a person lifting a box, it can generate synthetic training data for the robot lifting boxes of different sizes, at different heights, and from different angles. This provides the massive diversity of data needed for a robot to handle the "messiness" of the real world.
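As a rough illustration of what such augmentation could look like, here is a minimal sketch that samples randomized task variants from one reference demonstration. The `augment_box_lift` helper and its parameter ranges are hypothetical, not drawn from the paper:

```python
# Toy "interaction augmentation": from one reference box-lift demo,
# sample many physically varied task instances for a simulator to
# re-solve. The helper name and parameter ranges are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=0)

def augment_box_lift(reference_traj, n_variants: int = 1000):
    """Yield (reference_traj, box_params) pairs with randomized object
    size, pickup height, approach angle, and mass."""
    for _ in range(n_variants):
        yield reference_traj, {
            "size_m":   float(rng.uniform(0.15, 0.45)),     # box edge length
            "height_m": float(rng.uniform(0.2, 1.2)),       # pickup height
            "yaw_rad":  float(rng.uniform(-np.pi, np.pi)),  # approach angle
            "mass_kg":  float(rng.uniform(0.5, 5.0)),
        }
```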

XMimic: Learning the Skill

The second stage, XMimic, is the brain. It uses a teacher-student training architecture to master the skills synthesized by XGen, as sketched after the list below.

  • The Teacher: Trains in a physics simulator with "privileged" information (knowing exactly where the ball or object is at all times).
  • The Student: Is "distilled" from the teacher but must operate under real-world constraints, such as relying only on its onboard sensors.
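Here is a minimal sketch of that distillation loop, assuming a simple supervised setup where the student regresses onto the frozen teacher's actions. The network sizes, the MSE objective, and all names are illustrative, not the paper's actual implementation:

```python
# Toy teacher-student distillation step. The teacher consumes privileged
# state (e.g., exact ball pose); the student sees proprioception only.
# All dimensions and the MSE objective are placeholder assumptions.
import torch
import torch.nn as nn

PRIV_DIM, PROPRIO_DIM, ACT_DIM = 128, 64, 29   # placeholder sizes

teacher = nn.Sequential(nn.Linear(PRIV_DIM, 256), nn.ELU(), nn.Linear(256, ACT_DIM))
student = nn.Sequential(nn.Linear(PROPRIO_DIM, 256), nn.ELU(), nn.Linear(256, ACT_DIM))
optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

def distill_step(privileged_obs: torch.Tensor, proprio_obs: torch.Tensor) -> float:
    """Regress the student's action onto the (frozen) teacher's action."""
    with torch.no_grad():
        target = teacher(privileged_obs)        # teacher acts with full state
    pred = student(proprio_obs)                 # student acts from onboard sensing
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The payoff of this split is that the teacher can exploit simulator-only state to discover a competent policy quickly, while the student inherits that competence in a form the physical robot can actually execute.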


Playing "Blind": The Power of Proprioception

Perhaps the most impressive feat of HumanX is the No External Perception (NEP) mode. In the basketball videos, the G1 is often performing complex maneuvers, such as dribbles, pivots, and jumpshots, without any cameras or external sensors tracking the ball.

Instead, the robot relies on proprioception. By analyzing the internal torques and forces on its joints, the robot "feels" the ball. Much like a human can dribble a basketball with their eyes closed by feeling the impact on their palm, the G1 uses its own motor feedback to maintain control. According to the researchers, this mode achieved a success rate of over 80% for complex maneuvers like the pump-fake turnaround.
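A hand-written version of that intuition might compare the torques the robot measures against the torques its own dynamics model predicts for the commanded motion, and flag contact when the residual spikes. In HumanX the student policy learns this mapping implicitly from data, so the rule below is purely illustrative:

```python
# Toy proprioceptive contact detector: flag ball contact when measured
# joint torques deviate from what the robot's dynamics model predicts.
# The threshold and function are hypothetical, not from the paper.
import numpy as np

CONTACT_THRESHOLD = 2.5   # N*m of unexplained torque, placeholder

def detect_contact(measured_torque: np.ndarray,
                   expected_torque: np.ndarray) -> bool:
    """Return True when the torque residual suggests an external impact,
    such as the ball hitting the robot's palm mid-dribble."""
    residual = np.linalg.norm(measured_torque - expected_torque)
    return bool(residual > CONTACT_THRESHOLD)
```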

Generalization Beyond the Script

While previous "imitation learning" attempts often resulted in robots that simply replayed a recorded motion (failing if an object was moved even slightly), HumanX demonstrates significant emergent behavior.

In laboratory tests across five domains—basketball, football, badminton, cargo pickup, and reactive fighting—the G1 showed it could adapt to human interference. In one demo, a researcher forcefully kicked the robot and stole its cargo; the G1 stabilized itself, walked to where the cargo was dropped, and autonomously re-grasped it. In a "fighting" scenario, the robot could distinguish between a human's feint and a real punch, choosing only to block or counter when an actual strike was detected.

Why This Matters

The leap here is one of scalability. By achieving success rates more than 8x higher than prior methods on generalization tests, HumanX suggests a future where robots don't need to be "programmed" for specific tasks. Instead, they can be "shown" what to do.

The team's research establishes a task-agnostic pathway for humanoid skill acquisition. If you want a robot to learn a new warehouse task or a household chore, you may soon only need to point a smartphone camera at a human doing the job. As humanoids like the Unitree G1 drop in price—now hovering around $16,000—the software to make them actually useful is finally catching up to the hardware.
