1X Unveils 1XWM: The Video-to-Action "Brain" That Lets NEO Imagine Its Chores

In a significant leap toward autonomous domestic labor, 1X Technologies has unveiled a major evolution of its 1X World Model (1XWM). No longer just a digital twin for testing, the new 1XWM serves as a generative "cognitive core" that allows the NEO humanoid to "imagine" a task through video generation before executing it in the physical world.

The update represents a shift in how the OpenAI-backed firm approaches robot intelligence. By leveraging "internet-scale" video data, 1X claims its robots can now generalize to tasks they have never seen in their specific training sets—a feat the company demonstrated by having NEO lift a toilet seat and steam a shirt without prior teleoperated demonstrations.

From "Expert Mode" to Self-Teaching

For the past year, 1X has been transparent about its human-in-the-loop strategy, where human "experts" guide robots via VR to collect training data. While critics viewed this as a sign of unfinished autonomy, 1X framed it as the necessary "bootstrapping" phase for its $20,000 NEO android.

The new 1XWM aims to complete that "flywheel". By pre-training on web-scale video and "mid-training" on 900 hours of egocentric human footage, 1X is teaching NEO to understand the "structural priors" of reality—how objects move and how humans apply force. This allows the robot to bridge the gap between human knowledge and robotic action, potentially reducing its reliance on human-led data gathering. 1X CEO Bernt Børnich noted in a recent interview with Bloomberg Technology that this update allows for a "sensible approach" to novel tasks, such as picking a Post-it note off a board despite having no specific training data for that action.
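Read as a training curriculum, that strategy might look something like the minimal sketch below. The stage names, the assumed post-training stage on teleoperated demos, and the stated objectives are illustrative guesses layered on the details 1X has shared, not its published recipe.

```python
# Illustrative sketch of a staged video-model curriculum (not 1X's published recipe).
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    data: str            # where the video comes from
    hours: float | None  # approximate volume, where the article states one
    objective: str       # what this stage is meant to teach


CURRICULUM = [
    # 1. Pre-training: generic "structural priors" from web-scale video.
    Stage("pre-train", "internet-scale web video", None, "general video prediction"),
    # 2. Mid-training: egocentric human footage to align with a human viewpoint.
    Stage("mid-train", "egocentric human footage", 900.0, "human manipulation dynamics"),
    # 3. (Assumed) post-training: teleoperated NEO data to ground actions in the robot's body.
    Stage("post-train", "NEO teleoperation demos", None, "action-conditioned rollouts"),
]

for stage in CURRICULUM:
    volume = f"{stage.hours:.0f}h" if stage.hours else "unspecified"
    print(f"[{stage.name}] {stage.data} ({volume}) -> {stage.objective}")
```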

How it Works: The Video-to-Action Pipeline

A white NEO humanoid robot standing in front of a white toilet, reaching out to lift the toilet seat while a human sits nearby watching.
Generalization in action: NEO uses its World Model to interact with a toilet, a task 1X says the robot had never performed or seen in its training data before. Image: 1X

Unlike standard Vision-Language-Action (VLA) models that predict motor commands from static images, 1XWM uses a high-fidelity video generation backbone.

The process involves two distinct stages:

  1. The World Model Backbone: A 14-billion-parameter diffusion model predicts the visual future of a scene based on a text prompt (e.g., "pack this orange into the lunchbox").
  2. The Inverse Dynamics Model (IDM): This "bridges pixels to actuators," predicting the exact movements required for NEO’s tendon-driven anatomy to match the generated video frames.
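Strung together, the two stages amount to "imagine a video, then recover the actions that reproduce it." The sketch below is a minimal, hedged illustration of that structure: the class names, method signatures, frame rate, and action dimension are stand-ins invented here, not 1X's actual API.

```python
# Minimal sketch of the video-to-action idea (illustrative stand-ins, not 1X's actual API).
import numpy as np


class WorldModel:
    """Stand-in for the ~14B-parameter video diffusion backbone."""

    def imagine(self, observation: np.ndarray, prompt: str,
                horizon_s: float, fps: int = 10) -> np.ndarray:
        # Real system: generate future video frames conditioned on the current
        # camera image and a text prompt. Here: return blank frames as a placeholder.
        n_frames = int(horizon_s * fps)
        return np.zeros((n_frames, *observation.shape), dtype=observation.dtype)


class InverseDynamicsModel:
    """Stand-in for the IDM that 'bridges pixels to actuators'."""

    def frames_to_actions(self, frames: np.ndarray, action_dim: int = 30) -> np.ndarray:
        # Real system: infer the motor commands that would reproduce the imagined
        # frames on NEO's tendon-driven body. Here: zero actions as a placeholder.
        return np.zeros((len(frames), action_dim))


def plan_and_act(robot, wm: WorldModel, idm: InverseDynamicsModel, prompt: str) -> None:
    obs = robot.get_camera_frame()                     # current egocentric view
    imagined = wm.imagine(obs, prompt, horizon_s=5.0)  # "imagine" the task as video
    actions = idm.frames_to_actions(imagined)          # translate pixels into commands
    for a in actions:
        robot.apply_action(a)                          # execute the imagined rollout
```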

"What the model can visualize, NEO usually can do," the company noted in a technical deep dive. This is made possible by NEO’s passive safety design, which ensures the robot’s physical interactions—like friction and contact—closely mimic the human motions seen in the training videos.

Børnich emphasized that this "embodiment" is why 1X develops its own models rather than using off-the-shelf AI from partners like NVIDIA. Because NEO's physical form is "kinematically congruent" with humans, it can directly apply the world dynamics inherent in internet video.

Visualizing the Safest Path

Safety is also being reframed as a cognitive capability. Børnich argues that the 1XWM allows the robot to reason about physical risks by visualizing potential failures before they happen.

"You can not only ask for how am I going to do this task, but how am I going to do this task in a manner that is as safe as possible?" Børnich told Bloomberg. By visualizing paths where something might go wrong, the model can actively choose the least risky trajectory. This software layer complements the hardware's inherent safety—NEO is designed to be soft and low-energy so that if it does fail, "the world is still okay".

Generalization in the Wild

The most compelling aspect of the 1XWM is its ability to handle "out-of-distribution" scenarios. In launch demonstrations, NEO was prompted to interact with a toilet—a task the company says it had never performed or seen in its robot data.

NEO robot in a dimly lit, chaotic room with a toilet seat swinging from the ceiling and a human sitting to the side.
Handling chaos: 1X demonstrates how the World Model allows NEO to maintain its task focus even during extreme, unpredictable environmental changes like rapid lighting shifts and swinging obstacles. Image: 1X

By visualizing how a human might lift a toilet seat based on internet video, the 1XWM generated a rollout that the robot then successfully executed. This suggests that 1X is successfully tapping into the vast "common sense" knowledge embedded in human-centric video datasets, rather than relying solely on costly, hand-labeled robot data.

The Inference Bottleneck

Despite the breakthrough, 1X is candid about the current limitations. The 1XWM currently takes roughly 11 seconds of multi-GPU compute to generate a video rollout covering 5 seconds of real-time action. This latency means the robot is not yet reactive in high-speed environments; it must "think" before it acts.
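Put concretely, that is a real-time factor of roughly 2.2x: every 5 seconds of motion costs about 11 seconds of thinking, so a longer task spends more time planning than moving. A back-of-the-envelope illustration, with the six-rollout task being a hypothetical example rather than a figure 1X has reported:

```python
# Back-of-the-envelope arithmetic using the figures 1X reports.
planning_s = 11.0   # multi-GPU compute per rollout
motion_s = 5.0      # real-world motion each rollout covers

real_time_factor = planning_s / motion_s   # ~2.2x slower than real time

# Hypothetical task stitched together from six such rollouts:
chunks = 6
print(f"real-time factor: {real_time_factor:.1f}x")
print(f"moving: {chunks * motion_s:.0f}s, thinking: {chunks * planning_s:.0f}s")
```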

Success rates also vary wildly by task. While NEO performs reliably at tasks like "grabbing chips" and "steaming a shirt," more complex tasks like "pouring cereal" or "drawing a smiley face" still hover near a 0% success rate. The company also noted that relying on a single camera (monocular vision) can lead to "weak 3D grounding," where the robot might overshoot an object because the generated video didn't perfectly capture depth.

Scaling Toward the Home

This update arrives as 1X prepares for its massive commercial rollout with EQT, aiming to deploy thousands of units by 2026.

By integrating 1XWM with its Redwood AI brain, 1X is betting that a robot that can teach itself through "imagined" experience will scale faster than competitors relying on manual programming. As Børnich puts it, this marks a "true paradigm shift," where the robot's own generated data becomes the engine for its future intelligence.
