Build AI Open-Sources 10,000 Hours of Factory-Worker Video to Scale Robot Learning

Build AI has released Egocentric-10K, a massive 10,000-hour dataset of first-person video, positioning it as the "largest egocentric dataset in history." The data, captured from 2,153 workers in real factory settings, was announced by Build AI's Eddy Xu on X.
The dataset, which comprises 1.08 billion video frames, is a significant contribution to the industry's race to solve the "physical AI bottleneck": the critical lack of real-world data needed to train capable, general-purpose robots.
In his announcement, Xu stated that the "era of data scaling in robotics is here," framing egocentric video as a "general learning framework that passively captures how skilled workers do their jobs."
Inside Egocentric-10K
According to the dataset's technical details, published on Hugging Face, Egocentric-10K is the first dataset of its kind collected exclusively within real factories. The 16.4 TB dataset is released under the permissive Apache 2.0 license, making it widely available to researchers.
Key statistics of the dataset include:
- Total Duration: 10,000 hours
- Participants: 2,153 factory workers
- Total Frames: 1.08 billion
- Format: 1080p, 30fps H.265 video
- Audio: No audio is included.
- Collection: Captured using Build AI's proprietary "Gen 1" head-mounted monocular camera.
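The headline figures can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope check, not something published by Build AI: it assumes the 10,000 hours and 30 fps figures are exact, that "16.4 TB" means decimal terabytes, and that averaging across all footage is meaningful. The function name `derived_stats` is illustrative only.

```python
# Back-of-the-envelope check of the published Egocentric-10K statistics.
# Assumptions (not from the source): the hour and fps figures are exact,
# and "TB" means decimal terabytes (10**12 bytes).

HOURS = 10_000
FPS = 30
WORKERS = 2_153
SIZE_TB = 16.4


def derived_stats(hours, fps, workers, size_tb):
    """Derive frame count, per-worker hours, and average bitrate."""
    seconds = hours * 3600
    frames = seconds * fps                      # total frames at a constant frame rate
    hours_per_worker = hours / workers          # mean footage per participant
    size_bits = size_tb * 1e12 * 8
    avg_mbps = size_bits / seconds / 1e6        # mean video bitrate in megabits/s
    return {
        "frames": frames,
        "hours_per_worker": hours_per_worker,
        "avg_mbps": avg_mbps,
    }


stats = derived_stats(HOURS, FPS, WORKERS, SIZE_TB)
print(f"frames:           {stats['frames']:,}")
print(f"hours per worker: {stats['hours_per_worker']:.2f}")
print(f"average bitrate:  {stats['avg_mbps']:.2f} Mbit/s")
```

Under these assumptions, 10,000 hours at 30 fps works out to exactly 1.08 billion frames, which matches the stated total. The same figures imply roughly 4.6 hours of footage per worker and an average bitrate of about 3.6 Mbit/s.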
Critically, Build AI claims the dataset features a "higher density of visible hands and manipulation" than previous "in-the-wild" datasets. This focus on hands-on, skilled labor is vital for training models to perform complex manipulation tasks.
A Bet on Human Data
This release solidifies Build AI's strategic bet, placing it in a camp similar to Figure's "Project Go-Big," which also relies on massive-scale human video to achieve "zero-shot human-to-robot transfer." This strategy contrasts with that of competitors like Generalist AI, which focuses on data collected from real-world robot interactions.
Build AI, founded in 2025, has a "singular focus on scaling economically useful egocentric human data," according to its website. The company manufactures its own recording devices and deploys them in businesses globally to achieve this.
While the company's mission is to "build physical superintelligence," it is candid about the difficulty, stating its research "has significant technical risk and low probability of success."
By open-sourcing Egocentric-10K, Build AI is providing a massive new resource to the academic community to "unlock the next order-of-magnitude" for research. The challenge, as industry leaders like Unitree's CEO have noted, remains turning this vast amount of data into models that can reliably generalize skills in new, unstructured environments.