Generalist AI CEO Pete Florence argues that terms like "VLA" and "World Model" are temporary crutches for the industry, and reveals that GEN-1's 99% scratch-trained architecture is a bet on the eventual dominance of pure robotic data.
As "world models" become the dominant paradigm in robotics, the industry is grappling with a term that means everything and nothing. We map out the competing visions from LeCun, NVIDIA, 1X, and Tesla.
NVIDIA GEAR Lab has released DreamDojo, an open-source world model pretrained on a massive 44,000-hour dataset of human egocentric videos. By using "latent actions" to bridge the gap between human and robot movement, the model achieves zero-shot generalization and real-time controllability for teleoperation and planning.
Waymo has unveiled its new World Model, powered by Google DeepMind’s Genie 3, to simulate rare "long-tail" driving scenarios and edge cases, signaling a major shift toward generative world models in the race for physical AI.
In a deep dive on the Robo Papers podcast, 1X Director of Evaluations Daniel Ho explains how "imagination" via video generation is allowing humanoids to perform zero-shot tasks with minimal robot-specific training data.