In a new interview, Yann LeCun, the Meta Chief AI Scientist and AMI Labs founder, doubles down on world models, arguing that the massive data requirements of current humanoid demos prove the Vision-Language-Action (VLA) approach is a dead end.
Generalist AI CEO Pete Florence argues that terms like 'VLA' and 'World Model' are temporary crutches for the industry, and reveals that GEN-1's 99% scratch-trained architecture is a bet on the eventual dominance of pure robotic data.
As "world models" become the dominant paradigm in robotics, the industry is grappling with a term that means everything and nothing. We map out the competing visions from LeCun, NVIDIA, 1X, and Tesla.
NVIDIA GEAR Lab has released DreamDojo, an open-source world model pretrained on a massive 44,000-hour dataset of human egocentric videos. By using "latent actions" to bridge the gap between human and robot movement, the model achieves zero-shot generalization and real-time controllability for teleoperation and planning.
Waymo has unveiled its new World Model, powered by Google DeepMind’s Genie 3, to simulate rare "long-tail" driving scenarios and edge cases, signaling a major shift toward generative world models in the race for physical AI.