At Sequoia Capital’s AI Ascent, Jim Fan argued that robotics is speedrunning the LLM playbook, moving from video-based pre-training to autonomous self-improvement.
NVIDIA GEAR Lab has released DreamDojo, an open-source world model pre-trained on 44,000 hours of egocentric human video. By using "latent actions" to bridge the gap between human and robot movement, the model achieves zero-shot generalization and real-time controllability for teleoperation and planning.
NVIDIA GEAR Lab has unveiled DreamZero, a 14-billion-parameter World Action Model (WAM) that uses video diffusion to give robots a physical "imagination," enabling zero-shot task completion and rapid adaptation across different robotic embodiments.