Yann LeCun’s World Model Vision Gets a Leaner Engine: Introducing LeWorldModel

Written by Humanoids Daily

Just days after AMI Labs' $1.03 billion seed round signaled a massive industrial bet on "world models," the academic foundation for that shift is becoming clearer. A new paper, LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels, introduces a streamlined framework that could bridge the gap between high-level theory and the "messy" reality of robotics.

Co-authored by Turing Award winner Yann LeCun and researchers from Mila, NYU, and Samsung, LeWorldModel (LeWM) represents a significant refinement of the Joint Embedding Predictive Architecture (JEPA). While LeCun has long argued that the industry’s "LLM-pilled" consensus is a "dead end" for physical intelligence, LeWM provides a practical, stable, and remarkably efficient alternative for training robots to understand their environments.

Solving the "Collapse" Problem

The central challenge with JEPA-style world models is representation collapse. Without careful tuning, these models tend to ignore the complexity of the world, mapping different inputs to identical representations to "cheat" the prediction task.
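To make the failure mode concrete, here is an illustrative sketch (not from the paper): a degenerate encoder that maps every input to the same vector achieves a perfect prediction loss while learning nothing about the world.

```python
import numpy as np

# Illustrative only: a collapsed encoder ignores its input entirely,
# and an identity "dynamics" predictor then matches it exactly.
encoder = lambda x: np.zeros((x.shape[0], 8))   # every input -> same embedding
predictor = lambda z, action: z                 # trivial next-step prediction

obs_t = np.random.randn(4, 3 * 64 * 64)     # batch of current frames (flattened)
obs_next = np.random.randn(4, 3 * 64 * 64)  # batch of next frames
action = np.random.randn(4, 2)

z_t, z_next = encoder(obs_t), encoder(obs_next)
loss = ((predictor(z_t, action) - z_next) ** 2).mean()
print(loss)  # 0.0 -- perfect prediction, useless representation
```

This is why a prediction loss alone is not enough: the optimizer can always "cheat" by collapsing all embeddings to a single point.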

Previous attempts to solve this, such as PLDM, relied on complex, "fragile" training objectives with up to seven different loss terms. Alternatively, models like DINO-WM avoided collapse by using massive, frozen pre-trained encoders—essentially outsourcing the model's "sight" to a foundation model trained on millions of internet images.

LeWM takes a different path. It is the first JEPA to train stably end-to-end from raw pixels using only two loss terms: a standard next-embedding prediction loss and a novel regularizer called SIGReg (Sketched-Isotropic-Gaussian Regularizer). SIGReg prevents collapse by enforcing that the latent embeddings match a Gaussian distribution, promoting feature diversity without the need for the "heuristic tricks" or auxiliary supervision that plague other models.
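A simplified stand-in for the idea (the paper's actual SIGReg uses a sketched statistical test, not the moment-matching penalty below) shows how a Gaussian-matching term penalizes the collapsed solution:

```python
import numpy as np

def prediction_loss(pred, target):
    # Term 1: standard next-embedding prediction error.
    return ((pred - target) ** 2).mean()

def gaussian_reg(z):
    # Term 2: simplified illustration of a Gaussian-matching regularizer.
    # Push the batch of embeddings toward zero mean and identity covariance,
    # i.e. an isotropic Gaussian. Collapsed embeddings (all identical) have
    # zero covariance and are therefore heavily penalized.
    mean = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    eye = np.eye(z.shape[1])
    return (mean ** 2).sum() + ((cov - eye) ** 2).sum()

rng = np.random.default_rng(0)
collapsed = np.zeros((256, 8))            # every embedding identical
healthy = rng.standard_normal((256, 8))   # diverse, roughly Gaussian
print(gaussian_reg(collapsed) > gaussian_reg(healthy))  # True
```

The appeal of this recipe is that both terms are differentiable and self-supervised: no auxiliary decoders, no handcrafted negatives, just prediction plus a distributional constraint.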

Leaner, Faster, and GPU-Friendly

Perhaps the most striking aspect of LeWM is its efficiency. While the AI industry is currently obsessed with scaling to trillion-parameter models, LeWM operates with just 15 million parameters.

Key performance metrics from the paper include:

  • Planning Speed: LeWM plans up to 48x faster than foundation-model-based world models like DINO-WM.
  • Hardware Accessibility: The entire model can be trained on a single GPU in just a few hours.
  • Control Success: On the Push-T robotics benchmark, LeWM achieved a 96% success rate, outperforming both PLDM and the more computationally expensive DINO-WM.

The model’s ability to "ignore pixel noise"—like the flickering of a light or the texture of a rug—allows it to focus on the underlying causal physics of a task. In "violation-of-expectation" tests, LeWM was able to reliably detect physically implausible events, such as an object suddenly teleporting, suggesting it had acquired a rudimentary sense of "common sense" physics.
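A violation-of-expectation check of this kind can be sketched as anomaly detection on the world model's latent prediction error; the function names and threshold below are hypothetical, not the paper's:

```python
import numpy as np

def surprise(pred_embedding, observed_embedding):
    # "Surprise" = the world model's prediction error in latent space.
    return ((pred_embedding - observed_embedding) ** 2).mean()

# Illustrative threshold; in practice it would be calibrated on the
# surprise scores of physically normal transitions.
threshold = 0.5

normal = surprise(np.full(8, 1.0), np.full(8, 1.05))    # small drift
teleport = surprise(np.full(8, 1.0), np.full(8, 3.0))   # implausible jump
print(normal < threshold < teleport)  # True
```

Because the embeddings discard pixel-level noise, the surprise signal fires on genuine physical anomalies rather than on flickering lights or texture changes.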

The Path to Generally Useful Robots

LeCun has been a vocal critic of current humanoid firms, claiming they have "no idea how to make those robots smart enough to be useful." He argues that robots need to learn from high-bandwidth video data—much like a child or a house cat—rather than low-bandwidth text tokens.

LeWM appears to be a direct attempt to operationalize this philosophy. By providing a stable, task-agnostic "brain" that can be trained directly from sensory input, AMI Labs and its academic partners are positioning JEPA as the foundational layer for future robotics.

However, the researchers admit that hurdles remain. LeWM currently struggles in very simple environments (like "TwoRoom") where the data diversity is too low for its Gaussian regularization to function effectively. Furthermore, like all current world models, it is still restricted to relatively short planning horizons.

As AMI Labs moves from research to industrial implementation with partners like Toyota and Nvidia, the success of LeWM suggests that the "one more big breakthrough" needed for AGI might not come from bigger datasets, but from smarter, more stable architectures.
