The Dark Matter of Robotics: Generalist AI’s Andy Zeng on the Quest for Physical Commonsense

Physical interaction is the "dark matter" of robotics—unseen, omnipresent, and responsible for almost everything that actually works in the physical world. In a recent technical reflection, Generalist AI co-founder and chief scientist Andy Zeng argues that the industry's obsession with "LLM-pilled" reasoning ignores the fundamental ingredient of intelligence: physical commonsense.

This "commonsense" isn't the ability to describe a task, but the reactive, closed-loop intuition for forces, friction, and uncertainty that humans use to adjust mid-action. While the industry remains divided over how to solve the "physical AI bottleneck," Zeng suggests that this elusive intuition is finally starting to emerge from the same force that transformed language: scale.

Beyond the LLM Wall

Zeng’s thesis centers on a modern iteration of Moravec’s Paradox—the observation that high-level reasoning is computationally "cheap," while low-level sensorimotor skills are incredibly "expensive." While models trained on internet text can generate complex plans or code, they lack the "proprioception and consequence" required to handle a slipping book or a cluttered shelf.

"Studying the DMV manual online gives useful background knowledge," Zeng notes, "but it is not the same as the real experience of learning how to drive on the road." This echoes recent critiques from AMI Labs’ Yann LeCun, who has argued that current humanoid firms lack a path to general AI because they rely on text-based tokens rather than world models that understand reality through observation.

The Problem with Teleoperation

A significant hurdle in gathering the "right" data is the legacy of teleoperation. Zeng points out that traditional remote control often breaks the sensorimotor loop due to latency and unnatural interfaces, forcing operators into slow, deliberate "System 2" thinking. This results in stilted trajectories that, when used for training, lead to robots that are "jagged and slow."
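The latency effect can be illustrated with a toy simulation. This is only a sketch and not Generalist's method: the one-dimensional setup, the `track` function, and the gain and delay values are all hypothetical choices made for illustration. An "operator" moves a position toward a target, but reacts to an observation that is several steps old.

```python
from collections import deque


def track(delay_steps, gain, steps=200, target=1.0):
    """Toy 1-D control loop (illustrative assumption, not a real teleop model).

    Each step, the operator corrects the position x in proportion to the
    error computed from an observation that is `delay_steps` steps old.
    Returns the list of absolute tracking errors after each step.
    """
    x = 0.0
    # Fixed-length buffer of past positions; index 0 is the stale
    # observation the operator actually sees.
    seen = deque([0.0] * (delay_steps + 1), maxlen=delay_steps + 1)
    errors = []
    for _ in range(steps):
        seen.append(x)            # newest position enters the buffer
        observed = seen[0]        # operator sees the oldest entry
        x += gain * (target - observed)
        errors.append(abs(target - x))
    return errors


if __name__ == "__main__":
    # Fast reactions with no delay, fast reactions with delay,
    # and slow, cautious reactions with the same delay.
    for delay, gain in [(0, 0.5), (5, 0.5), (5, 0.1)]:
        tail = max(track(delay, gain)[-20:])
        print(f"delay={delay} gain={gain} worst late error={tail:.3g}")
```

In this sketch, a quick correction gain that settles cleanly with no delay becomes unstable once the observation lags by five steps, and the loop only settles again when the gain is turned down. That is a crude analogue of an operator being forced into slower, more deliberate motion.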

To combat this, Generalist AI has developed lightweight, ergonomic handheld devices designed to capture "reflexes, micro-corrections, and real-time recovery." In Zeng's account, this focus on high-fidelity, reactive data is what separates Generalist's GEN-0 model from competitors. While firms like Figure are betting on massive datasets of human video, Zeng argues that the data must preserve the "loop" of sensing and acting to be effective.

This approach invites comparison with Sunday Robotics, which also captures data directly from human hands but takes a different route. Where Generalist uses handheld devices to ground robot data in human-like reflexes, Sunday Robotics built its entire system around a Skill Capture Glove (or UMI). By iterating on the glove 100 times before finalizing its robot, Sunday claims to have captured high-fidelity dexterity that allows its wheeled robot, Memo, to handle delicate tasks like folding socks or gripping wine glasses.

Emergent Intuition: "Doing Just a Little More"

The most compelling evidence for this "physical commonsense" comes from the behaviors emerging in Generalist’s latest models. Zeng highlights "moments of brilliance" in which robots show behaviors they were not explicitly programmed to perform:

  • Catching a slipping washer and double-nudging it into a tight slot.
  • Nudging a container away from a bin wall to make space for fingers to grasp it.
  • Recovering from a failed flap insertion in cardboard assembly using a secondary finger.

These behaviors suggest that large-scale robotics pretraining induces a "prior over contact-rich interaction." This aligns with Generalist's previous findings that models reaching a 7-billion-parameter threshold undergo a "phase transition," beginning to internalize physical laws rather than just mimicking motions.

A New Era of Foundation Models

This shift from "programmed perfection" to "learned intuition" marks a definitive turn in the robotics arms race. While players like Skild AI focus on "omni-bodied" versatility and 1X Technologies leverages generative video to help robots "imagine" chores, Generalist is doubling down on raw physical interaction.

The ultimate goal, according to Zeng, is to blur the boundary between low-level interaction and high-level planning. By solving physical commonsense, Generalist hopes to build machines that do not just follow instructions but react to the messy, adversarial, and unforgiving reality of the physical world. As the industry moves toward hardware-agnostic "skill" marketplaces, the "dark matter" of physical intuition may be the primary differentiator between a novelty and a truly useful tool.