Humanoid Startup Foundation Pins Hopes on Decades of AI Research to Give Robots a Deeper Understanding of Physics

From raw data to reasoning: the abstract challenge of teaching robots to understand and adapt to the physical world. Foundation aims to use principles such as Deep Variational Bayes Filters to give its humanoids a deeper, more adaptable intelligence.

San Francisco- and Munich-based startup Foundation, which emerged in May 2023, is making bold claims in the burgeoning humanoid robotics field. Led by Sankaet Pathak, former CEO of the now-bankrupt fintech firm Synapse, Foundation aims to imbue its 'Phantom' humanoid robots with a more intuitive grasp of the physical world. Its chosen path? A hybrid AI approach built heavily on state-based models rooted in the research of Prof. Dr. Patrick van der Smagt, who leads Foundation's AI research team. Central to this strategy are concepts closely related to Deep Variational Bayes Filters (DVBFs), a technique van der Smagt and collaborators pioneered.

The Scientist: Prof. Dr. Patrick van der Smagt

Prof. Dr. Patrick van der Smagt is a recognized figure in machine learning, robotics, and probabilistic AI. His career spans academia, including a professorship at the Technical University of Munich, and industry, notably as Director of the Volkswagen Group Machine Learning Research Lab. His research has consistently focused on intelligent systems that learn from and interact with complex, dynamic environments, in particular probabilistic deep learning for time-series modeling, optimal control, and robotics. That background directly informed the development of DVBFs. Foundation says the AI principles it employs draw on about a decade of his research.

Prof. Patrick van der Smagt, Head of AI Science, Foundation

The Science: Deep Variational Bayes Filters (DVBFs)

First introduced around 2016-2017 by Karl, Soelch, Bayer, and van der Smagt, Deep Variational Bayes Filters are a class of probabilistic models designed for unsupervised learning of state-space models from raw, high-dimensional sequential data, such as video feeds. DVBFs helped kick off a family of latent world models, later including the recurrent state-space model (RSSM) used in PlaNet and Dreamer, that now power many state-of-the-art robot demos. In essence, DVBFs aim to let a system learn the underlying dynamics of an environment (its "rules of motion," or an intuitive understanding of physics) without explicit programming for every scenario.
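The "variational" in the name refers to how such models are trained. A generic form of the training objective for a latent state-space model is the evidence lower bound below, written in our own notation (not necessarily the original paper's): x_t are observations such as camera frames, u_t are actions, and z_t are latent states.

```latex
\log p(x_{1:T} \mid u_{1:T}) \;\ge\;
  \mathbb{E}_{q(z_{1:T} \mid x_{1:T},\, u_{1:T})}\!\left[ \sum_{t=1}^{T} \log p(x_t \mid z_t) \right]
  \;-\; \mathrm{KL}\!\left( q(z_{1:T} \mid x_{1:T},\, u_{1:T}) \,\middle\|\, p(z_1) \prod_{t=1}^{T-1} p(z_{t+1} \mid z_t, u_t) \right)
```

The first term rewards latent states that explain what the sensors saw; the second keeps the inferred states consistent with the learned transition model p(z_{t+1} | z_t, u_t). As we read the original DVBF paper, its specific variant goes a step further and infers the transition noise rather than the states directly, so every latent state is produced by the transition model itself, which is what pushes the latent space to capture dynamics.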

DVBFs combine the representational power of deep neural networks with the principled uncertainty handling of Bayesian methods. They work by:

Encoding Sensory Input: Raw sensor data (e.g., camera images) is compressed into a lower-dimensional latent state, a kind of abstract "mental map" of the world.
Bayesian Belief Updating: As new data arrives, the system updates its belief about this latent state, much as a Kalman filter does, but it uses neural networks and approximate (variational) inference so that it can handle complex, non-linear relationships.
Learning Dynamics: Crucially, DVBFs are trained to predict future observations based on the current latent state and a learned transition model. This forces the latent state to capture not just what the world looks like, but how it behaves over time. Proponents argue this allows the model to learn "full-information latent states," including unobserved properties like velocity, which are vital for long-term prediction and control.
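To make the belief-updating idea concrete, here is a minimal sketch of the classical linear Kalman filter the list compares DVBFs to; it is not DVBF itself. The dynamics matrices are hand-specified for a hypothetical toy problem, whereas a DVBF would learn its encoder and transition model from raw data with neural networks. The sketch also echoes the point about unobserved properties: the filter only ever sees noisy positions, yet its estimate includes velocity. The function names `kalman_step` and `track` and all parameter values are illustrative assumptions.

```python
import numpy as np


def kalman_step(mean, cov, obs, A, H, Q, R):
    """One predict-then-update cycle of a linear Kalman filter."""
    # Predict: push the current belief through the (here hand-specified, linear) dynamics.
    mean_pred = A @ mean
    cov_pred = A @ cov @ A.T + Q
    # Update: correct the prediction using the new observation.
    S = H @ cov_pred @ H.T + R              # innovation covariance
    K = cov_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    mean_new = mean_pred + K @ (obs - H @ mean_pred)
    cov_new = (np.eye(len(mean)) - K @ H) @ cov_pred
    return mean_new, cov_new


def track(num_steps=200, dt=0.1, true_velocity=1.5, noise_std=0.2, seed=0):
    """Hypothetical toy problem: an object moves at constant velocity; only its position is measured."""
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, dt], [0.0, 1.0]])  # state = (position, velocity), constant-velocity model
    H = np.array([[1.0, 0.0]])             # the sensor observes position only
    Q = 1e-6 * np.eye(2)                   # small assumed process noise
    R = np.array([[noise_std ** 2]])       # measurement noise variance
    mean, cov = np.zeros(2), np.eye(2)     # vague initial belief
    position = 0.0
    for _ in range(num_steps):
        position += true_velocity * dt
        obs = np.array([position + rng.normal(0.0, noise_std)])
        mean, cov = kalman_step(mean, cov, obs, A, H, Q, R)
    return mean, cov
```

Calling `track()` should return a mean whose second component sits close to the true velocity of 1.5, even though velocity was never measured directly, and a velocity variance far below its initial value of 1.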

The key idea is that by learning a good model of how the world works, a robot could potentially adapt to new situations more readily, predict the consequences of its actions ('imagination'), and operate with less explicit training data compared to methods like pure reinforcement learning or behavior cloning.
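The "imagination" idea can be illustrated by planning inside a latent model: candidate action sequences are rolled forward through the transition model, scored, and only the first action of the best sequence is executed. The sketch below uses random shooting, one of the simplest such planners (PlaNet, for example, uses the more refined cross-entropy method), and a hand-written `transition` function as a hypothetical stand-in for a learned latent dynamics model. It is not Foundation's or the DVBF authors' method, and all names and numbers are illustrative.

```python
import numpy as np


def transition(z, u):
    """Hypothetical stand-in for a learned latent transition z_{t+1} = f(z_t, u_t).

    The latent state is (position, velocity); the action u is an acceleration.
    """
    dt = 0.1
    position, velocity = z
    return np.array([position + dt * velocity, velocity + dt * u])


def imagine_return(z0, actions, goal):
    """Roll an action sequence forward in latent space and score it (higher is better)."""
    z, total = z0, 0.0
    for u in actions:
        z = transition(z, u)
        total -= (z[0] - goal) ** 2 + 0.01 * u ** 2  # penalize distance to goal and effort
    return total


def plan_first_action(z0, goal, horizon=20, num_candidates=500, seed=0):
    """Random-shooting planner: sample sequences, keep the best, return its first action and score."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon))
    scores = [imagine_return(z0, seq, goal) for seq in candidates]
    best_index = int(np.argmax(scores))
    return candidates[best_index][0], scores[best_index]
```

In use, the planner would be called again at every control step with the newest state estimate, as in model-predictive control, so that errors in the imagined rollouts are corrected by fresh observations.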

In promotional materials, van der Smagt explains that DVBFs allow AI to "understand the why behind action" and "grasp the rules of the game, not just following the playbook," needing "much less data" and adapting "on the fly."

Foundation's Gambit: DVBFs for Humanoid Intelligence

Foundation's CEO, Sankaet Pathak, says the company combines imitation learning with these "state-based models" or "latent variable models" that "compress the physics dynamics kinematics of the task space." Pathak cites a proof-of-concept cube handoff that trained in about 30 minutes; more complex skills will likely demand longer training or richer data. The goal is a "reasoning layer" that understands the physics of a task, rather than one that merely mimics actions or relies solely on reinforcement learning. Pathak considers reinforcement learning too tedious for the many manipulation tasks a humanoid might face, because defining a reward function for each task is complex.

Foundation believes this approach will give its 'Phantom' humanoids, targeted for initial customer deliveries in mid-2025, a competitive edge in efficiency, safety, and speed-to-autonomy. The company is initially focusing on industrial tasks in manufacturing and logistics.

Humanoids Daily (@humanoidsdaily) on X:

"Here @sankaet explains the unique approach @foundation_robo is taking to teach robots how to interact with the world using DVBFs, a method rooted in research developed by @padsmagt over the past decade. Clip by @Scobleizer, subtitles added by us."

"Now with subtitles! In this version, you can follow Prof. Patrick van der Smagt’s insights from @foundation_robo more easily. Discover how Deep Variational Bayes Filters (DVBFs) could give AI a sense of imagination and make it truly understand our world. #AI #Robotics"

9:44 AM · May 10, 2025

The Hype and the Hurdles

The promise of robots that can intuitively understand and interact with the world is compelling. DVBFs and similar world models offer a theoretically sound path towards more data-efficient and adaptable robots. Academic research, such as the Dreamer algorithm (which is conceptually similar to DVBFs), has shown impressive sample efficiency in learning complex behaviors like walking.

However, the path from academic promise to robust, real-world industrial application is fraught with challenges:

Scalability and Complexity: While DVBFs performed well on simulated tasks in the original papers, scaling these models to the full complexity of a humanoid robot interacting with diverse, unstructured real-world environments is a significant engineering feat. The computational cost of training and running these models can be substantial.
Data Requirements: Even if the approach needs "less data," the quality and breadth of that data remain critical. To build a general understanding of physics, a robot needs to experience a rich variety of interactions. The volume may be smaller than pure RL or giant Transformer models require, but broad physical intuition will likely still take hours to days of varied play data.
Validation of Claims: Foundation's specific implementation and the performance of its 'Phantom' robot are yet to be independently verified. The robotics community will be watching closely for demonstrations that substantiate their claims of superior adaptability and learning efficiency.
Fundraising and Financial Prudence: Foundation is reportedly seeking to raise $100 million at a $1 billion valuation. This ambitious target comes after Pathak's previous venture, Synapse, filed for bankruptcy in early 2024, a factor potential investors will undoubtedly scrutinize. The humanoid robotics sector is capital-intensive, and sustained financial backing will be crucial.

While the core science behind DVBFs is established, Foundation's success will hinge on execution: translating this research into reliable software, integrating it with its proprietary hardware (Foundation says its custom rolling-contact gearboxes supply high-resolution torque feedback, feeding richer sensor data into the DVBF), and demonstrating tangible advantages in real-world industrial settings. The company's assertion that its approach will let it leapfrog competitors relying on different AI paradigms is a bold one in a rapidly evolving, competitive field.

If Foundation can successfully harness DVBF-like models, it could indeed contribute to a new generation of more intelligent and adaptable humanoid robots. However, its claims require rigorous validation, and the journey is likely to involve iterative development and significant engineering and financial hurdles.
