Humanoids Daily

To Trust a Robot, You First Need a Test You Can Trust

Researchers at Iowa State University test robot safety
Image: ISU News Service

As investment pours into humanoid robotics, a critical, unglamorous challenge is holding back real-world deployment: how do you prove, verifiably, that a robot is safe? While much of the industry focuses on advancing AI, researchers at Iowa State University are concentrating on what they call "physical intelligence"—a robot's ability to master balance, grasping, and movement in unpredictable environments.

Bowen Weng, an assistant professor of computer science at ISU, argues this is a key barrier. "As humans, we often take our physical intelligence for granted," Weng said in a recent university news release. "But the truth is, it’s remarkable... Robots struggle with physical intelligence because it requires adapting to unpredictable environments, integrating sensory feedback in real time and mastering complex motor skills."

This challenge is at the heart of the industry's safety problem. A bipedal robot is "dynamically stable," meaning it must actively balance just to stay upright; for such a machine, a simple power-down isn't a failsafe; it's a catastrophic fall.

Now, new research from Weng and his colleagues highlights an even deeper issue: the methods used to test for these failures are themselves unreliable.

The 'Repeatability Crisis' in Robot Testing

In a paper presented at the 2025 IEEE International Conference on Robotics and Automation, Weng and co-authors from The Ohio State University and Transportation Research Center Inc. confront this testing gap directly.

The study, "Repeatable and Reliable Efforts of Accelerated Risk Assessment in Robot Testing," argues that for a test to be valid, it must be repeatable (giving similar results for the same robot over multiple trials) and reliable (working consistently across different robots from different vendors).

According to the paper, today's testing methods fail this basic standard. Even "accelerated" methods that use importance sampling to efficiently find rare failures are not provably repeatable. The researchers demonstrated how running the same advanced test 100 times on the same subject could lead to 100 different risk estimates, with the required testing effort (sample size) varying dramatically and unpredictably.
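To make that failure mode concrete, here is a minimal Python sketch of the testing pattern the paper critiques. This is a toy illustration, not the authors' algorithm or experimental setup: the failure threshold, the Gaussian proposal distribution, and the relative-standard-error stopping rule are all illustrative assumptions. It estimates a rare failure probability with importance sampling and stops when a runtime metric looks "converged"; repeating the same test shows both the risk estimate and the required sample size shifting from run to run:

```python
import numpy as np

rng = np.random.default_rng()

THRESHOLD = 4.0        # toy "failure": a standard-normal disturbance exceeds this
PROPOSAL_SHIFT = 4.0   # importance-sampling proposal centered on the failure region

def one_test_run(rel_tol=0.1, batch=100, max_samples=100_000):
    """Accelerated risk estimate that stops when a runtime metric 'converges';
    returns (risk estimate, the sample size this particular run needed)."""
    weights = []
    while len(weights) < max_samples:
        x = rng.normal(loc=PROPOSAL_SHIFT, size=batch)   # draw from the proposal
        # Likelihood ratio between the target N(0,1) and the proposal N(shift,1)
        lr = np.exp(-0.5 * x**2 + 0.5 * (x - PROPOSAL_SHIFT) ** 2)
        weights.extend((x > THRESHOLD) * lr)             # zero weight unless it "fails"
        est = np.mean(weights)
        rel_se = np.std(weights) / np.sqrt(len(weights)) / est if est > 0 else np.inf
        if rel_se < rel_tol:                             # runtime "convergence" check
            break
    return float(np.mean(weights)), len(weights)

# Identical test, identical system, ten repetitions: both the estimate and
# the testing effort (sample size) come out different each time.
for est, n in (one_test_run() for _ in range(10)):
    print(f"risk estimate = {est:.2e} after {n} samples")
```

Every printed line is the outcome of an identical test on an identical system; only the random draws differ, yet both the verdict and the cost of reaching it vary.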

"It’s exciting to see robots work," Weng told ISU News Service. "But at the same time, there isn’t nearly enough effort being put into responsibly and safely achieving these developments."

A New Algorithm for Trustworthy Tests

The team's solution is a new testing framework (Algorithm 3) that formally integrates repeatability and reliability into the risk assessment process.

Instead of running tests until a runtime metric "converges"—which the paper shows is unreliable—the new algorithm operates using a predefined, finite number of samples. This ensures that the testing effort is bounded and the results are consistent across multiple attempts. The researchers validated the approach by assessing the "risk of instability from frontal impacts" on both a controlled pendulum and a 7-DoF bipedal robot.
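The details of Algorithm 3 are in the paper; the core fixed-effort idea, though, can be sketched by swapping the convergence check in the toy example above for a sample size fixed before testing starts. The 20,000-sample budget below is an arbitrary placeholder, where the paper instead derives the required effort from the desired statistical guarantees:

```python
import numpy as np

rng = np.random.default_rng()
THRESHOLD = 4.0        # same toy failure model as the sketch above
PROPOSAL_SHIFT = 4.0

def fixed_effort_risk_estimate(n_samples=20_000):
    """Importance-sampled risk estimate with a predefined, finite sample
    size: every repetition of the test costs exactly the same effort."""
    x = rng.normal(loc=PROPOSAL_SHIFT, size=n_samples)
    lr = np.exp(-0.5 * x**2 + 0.5 * (x - PROPOSAL_SHIFT) ** 2)
    return float(np.mean((x > THRESHOLD) * lr))

# The testing effort is now bounded and identical on every run, so
# repeated assessments of the same system are directly comparable.
print([f"{fixed_effort_risk_estimate():.2e}" for _ in range(5)])
```

With the effort fixed in advance, the spread of the repeated estimates becomes a property of the sample size alone, which is what makes repeatability something one can guarantee rather than hope for.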

As Weng noted, this is an "indirect" impact. The research doesn't make the robot itself safer, but it makes the testing algorithm used to certify the robot trustworthy.

Closing the 'Standards Gap'

This work lands at a critical moment for the industry. Standards bodies are currently racing to write the first-ever safety rules for humanoids, including the new ISO 25785-1 standard.

Weng's research directly addresses what an IEEE study group recently called a "critical bottleneck" for the industry: stability. That IEEE report noted that existing standards "have an 'unwritten assumption' that robots are statically stable," a fact that is simply not true for humanoids.

Developing reliable, standardized tests for dynamic stability is a prerequisite for the widespread adoption of humanoid robots.

Weng is set to continue this line of work. He was recently awarded a grant from the National Institute of Standards and Technology (NIST) for a new project titled "Testing of Trustworthy Mobility: Standardized Performance Evaluation of Legged Robots Stability."

"Transparency is vital for fostering public trust, enhancing safety and helping us have meaningful discussions around the practical deployment and responsible use of robots,” Weng said. “The bottom line is, you have to be able to trust it, and the path to proving the trustworthiness of humanoid robots is through human-led research.”
