
Research at the Frontier of Physical AI
Advancing next-gen intelligence
Researching the Technologies That Will Define Our Future: Autonomy and Robotics.
Led by Dr. Wei Zhan, our research team brings together experts from top institutions and companies, recognized for their contributions to both academia and industry — including eight Best Paper awards at premier conferences and journals such as CVPR and ICRA. Together, they are developing cutting-edge technology that powers next-generation physical AI.
World-action foundation model pre-training
Next-generation foundation models for physical AI require pretraining on well-balanced, grounded multimodal data — spanning egocentric action, vision, behavior, physics, and language — anchored to concrete tasks. We focus on:
- Feed-forward and generative 4D reconstruction, and world foundation models for high-throughput, reactive generation of 4D worlds conditioned on ego actions
- Pretraining of world-action models on grounded modalities including vision, physics, and language
Reinforcement learning and foundation model post-training
Post-training of foundation models — including world-action and vision-language-action models — is essential for improving performance, ensuring safety, and aligning behavior in physical AI applications. We focus on the following research areas:
- Closed-loop, reinforcement-learning-based post-training of foundation models, supported by high-fidelity, scalable, high-throughput simulation constructed or learned from large-scale real-world data
- Self-play reinforcement learning, supported by high-throughput simulation and combined with imitation of human data, toward robust and human-like physical AI even in low-data regimes
Robot learning and data
Building capable robotic generalists presents unique data challenges: physics-aware modalities are essential, yet large-scale data is far harder to obtain than in driving applications. We focus on scalable robot learning paradigms — drawing from robot, human, and synthetic sources — and the data design methods needed to effectively leverage physics-aware modalities toward general-purpose robots.
S2GO: Streaming Sparse Gaussian Occupancy Prediction
Jinhyung Park, Chensheng Peng, Yihan Hu, Wenzhao Zheng, Kris Kitani, Wei Zhan
SPACeR: Self-Play Anchoring with Centralized Reference Models
Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka, Yihan Hu, Wei Zhan
RAYNOVA: Geometry-Free Auto-Regressive 4D World Modeling with Unified Spatio-Temporal Representation
Yichen Xie, Chensheng Peng, Mazen Abdelfattah, Yihan Hu, Jiezhi Yang, Eric Higgins, Ryan Brigden, Masayoshi Tomizuka, Wei Zhan
Outstanding Paper Award, RIWM Workshop @ ICCV 2025
Learning to Drive is a Free Gift: Large-Scale Label-Free Autonomy Pretraining from Unposed In-The-Wild Videos
Matthew Strong, Wei-Jer Chang, Quentin Herau, Jiezhi Yang, Yihan Hu, Chensheng Peng, Wei Zhan
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
Ishaan Singh Rawal, Shubh Gupta, Yihan Hu, Wei Zhan

Facility
Supported by world-class fleets, data, infrastructure, and tools, we address research topics such as reinforcement learning, 3D vision and generation, and robot learning, exploiting large-scale human and synthetic data in closed loop.
Diverse vehicle fleets with massive data to deploy our autonomy
We operate large fleets across various products globally to test and deploy autonomy, from autonomous cars in challenging urban scenarios to self-driving trucks and mining/construction vehicles covering both on-road and off-road scenes. Massive amounts of data have been collected and automatically processed by our data engine, enabling industrial-scale AI research.
Robot fleet and human data
Robot fleets with humanoids, mobile manipulators, and tabletop dual-arm robots equipped with cutting-edge, tactile-aware dexterous hands, as well as a variety of human data collection devices such as motion capture, headsets, and gloves.
Large-scale ML infrastructure and tools
As a world-leading tool provider for autonomous systems, our researchers are supported by a suite of efficient tools, high-quality neural simulation and synthetic data at scale in closed loop, and ML infrastructure for large-scale training on thousands of GPUs or more.
