WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotics

WestWorld diagram: knowledge encoder and mixture-of-experts SSM models.

Abstract

Trajectory world models have emerged as a cornerstone of robotic dynamics learning, enabling more effective planning and control in complex environments. Recent studies have explored pre-training such models across diverse robotic systems, but they still face two major limitations: 1) they struggle to scale to a large number of heterogeneous robotic systems, and 2) they do not incorporate domain knowledge of robot morphology, which limits zero-shot generalization to previously unseen systems. To address these limitations, we introduce WestWorld, a knoWledge-Encoded Scalable Trajectory World model for diverse robotics. For scalability, WestWorld uses a system-aware Mixture-of-Experts (Sys-MoE) that routes inputs to specialized experts via a learnable system embedding. To enhance zero-shot generalization, it incorporates domain knowledge of robot physical structure through a structural embedding that aligns trajectory representations with morphological information. After pre-training on 89 environments spanning diverse morphologies in both simulation and real-world settings, WestWorld significantly outperforms state-of-the-art baselines in zero-shot trajectory prediction. Notably, it also scales well as the number of robotic environments increases.
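As a rough illustration of the routing idea described in the abstract, the sketch below shows one way a gate conditioned on a per-system embedding could select and mix experts. It is not the authors' implementation: the class name `SysMoESketch`, the top-k gating, the dimensions, and the use of plain linear maps as stand-ins for the experts are all assumptions made for this example, and the parameters are randomly initialized rather than learned.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


class SysMoESketch:
    """Hypothetical system-aware MoE layer (illustrative only).

    The gate looks only at the embedding of the robotic system, so every
    trajectory from the same system is routed to the same experts.
    """

    def __init__(self, num_systems, num_experts, d_model, d_sys, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # One embedding row per system; in training these would be learned.
        self.system_embedding = rng.normal(0.0, 0.1, (num_systems, d_sys))
        # Gate that maps a system embedding to one logit per expert.
        self.gate = rng.normal(0.0, 0.1, (d_sys, num_experts))
        # Assumption: each expert is a single linear map standing in for a
        # real sequence model.
        self.experts = rng.normal(0.0, 0.1, (num_experts, d_model, d_model))

    def route(self, system_id):
        """Return the indices and mixing weights of the top-k experts."""
        logits = self.system_embedding[system_id] @ self.gate
        top = np.argsort(logits)[-self.top_k:]
        return top, softmax(logits[top])

    def forward(self, x, system_id):
        """Mix the selected experts' outputs for a (T, d_model) trajectory."""
        top, weights = self.route(system_id)
        out = np.zeros(x.shape, dtype=float)
        for idx, w in zip(top, weights):
            out += w * (x @ self.experts[idx])
        return out


layer = SysMoESketch(num_systems=4, num_experts=8, d_model=16, d_sys=6)
trajectory = np.ones((5, 16))          # 5 timesteps of 16-dim features
prediction = layer.forward(trajectory, system_id=2)
```

Because the gate depends only on the system embedding in this sketch, adding a new system adds one embedding row rather than a new set of experts, which is one plausible reading of how such a design could help scalability.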

Publication
ICML 2026

Related