Figure: WestWorld diagram showing the knowledge encoder and mixture-of-experts SSM models.

Trajectory world models have emerged as a cornerstone of robotic dynamics learning, enabling more effective planning and control in complex environments. Recent studies have explored pre-training such models across diverse robotic systems, but two major challenges remain: (1) scaling to a large number of heterogeneous robotic systems, and (2) incorporating domain knowledge of robot morphology, the absence of which limits zero-shot generalization to previously unseen systems. To address these challenges, we introduce WestWorld, a knoWledge-Encoded Scalable Trajectory World model for diverse robotics. For scalability, WestWorld uses a system-aware Mixture-of-Experts (Sys-MoE) that routes inputs to specialized experts via a learnable system embedding. For zero-shot generalization, it incorporates domain knowledge of the robot's physical structure through a structural embedding that aligns trajectory representations with morphological information. After pre-training on 89 environments spanning diverse morphologies in both simulation and real-world settings, WestWorld significantly outperforms state-of-the-art baselines in zero-shot trajectory prediction. Notably, it scales well as the number of robotic environments increases.
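To make the routing idea concrete, the following is a minimal NumPy sketch of a mixture-of-experts layer whose gate is driven by a learnable per-system embedding. It is not the WestWorld implementation: the class name `SystemRoutedMoE`, the top-k softmax gate, the linear experts, and all dimensions are assumptions chosen for illustration, and the parameters are randomly initialized rather than trained.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SystemRoutedMoE:
    """Toy MoE layer gated by a per-system embedding (hypothetical design)."""

    def __init__(self, num_systems, embed_dim, feat_dim, num_experts,
                 top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # One embedding vector per robotic system (learnable in a real model).
        self.system_embedding = rng.normal(0.0, 0.1, (num_systems, embed_dim))
        # Gate projects a system embedding to one logit per expert.
        self.gate = rng.normal(0.0, 0.1, (embed_dim, num_experts))
        # Each expert is a single linear map here, purely for brevity.
        self.experts = rng.normal(0.0, 0.1, (num_experts, feat_dim, feat_dim))

    def route(self, system_id):
        """Return the indices and normalized weights of the top-k experts."""
        logits = self.system_embedding[system_id] @ self.gate
        top = np.argsort(logits)[-self.top_k:]
        weights = softmax(logits[top])
        return top, weights

    def __call__(self, x, system_id):
        """Mix expert outputs for a trajectory x of shape (T, feat_dim)."""
        top, weights = self.route(system_id)
        out = np.zeros(x.shape, dtype=float)
        for idx, w in zip(top, weights):
            out += w * (x @ self.experts[idx])
        return out


layer = SystemRoutedMoE(num_systems=3, embed_dim=4, feat_dim=5, num_experts=4)
trajectory = np.ones((6, 5))            # 6 time steps, 5 features
prediction = layer(trajectory, system_id=1)
```

Because the gate in this sketch depends only on the system identity, every time step of a trajectory from the same system uses the same experts. Whether Sys-MoE also conditions the gate on the input tokens is not stated in the abstract.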