Offline Model-based Reinforcement Learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, it often suffers from an objective mismatch between model and policy learning: even accurate model predictions can yield inferior policies. This paper first identifies the primary source of this mismatch as the distribution shift present in offline data. We then introduce BilinEar CAUSal rEpresentation (BECAUSE), a novel algorithm that captures causal representations of both states and actions to reduce the influence of distribution shift, thereby resolving the objective mismatch problem. Empirical evaluations across 18 tasks of varying data quality demonstrate the superior performance of BECAUSE over existing offline MBRL algorithms. Additionally, we offer a theoretical analysis of BECAUSE within the framework of a causal world model and a pessimistic planning approach, affirming its capability and efficiency in integrating causal representation into offline MBRL.
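The abstract does not spell out the architecture of BECAUSE, but the core idea of a bilinear causal world model can be illustrated schematically: encode state and action into feature vectors, then predict the next state through a bilinear form whose feature interactions are gated by a binary causal mask. The sketch below is a minimal illustration under these assumptions; the encoder matrices, the per-dimension bilinear cores, and the random mask are all hypothetical stand-ins for learned components, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, FEAT_DIM = 4, 2, 8

# Hypothetical linear encoders (random projections stand in for learned ones):
# phi(s) = W_s @ s and psi(a) = W_a @ a.
W_s = rng.normal(size=(FEAT_DIM, STATE_DIM))
W_a = rng.normal(size=(FEAT_DIM, ACTION_DIM))

# Bilinear core: one FEAT_DIM x FEAT_DIM matrix per predicted state dimension,
# elementwise-gated by a binary mask that zeroes out feature interactions
# deemed non-causal (here drawn at random purely for illustration).
core = rng.normal(size=(STATE_DIM, FEAT_DIM, FEAT_DIM))
causal_mask = (rng.random(size=(STATE_DIM, FEAT_DIM, FEAT_DIM)) > 0.5).astype(float)

def predict_next_state(s: np.ndarray, a: np.ndarray) -> np.ndarray:
    """One world-model step: s'_i = phi(s)^T (M_i * B_i) psi(a)."""
    phi, psi = W_s @ s, W_a @ a
    # einsum contracts the two feature axes, leaving one value per state dim.
    return np.einsum("f,ifg,g->i", phi, causal_mask * core, psi)

s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
s_next = predict_next_state(s, a)
```

The mask is the causal ingredient: sparsifying the bilinear interaction restricts which state and action features may influence each predicted dimension, which is one plausible way such a representation could limit sensitivity to distribution shift in offline data.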