Model-based offline RL is a reinforcement learning paradigm in which an agent learns solely from a fixed, pre-existing dataset of environment interactions, with no further online exploration. The agent first learns a dynamics model (or world model) that predicts state transitions and rewards from the logged data. This learned model then serves as a simulated environment: it can drive planning algorithms such as Model Predictive Control (MPC), or generate synthetic experience for training a policy with standard RL methods. The aim is to be more data-efficient than purely model-free offline RL, which can only reuse the transitions actually recorded in the dataset.
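The two stages above can be sketched in code. The following is a minimal, self-contained toy example (not any particular published algorithm): the "offline dataset" is synthetic transitions from an assumed linear environment, the dynamics model is a least-squares linear regression, and synthetic rollouts are generated Dyna-style by branching from dataset states. All names (`model_step`, `synthetic_rollout`) and the random action placeholder standing in for a learned policy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed offline dataset: (s, a, s') transitions logged by some
# unknown behavior policy in a toy linear environment (assumed for this sketch).
N, S_DIM, A_DIM = 500, 3, 1
A_true = 0.3 * rng.normal(size=(S_DIM, S_DIM))
B_true = rng.normal(size=(S_DIM, A_DIM))
states = rng.normal(size=(N, S_DIM))
actions = rng.normal(size=(N, A_DIM))
next_states = (states @ A_true.T + actions @ B_true.T
               + 0.01 * rng.normal(size=(N, S_DIM)))

# Stage 1: learn a dynamics model from the fixed dataset only.
# Here, linear regression s' ~ W^T [s; a] fit by least squares.
X = np.hstack([states, actions])
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

def model_step(s, a):
    """Predicted next state under the learned dynamics model."""
    return np.concatenate([s, a]) @ W

# Stage 2 (Dyna-style use): generate short synthetic rollouts branched from
# dataset states; these imagined transitions could augment a replay buffer
# or be scored by a planner such as MPC.
def synthetic_rollout(s0, horizon=5):
    s, rollout = s0, []
    for _ in range(horizon):
        a = rng.normal(size=A_DIM)  # placeholder for a learned policy / planner
        s_next = model_step(s, a)
        rollout.append((s, a, s_next))
        s = s_next
    return rollout

rollout = synthetic_rollout(states[0])
# Sanity check: the learned model should track the true (noise-free) dynamics.
err = np.mean([np.linalg.norm(model_step(s, a) - (s @ A_true.T + a @ B_true.T))
               for (s, a, _) in rollout])
print(len(rollout), err)
```

In a realistic setting the linear regression would be replaced by an ensemble of neural networks, and offline methods typically add a conservatism mechanism so the policy does not exploit model errors far from the data.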
