Model-based reinforcement learning (MBRL) is a paradigm in which an agent learns, or is given, an internal dynamics model: a predictive function that estimates how the environment responds to actions (typically the next state and reward, given the current state and action). The agent then uses this model to plan sequences of actions that maximize cumulative reward, often through methods such as Monte Carlo Tree Search or trajectory optimization. The main advantage over model-free RL is improved sample efficiency, which can be substantial: because the model can generate many simulated transitions, the agent needs comparatively little real-world data.
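The learn-then-plan loop can be illustrated with a minimal Python sketch. Everything in it is an assumption chosen for clarity and is not taken from the text. The environment is a hypothetical one-dimensional point that moves by 0.5 × action per step toward a goal at 3.0. The model class is linear. The helpers (`true_step`, `fit_linear_model`, `plan_random_shooting`, `run`) are hypothetical names. The agent fits the model to 20 random real transitions. It then plans with random-shooting model predictive control, a very simple form of trajectory optimization: it scores random action sequences entirely inside the learned model, executes only the first action of the best sequence, and replans.

```python
import random

GOAL = 3.0  # hypothetical target position

def true_step(x, a):
    """Hypothetical 'real' environment: position moves by 0.5 * action, action clipped to [-1, 1]."""
    a = max(-1.0, min(1.0, a))
    return x + 0.5 * a

def reward(x):
    """Reward is the negative distance to the goal."""
    return -abs(x - GOAL)

def collect_data(n, rng):
    """Gather n real transitions (x, a, x_next) using random actions."""
    data, x = [], 0.0
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        x_next = true_step(x, a)
        data.append((x, a, x_next))
        x = x_next
    return data

def fit_linear_model(data):
    """Least-squares fit of x_next ~ A*x + B*a by solving the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxa = sum(x * a for x, a, _ in data)
    saa = sum(a * a for _, a, _ in data)
    sxy = sum(x * y for x, _, y in data)
    say = sum(a * y for _, a, y in data)
    det = sxx * saa - sxa * sxa
    A = (sxy * saa - sxa * say) / det
    B = (sxx * say - sxa * sxy) / det
    return A, B

def plan_random_shooting(model, x0, horizon, n_candidates, rng):
    """Score random action sequences inside the model; return the best sequence's first action."""
    best_return, best_first = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        x, total = x0, 0.0
        for a in actions:
            x = model(x, a)  # imagined transition: no real environment step
            total += reward(x)
        if total > best_return:
            best_return, best_first = total, actions[0]
    return best_first

def run(seed=0, n_real=20, steps=15, horizon=5, n_candidates=200):
    rng = random.Random(seed)
    A, B = fit_linear_model(collect_data(n_real, rng))
    model = lambda x, a: A * x + B * a  # learned dynamics model
    x = 0.0
    for _ in range(steps):
        a = plan_random_shooting(model, x, horizon, n_candidates, rng)
        x = true_step(x, a)  # execute only the first planned action, then replan
    return x

if __name__ == "__main__":
    print(f"final position: {run():.3f} (goal {GOAL})")
```

In this sketch only `true_step` touches the "real" environment: 20 calls for data collection and one per control step. Each planning step, by contrast, evaluates 200 × 5 = 1,000 imagined transitions in the learned model. That asymmetry is the sample-efficiency argument in miniature.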
