Model-Based Policy Optimization (MBPO) is a reinforcement learning algorithm that trains an agent by generating synthetic experience from a learned dynamics model and using that data to optimize a policy with a model-free algorithm such as Soft Actor-Critic (SAC). Its core innovation is limiting imagined rollouts to a short horizon, branched from states the agent actually visited, to control compounding model error while still producing enough data for sample-efficient policy improvement. This hybrid approach aims to combine the data efficiency of model-based planning with the asymptotic performance of model-free optimization.
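The loop described above can be sketched in a deliberately tiny setting. This is not the MBPO implementation: the learned neural-network ensemble is replaced by a least-squares linear model of hypothetical 1-D dynamics, and a fixed hand-coded policy stands in for SAC. What it does show is the characteristic MBPO data flow: fit a model on real transitions, then branch short imagined rollouts (horizon `k`) from real states to populate a synthetic buffer.

```python
import random

def true_step(s, a):
    # Hypothetical toy environment: 1-D dynamics s' = 0.8*s + a + noise.
    return 0.8 * s + a + random.gauss(0.0, 0.01)

def fit_model(transitions):
    # Least-squares fit of s' ~ c1*s + c2*a (assumed linear model class,
    # standing in for MBPO's probabilistic neural ensemble).
    S_ss = sum(s * s for s, a, sp in transitions)
    S_sa = sum(s * a for s, a, sp in transitions)
    S_aa = sum(a * a for s, a, sp in transitions)
    S_ssp = sum(s * sp for s, a, sp in transitions)
    S_asp = sum(a * sp for s, a, sp in transitions)
    det = S_ss * S_aa - S_sa * S_sa
    c1 = (S_ssp * S_aa - S_asp * S_sa) / det
    c2 = (S_asp * S_ss - S_ssp * S_sa) / det
    return c1, c2

def generate_synthetic(real_buffer, model, policy, k, n_rollouts):
    # Branch short model rollouts of horizon k from real states --
    # the short-horizon branching that limits compounding model error.
    c1, c2 = model
    synthetic = []
    for _ in range(n_rollouts):
        s, _, _ = random.choice(real_buffer)
        for _ in range(k):
            a = policy(s)
            sp = c1 * s + c2 * a  # imagined transition under the model
            synthetic.append((s, a, sp))
            s = sp
    return synthetic

random.seed(0)
policy = lambda s: -0.5 * s  # placeholder; SAC would learn this from all data

# Collect real experience with exploration noise.
real_buffer, s = [], 1.0
for _ in range(200):
    a = policy(s) + random.gauss(0.0, 0.3)
    sp = true_step(s, a)
    real_buffer.append((s, a, sp))
    s = sp

model = fit_model(real_buffer)
synthetic = generate_synthetic(real_buffer, model, policy, k=3, n_rollouts=100)
print(len(real_buffer), len(synthetic))  # → 200 300
```

In MBPO proper, the policy update then draws mostly from the synthetic buffer, which is why short horizons matter: each extra imagined step multiplies the model's one-step error.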
