Model Predictive Control (MPC) is an online, receding-horizon optimal control algorithm that repeatedly solves a finite-horizon planning problem using a dynamics model, executes only the first action from the optimized sequence, and then replans from the new state. This feedback loop compensates for model inaccuracies and environmental disturbances. In model-based reinforcement learning, MPC uses a learned transition model and reward model to simulate and evaluate potential future trajectories, selecting actions that maximize expected cumulative reward over the planning horizon.
