The MuZero algorithm is a model-based reinforcement learning agent that extends AlphaZero by learning an internal latent dynamics model rather than relying on known game rules. The learned model consists of three functions: a representation function that encodes observations into a latent state, a dynamics function that predicts the next latent state and reward given an action, and a prediction function that outputs a policy (action probabilities) and a value estimate. This lets MuZero plan with Monte Carlo Tree Search (MCTS) entirely in latent space, in environments whose true dynamics are unknown, mastering games and sequential decision tasks from pixels or raw observations alone.
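To make the three-function structure concrete, here is a minimal sketch of a MuZero-style model in NumPy. The "networks" are stand-in random linear maps rather than trained neural networks, and all names (`representation`, `dynamics`, `prediction`, `rollout`) and dimensions are illustrative assumptions, not the paper's implementation; a real agent would also run full MCTS rather than the single action-sequence rollout shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
OBS_DIM, LATENT_DIM, NUM_ACTIONS = 4, 8, 3

# Random linear maps stand in for the three learned networks.
W_repr = rng.normal(size=(LATENT_DIM, OBS_DIM))                  # h: obs -> latent
W_dyn = rng.normal(size=(NUM_ACTIONS, LATENT_DIM, LATENT_DIM))   # g: (latent, action) -> latent
w_reward = rng.normal(size=LATENT_DIM)                           # g also predicts a reward
W_policy = rng.normal(size=(NUM_ACTIONS, LATENT_DIM))            # f: latent -> policy logits
w_value = rng.normal(size=LATENT_DIM)                            # f: latent -> value

def representation(obs):
    """h: encode a raw observation into a latent state."""
    return np.tanh(W_repr @ obs)

def dynamics(latent, action):
    """g: predict the next latent state and the immediate reward."""
    next_latent = np.tanh(W_dyn[action] @ latent)
    reward = float(w_reward @ next_latent)
    return next_latent, reward

def prediction(latent):
    """f: predict the policy (softmax over actions) and a value estimate."""
    logits = W_policy @ latent
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()
    value = float(w_value @ latent)
    return policy, value

def rollout(obs, actions):
    """Score one action sequence by unrolling the model in latent space.

    Crucially, this never queries the real environment: after the initial
    encoding, planning uses only the learned dynamics and prediction.
    """
    latent = representation(obs)
    total_reward = 0.0
    for a in actions:
        latent, r = dynamics(latent, a)
        total_reward += r
    _, value = prediction(latent)
    return total_reward + value  # cumulative predicted reward + bootstrap value

obs = rng.normal(size=OBS_DIM)
score = rollout(obs, actions=[0, 2, 1])
```

In the full algorithm, MCTS expands a tree of such latent-space unrolls, using the predicted policy as a search prior and the predicted value to bootstrap leaf evaluations; the sketch above shows only the model interface that search relies on.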
