AlphaZero is a model-based reinforcement learning system that combines a deep neural network with a Monte Carlo Tree Search planner. The neural network, trained through self-play, provides both a policy (probability distribution over moves) and a value estimation (predicted game outcome) to guide the tree search. This tight integration allows it to evaluate millions of positions efficiently, focusing its search on the most promising branches identified by its learned model.
