Backpropagation (MCTS Phase) is the algorithm's learning mechanism: the reward or outcome of a completed simulation (rollout) is propagated backward along the path of nodes visited during the preceding selection phase. This process updates two key statistics stored in each node on that path: the visit count (N), incremented by one, and the cumulative reward (Q), which is increased by the simulation's result. These updated statistics directly inform future selection decisions via policies such as Upper Confidence Bound for Trees (UCT).
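The update described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Node` class, the field names `N` and `Q`, and the `uct` helper are hypothetical, and the sketch assumes a single-player setting where the same reward is added at every node (two-player MCTS typically negates or flips the reward at alternating levels).

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    parent: Optional["Node"] = None
    N: int = 0      # visit count
    Q: float = 0.0  # cumulative reward from all rollouts through this node

def backpropagate(leaf: Node, reward: float) -> None:
    """Propagate one rollout's reward from the simulated leaf up to the root."""
    node = leaf
    while node is not None:
        node.N += 1       # record one more visit on this path
        node.Q += reward  # accumulate the simulation's outcome
        node = node.parent

def uct(parent: Node, child: Node, c: float = math.sqrt(2)) -> float:
    """UCT score used during selection; reads the statistics backpropagation wrote."""
    if child.N == 0:
        return float("inf")  # unvisited children are tried first
    return child.Q / child.N + c * math.sqrt(math.log(parent.N) / child.N)
```

After a rollout from a leaf, `backpropagate(leaf, reward)` touches exactly the selected path, so the average reward `Q / N` at each node reflects only the simulations that passed through it; `uct` then trades off that average (exploitation) against the visit-count bonus (exploration) on the next descent.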
