Upper Confidence Bound for Trees (UCT) is a selection policy applied during the selection phase of Monte Carlo Tree Search. It guides the traversal from the root node to a leaf node by treating the choice among each node's children as a multi-armed bandit problem, with every child acting as one arm. At each step the policy scores the children and descends into the one with the highest score. The score balances two competing objectives: exploiting children with a high average reward from previous simulations, and exploring children that have been visited less often, whose value estimates are therefore more uncertain.
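The balance described above can be made concrete with a minimal sketch in Python. It assumes the commonly used UCB1-style score, the average reward plus c * sqrt(ln(parent visits) / child visits), and it assumes that unvisited children are always tried first. The function names `uct_score` and `select_child`, the tuple layout for children, and the default constant c = sqrt(2) are illustrative choices, not taken from any particular library.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """Score one child: average reward plus an exploration bonus.

    Assumption: an unvisited child gets an infinite score so that it is
    tried at least once before any statistics are compared.
    """
    if visits == 0:
        return math.inf
    exploitation = value_sum / visits  # average reward so far
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest score.

    `children` is a list of (value_sum, visits) tuples, a hypothetical
    layout chosen only to keep this sketch self-contained.
    """
    return max(
        range(len(children)),
        key=lambda i: uct_score(children[i][0], children[i][1], parent_visits, c),
    )


# Child 0 has the better average (0.8 vs 0.5) but far more visits.
children = [(8.0, 10), (2.0, 4)]
greedy_choice = select_child(children, parent_visits=14, c=0.0)
balanced_choice = select_child(children, parent_visits=14)
```

With the exploration constant set to zero the selection is purely greedy and picks the child with the higher average reward. With the default constant, the less-visited child receives a larger bonus and is selected instead. The constant is typically tuned per problem, since it depends on the scale of the rewards.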
