The exploration-exploitation tradeoff is the fundamental decision-making dilemma in which an agent must choose between exploring new, uncertain actions to gather information and exploiting known actions that yield high rewards. In Monte Carlo Tree Search (MCTS), this tradeoff is handled during the selection phase by policies such as Upper Confidence Bound for Trees (UCT). UCT balances a node's estimated value (its average reward so far) against the uncertainty of that estimate, which is quantified by how often the node has been visited relative to its parent.
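The selection rule can be illustrated with a minimal Python sketch. It assumes the common UCB1-based form of UCT, in which a child v' of node v is scored as Q(v')/N(v') + c·sqrt(ln N(v) / N(v')), where Q is total reward, N is visit count, and c is an exploration constant. The `Node` class, the function names `uct_score` and `select_child`, and the default c = √2 are illustrative assumptions, not the API of any particular library.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal MCTS node holding only the statistics UCT needs."""
    name: str
    visits: int = 0            # N: how many times this node has been visited
    total_reward: float = 0.0  # Q: sum of rewards backed up through this node
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCT value of a child: mean reward (exploitation) plus an exploration bonus.

    Assumes the UCB1-style form Q/N + c * sqrt(ln(N_parent) / N).
    """
    if child.visits == 0:
        # An unvisited child has unbounded uncertainty, so it is tried first.
        return math.inf
    exploit = child.total_reward / child.visits
    # The bonus shrinks as the child is visited more often relative to its parent.
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Selection step: descend to the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


# Usage: "a" has the higher mean reward, but "b" has been explored less.
root = Node("root", visits=30)
a = Node("a", visits=20, total_reward=14.0)  # mean reward 0.7
b = Node("b", visits=10, total_reward=5.0)   # mean reward 0.5
root.children = [a, b]

print(select_child(root).name)         # b  (exploration bonus outweighs the lower mean)
print(select_child(root, c=0.0).name)  # a  (c = 0 is pure exploitation)
```

With c = √2, child "b" scores about 1.32 versus about 1.28 for "a", so the less-visited child is selected even though its mean is lower; setting c = 0 removes the exploration bonus and the greedy choice "a" wins. Raising c pushes the search toward exploration, lowering it toward exploitation, and the value that works best is usually tuned per problem.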
