Inferensys

Glossary

Selection (MCTS Phase)

Selection is the first phase of Monte Carlo Tree Search where the algorithm traverses the existing tree from the root node to a leaf node by recursively choosing child nodes according to a tree policy, typically UCT.
Developer reviewing semantic search engine results on laptop, relevance scores visible, technical search demo.
MONTE CARLO TREE SEARCH

What is Selection (MCTS Phase)?

The first and defining phase of the Monte Carlo Tree Search algorithm where the existing search tree is traversed.

Selection is the initial phase of a Monte Carlo Tree Search (MCTS) iteration where the algorithm traverses the existing search tree from the root node (current state) to a leaf node by recursively choosing child nodes according to a tree policy. The primary objective is to navigate the tree to a point where further expansion and simulation are beneficial, balancing the exploration-exploitation tradeoff to efficiently allocate computational resources. This phase is guided by node statistics, primarily visit count and cumulative reward, which are updated during backpropagation.

The canonical tree policy for selection is the Upper Confidence Bound for Trees (UCT) formula, which mathematically balances exploiting nodes with high average reward and exploring nodes with high uncertainty (few visits). Other policies include Progressive Widening for large action spaces. The selection phase concludes when the traversal reaches a node that is either not fully expanded (a leaf) or terminal, setting the stage for the subsequent expansion and simulation phases to gather new information from that point in the state space.

MCTS PHASE

Key Components of the Selection Phase

The Selection phase is the initial traversal step in a Monte Carlo Tree Search iteration, where the algorithm navigates from the root to a leaf node using a tree policy to balance exploration and exploitation.

01

Tree Policy

The tree policy is the core decision function used during the Selection phase to choose which child node to traverse at each step. It defines the algorithm's strategy for navigating the existing search tree.

  • The canonical policy is the Upper Confidence Bound for Trees (UCT), which mathematically balances the exploration-exploitation tradeoff.
  • Other policies include UCB1, Thompson sampling, or domain-specific heuristics for problems like continuous action spaces.
  • The policy's output determines the path from the root to a leaf node that has not been fully expanded.
02

Upper Confidence Bound for Trees (UCT)

Upper Confidence Bound for Trees (UCT) is the standard formula used as the tree policy in MCTS. It selects the child node j of parent node i that maximizes:

UCT(i, j) = Q_j / N_j + c * sqrt( ln(N_i) / N_j )

  • Q_j / N_j: The exploitation term, representing the average reward (value) of child j.
  • c * sqrt( ln(N_i) / N_j ): The exploration term, which encourages sampling less-visited nodes. The constant c controls the exploration weight.
  • This formula ensures that promising nodes are revisited (exploitation) while under-sampled nodes are eventually tried (exploration), guaranteeing asymptotic convergence to the optimal action.
03

Node Statistics (Visit Count & Value)

Each node in the MCTS tree maintains key statistics that are read and updated during Selection and Backpropagation.

  • Visit Count (N): The number of times the node has been traversed during the Selection phase. This is the denominator in the UCT formula and is crucial for guiding exploration.
  • Cumulative Value (Q or W): The sum of all simulation results (rewards) that have been backpropagated through this node. The average value Q/N represents the node's estimated utility.
  • These statistics are stored in memory and are continuously refined as the search progresses, forming the algorithm's "knowledge" about the state space.
04

Traversal to a Leaf Node

The Selection phase's objective is to traverse from the root node (current state) to a leaf node. A leaf node is defined as either:

  1. A terminal node (a game-ending state).
  2. A non-terminal node that has not yet been expanded (i.e., not all its possible child actions have been added to the tree).
  • The traversal is recursive: starting at the root, the tree policy selects one child, moves to it, and repeats until a leaf is reached.
  • This process focuses computational effort on the most promising regions of the vast state space, as informed by the updated node statistics.
05

Exploration vs. Exploitation

The exploration-exploitation tradeoff is the fundamental dilemma managed by the Selection phase. The tree policy must decide between:

  • Exploitation: Choosing the child node with the highest average value (Q/N). This refines the estimate of the currently best-known path.

  • Exploration: Choosing a less-visited child node (low N). This gathers new information, which may reveal a better path than the current best estimate.

  • The UCT formula provides a principled, optimal balance. Early in search, exploration dominates. As visit counts grow, exploitation of high-value nodes becomes more prominent.

06

Handling Large Action Spaces

For environments with vast or continuous action spaces, naive Selection over all possible children is infeasible. Advanced techniques modify the phase:

  • Progressive Widening: The number of child nodes considered for a parent is not fixed. It starts small and increases gradually as the parent's visit count N_i grows (e.g., K * sqrt(N_i)). New actions are dynamically added during Selection/Expansion.
  • Double Progressive Widening: Extends this concept to both action widening (adding new actions) and state widening (adding new outcome states for stochastic environments).
  • These methods ensure the tree grows in a tractable, focused manner, prioritizing depth and evaluation of promising actions over breadth.
SELECTION (MCTS PHASE)

Frequently Asked Questions

The Selection phase is the critical first step in each Monte Carlo Tree Search (MCTS) iteration, where the algorithm navigates the existing tree to choose a promising leaf node for further exploration. These questions address its core mechanics, policies, and engineering considerations.

The Selection phase is the initial step in a Monte Carlo Tree Search (MCTS) iteration where the algorithm traverses from the root node (current state) to a leaf node by recursively selecting child nodes according to a tree policy, most commonly the Upper Confidence Bound for Trees (UCT) formula. This phase balances exploration of less-visited paths with exploitation of paths known to yield high rewards, efficiently guiding the search toward the most promising regions of the decision tree without exhaustive expansion. Its output is a leaf node where the next phase, Expansion, will occur.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.