Expansion is the phase in Monte Carlo Tree Search (MCTS) where one or more child nodes are added to the selected leaf node, thereby growing the search tree based on the legal actions available from that state. This step transitions the algorithm from traversing the existing tree to exploring new, previously unevaluated states. It typically occurs after the selection phase has identified a promising leaf node that has not been fully expanded. The new child nodes represent the immediate possible future states, making them candidates for subsequent simulation.
Glossary
Expansion (MCTS Phase)

What is Expansion (MCTS Phase)?
The expansion phase is a critical step in the Monte Carlo Tree Search algorithm where the search tree is strategically grown to explore new possibilities.
The process is governed by an expansion policy, which determines how many and which child nodes to create. In its simplest form, a single, unvisited child is added per iteration. For problems with vast action spaces, techniques like progressive widening are used to gradually expand the node's branching factor. This phase is essential for balancing the exploration-exploitation tradeoff, as it directly introduces new paths for the algorithm to evaluate, preventing it from becoming stuck in a shallow local optimum of the search space.
Key Mechanisms in the Expansion Phase
The Expansion phase is where the MCTS tree physically grows. This section details the specific algorithms and heuristics that determine how new child nodes are created from a selected leaf.
Action Space Enumeration
The core function of expansion is to enumerate the legal actions available from the leaf node's state. This creates the set of potential child nodes.
- In a board game like Go, this involves listing all empty intersections.
- For a planning agent, it involves listing all valid API calls or sub-task initializations.
- The complexity of this enumeration directly impacts the branching factor of the search tree.
Progressive Widening
A critical technique for managing large or continuous action spaces. Instead of expanding all legal actions at once, the algorithm gradually adds child nodes as the parent node is visited more.
- A common rule: only add a new child when
visit_count(parent) ^ (1/expansion_coefficient)increases. - This prevents the tree from becoming impractically wide early in the search, focusing computational budget on the most promising branches first.
- It is essential for scaling MCTS to real-world problems like robotic motion planning or chemical design.
Prior Knowledge Integration
In advanced architectures like AlphaZero, expansion is guided by a neural network. When a new child node is created for action a, it is initialized with:
- Prior Probability (P(s, a)): A value from the network's policy head, estimating the likelihood that a is the best move from state s. This biases future selection.
- Initial Value Estimate: Often set to the network's value head output for the parent state or zero, to be updated during backpropagation.
- This replaces random or uniform initialization, dramatically increasing search efficiency.
Single vs. Multiple Expansion
A design choice in the MCTS loop: how many child nodes to add per iteration.
- Standard Single Expansion: Adds one child node per iteration (the first unvisited action from the selected leaf). This is the classic approach.
- Multiple Expansion: Adds all unvisited child nodes at once. This can speed up initial exploration but increases memory usage and may lead to a less focused search if not paired with strong priors.
- The choice affects the trade-off between breadth and depth of exploration in the early stages.
Terminal State Handling
Expansion is skipped if the selected leaf node is a terminal state (e.g., game over, goal achieved).
- The algorithm proceeds directly to the Simulation phase, which in this case is a null operation, and then Backpropagation with the terminal reward.
- Correct detection of terminal states is crucial to avoid attempting illegal expansions and to ensure value estimates are accurate.
- In adversarial settings, the terminal reward is often +1 (win), 0 (draw), or -1 (loss).
Connection to Neural MCTS
In AlphaZero and MuZero, expansion is deeply integrated with learned models.
- AlphaZero: The network suggests priors P(s, a) and a value v for the current state s during expansion.
- MuZero: The network's dynamics function is used. After choosing an action a to expand, the function predicts the next latent state s' and immediate reward r. The new child node represents this predicted state s'.
- This allows planning in environments where the true rules (transition model) are unknown, using the learned model as a substitute.
Frequently Asked Questions
Common technical questions about the Expansion phase in Monte Carlo Tree Search, where the algorithm grows its decision tree by adding new child nodes to explore possible future states.
The Expansion phase is the step in a Monte Carlo Tree Search (MCTS) iteration where one or more child nodes are added to the selected leaf node, thereby growing the search tree based on the legal actions available from that game or planning state. It follows the Selection phase and precedes the Simulation (Rollout) phase. The primary function of expansion is to incrementally explore the state space by converting a leaf node from the existing tree into an internal node with new, unexplored child nodes. This growth is controlled; typically, only a single new child is added per iteration to manage memory and computational cost, though techniques like progressive widening can add more over time. The new child node(s) are initialized with prior statistics (like a visit count of zero and an initial value estimate) before a simulation is run from that new state.
Enabling Efficiency, Speed & Accuracy
Intelligent Analysis, Decision & Execution
We build AI systems for teams that need search across company data, workflow automation across tools, or AI features inside products and internal software.
Talk to Us
Search across company data
Give teams answers from docs, tickets, runbooks, and product data with sources and permissions.
Useful when people spend too long searching or get different answers from different systems.

Automate internal workflows
Use AI to route work, draft outputs, trigger actions, and keep approvals and logs in place.
Useful when repetitive work moves across multiple tools and teams.

Add AI to products and internal tools
Build assistants, guided actions, or decision support into the software your team or customers already use.
Useful when AI needs to be part of the product, not a separate tool.
Related Terms
The Expansion phase is one of the four core steps in a Monte Carlo Tree Search iteration. It is preceded by Selection and followed by Simulation and Backpropagation. Understanding related concepts clarifies its role in the overall planning algorithm.
Selection (MCTS Phase)
Selection is the phase immediately before Expansion, where the algorithm traverses the existing tree from the root to a leaf node. It uses a tree policy, most commonly the Upper Confidence Bound for Trees (UCT), to balance exploring less-visited paths and exploiting paths with high estimated value. The chosen leaf node becomes the target for the subsequent Expansion phase.
Simulation (Rollout)
Simulation, or rollout, is the phase that follows Expansion. Starting from the newly added child node (or the selected leaf if expansion was skipped), a fast playout policy (often random) is used to simulate a trajectory to a terminal state. The outcome of this simulation (win/loss, score) is the sample used to evaluate the expanded node's potential.
Progressive Widening
Progressive widening is a critical technique for managing Expansion in domains with vast or continuous action spaces. Instead of expanding all possible actions at once, the algorithm gradually increases the number of child nodes for a parent node as its visit count grows. This ensures computational resources are focused on promising areas of a large search space.
- Key Mechanism: The number of children considered is a function
f(N), whereNis the parent's visit count. - Use Case: Essential for robotics planning, continuous control, and games with many legal moves.
Upper Confidence Bound for Trees (UCT)
UCT is the canonical formula that guides the Selection phase, directly influencing which leaf node is chosen for Expansion. It balances:
- Exploitation: Favoring children with a high average reward.
- Exploration: Favoring children that have been visited less frequently.
The selected node's statistics, updated via Backpropagation, determine the UCT values for future selections, creating a feedback loop that efficiently grows the tree.
Neural Monte Carlo Tree Search
In advanced architectures like AlphaZero, the Expansion phase is guided by a neural network. When a leaf node is selected for expansion, a policy network suggests a probability distribution over possible actions, and a value network provides an initial estimate of the state's worth. This replaces random or heuristic-based expansion, allowing the algorithm to build a much more informed and efficient search tree from the outset.
Tree Policy vs. Playout Policy
Understanding these two policies is essential for differentiating the Expansion and Simulation phases.
- Tree Policy: Used during Selection and Expansion. It is computationally expensive and aims for optimal growth of the search tree (e.g., UCT). It decides which nodes to add to the tree.
- Playout Policy: Used during Simulation. It is designed to be very fast to allow many rollouts. It decides how to simulate from an expanded node to a terminal state, often using random actions or a simple heuristic.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
Partnered with leading AI, data, and software stack.
How We Work
Custom AI workflows for your Business
One-fit-all AI don't work for modern businesses. At Inferensys, we aim to understand your business & custom requirements; which we use to define most efficient agentic workflows, the data, and the tools for your business.
01
Review the use case
We understand the task, the users, and where AI can actually help.
Read more02
Pick the right approach
We define what needs search, automation, or product integration.
Read more03
Build the first useful version
We implement the part that proves the value first.
Read more04
Improve from there
We add the checks and visibility needed to keep it useful.
Read moreThe first call is a practical review of your use case and the right next step.
Talk to Us