Progressive widening is a technique used in Monte Carlo Tree Search (MCTS) for environments with large or continuous action spaces, where the number of child nodes considered for a parent node is gradually increased as the parent's visit count grows. Instead of expanding all possible actions at once—which is computationally infeasible—the algorithm starts with a small subset and intelligently adds new candidate actions over time, focusing computational budget on the most promising regions of the action space. This creates a scalable, adaptive search tree.
Glossary
Progressive Widening

What is Progressive Widening?
Progressive widening is a core technique for scaling Monte Carlo Tree Search to problems with vast or continuous action spaces.
The technique directly manages the exploration-exploitation tradeoff within the tree's expansion phase. Common implementations, like Double Progressive Widening, apply the logic separately to state nodes and action nodes. It is a foundational method for applying MCTS to domains like robotics control, continuous optimization, and complex resource allocation, where traditional discrete search would fail. It is closely related to adaptations for continuous action space MCTS.
Core Mechanisms of Progressive Widening
Progressive widening is a critical technique for applying Monte Carlo Tree Search (MCTS) to domains with vast or continuous action spaces. It manages computational complexity by dynamically controlling how the search tree expands.
The Core Principle: Gradual Expansion
Progressive widening dynamically controls the branching factor of the MCTS tree. Instead of expanding all possible child nodes from a parent at once—which is infeasible in large action spaces—it gradually increases the number of considered child nodes as the parent node's visit count grows. This creates a best-first search property where computational resources are focused on promising parts of the action space.
- Mechanism: A function
f(N(s))determines how many child actions to consider for a parent statesbased on its visit countN(s). A common choice isf(N) = k * N^α, wherekandαare hyperparameters. - Benefit: It prevents the search tree from becoming intractably wide in the early stages, ensuring simulations sample a diverse but manageable set of actions.
Double Progressive Widening
For problems with continuous state and action spaces, double progressive widening is used. It applies the widening logic to both the actions available from a state and the resulting states from taking an action.
- State Widening: When a new action
ais added to a node for states, the resulting next states'is not fully determined in a continuous domain. The algorithm maintains a set of sampled resultant states for the (s, a) pair, and this set also widens progressively as the (s, a) pair is visited more often. - Process: First, decide if a new action should be added (action widening). If a known action is selected, then decide if a new resultant state should be sampled for it (state widening). This two-layer approach manages the complexity of modeling stochastic transitions in continuous spaces.
Integration with Upper Confidence Bounds
Progressive widening is seamlessly integrated with the Upper Confidence Bound for Trees (UCT) formula. The UCT balance of exploration vs. exploitation is applied only to the subset of child actions currently available at a widened node.
- Selection within Widened Set: At a parent node
swithKcurrently widened child actions, the algorithm selects childausing:argmax_a( Q(s,a) + c * sqrt( log(N(s)) / N(s,a) ) ), whereQis the action value andcis an exploration constant. - Dynamic Competition: As
N(s)increases and more actions are added to the widened set, new actions immediately compete with existing ones using the UCT formula. This ensures newly discovered promising actions can quickly attract more visits.
Parameterization: The Widening Function
The behavior of progressive widening is governed by its widening function. The most common form is a polynomial function: the number of allowed actions K at a node with visit count N is K = ceil( C * N^α ).
- C (Coefficient): Controls the initial width. A higher
Callows more actions to be considered early, promoting exploration but increasing cost. - α (Exponent): Controls the growth rate. Typically
0 < α < 1. Anαclose to 0 leads to very slow widening (focus on few actions), while anαclose to 1 makes the tree widen almost linearly. - Tuning: These are critical hyperparameters.
α = 0.5is a common starting point, providing a balance between sub-linear growth and sufficient exploration.
Application: Continuous Control & Robotics
Progressive widening is essential for applying MCTS to continuous control problems like robotic manipulation and autonomous vehicle path planning, where actions are real-valued vectors (e.g., joint torques, steering angles).
- Example - Robotic Arm: The action space for a 7-DoF arm is a 7-dimensional continuous space. Naïve discretization is impossible. Progressive widening samples candidate torque vectors, focusing on regions of the space that lead to higher rewards (e.g., grasping an object).
- Use with Dynamics Models: It is often paired with a learned or analytical dynamics model during the simulation phase. The model predicts the next state
s'givensand a continuous actionasampled from the widened set.
Contrast with Fixed Discretization
Progressive widening provides a superior alternative to fixed discretization of large action spaces.
- Fixed Discretization: Pre-defines a finite set of actions (e.g., 100 possible steering angles). This can miss optimal actions that fall between discretization points and suffers from the curse of dimensionality.
- Adaptive Advantage: Progressive widening adaptively discretizes the space. It spends more computational effort refining the value estimates for actions in promising regions, effectively creating a non-uniform, resolution-on-demand discretization.
- Efficiency: It avoids wasting simulations on clearly poor areas of a vast action space, which a fixed uniform grid would inevitably sample.
How Progressive Widening Works in MCTS
Progressive Widening is a core technique for adapting Monte Carlo Tree Search to environments with vast or continuous action spaces, where the standard tree expansion strategy becomes computationally intractable.
Progressive Widening is a node expansion strategy for Monte Carlo Tree Search (MCTS) that dynamically controls the branching factor of the search tree. Instead of adding all possible child nodes at once, it gradually increases the number of considered actions for a parent node as that node's visit count grows. This creates a sparser, more manageable tree focused on the most promising regions of a vast action space, balancing thorough evaluation with computational feasibility.
The technique directly addresses the curse of dimensionality in planning. A common implementation, Double Progressive Widening, controls both the number of actions (action widening) and the number of resulting states (state widening) considered from a node. It is a foundational component for applying MCTS to domains like continuous control in robotics, high-dimensional strategy games, and complex automated planning systems, enabling effective search where exhaustive expansion is impossible.
Frequently Asked Questions
Progressive widening is a core technique for scaling Monte Carlo Tree Search to problems with vast or continuous action spaces. These FAQs address its core mechanics, implementation, and role within modern agentic systems.
Progressive widening is a technique used in Monte Carlo Tree Search (MCTS) to manage environments with large or continuous action spaces, where it is computationally infeasible to expand all possible child nodes from a parent node at once. Instead, the algorithm gradually increases the number of child nodes considered for a parent node as that parent's visit count grows. This creates a dynamic balance, focusing computational resources on the most promising parts of the search tree while maintaining the theoretical possibility of exploring any action given enough simulations.
In standard MCTS for games like Go, the number of legal moves (child nodes) is discrete and manageable. In contrast, a robot arm has a continuous range of joint angles, and a financial trading agent has a near-infinite set of possible order sizes and prices. Progressive widening makes MCTS tractable for these real-world, sequential decision-making problems by incrementally building a sparse, adaptive representation of the action space.
Enabling Efficiency, Speed & Accuracy
Intelligent Analysis, Decision & Execution
We build AI systems for teams that need search across company data, workflow automation across tools, or AI features inside products and internal software.
Talk to Us
Search across company data
Give teams answers from docs, tickets, runbooks, and product data with sources and permissions.
Useful when people spend too long searching or get different answers from different systems.

Automate internal workflows
Use AI to route work, draft outputs, trigger actions, and keep approvals and logs in place.
Useful when repetitive work moves across multiple tools and teams.

Add AI to products and internal tools
Build assistants, guided actions, or decision support into the software your team or customers already use.
Useful when AI needs to be part of the product, not a separate tool.
Related Terms
Progressive widening is a key technique within the broader Monte Carlo Tree Search (MCTS) framework for managing complex action spaces. These related concepts detail the core algorithm phases, parallelization strategies, and advanced enhancements that define modern MCTS implementations.
Monte Carlo Tree Search (MCTS)
Monte Carlo Tree Search (MCTS) is a heuristic, best-first search algorithm for optimal sequential decision-making under uncertainty. It does not require a positional evaluation function. Instead, it builds a search tree through four iterative phases:
- Selection: Traverse the tree from the root to a leaf node using a tree policy (e.g., UCT).
- Expansion: Add one or more child nodes to the leaf.
- Simulation (Rollout): Run a randomized playout from the new node to a terminal state.
- Backpropagation: Update the statistics (visit count, total reward) of all nodes along the traversed path with the simulation result. Its strength lies in its asymmetric growth, focusing computational effort on the most promising branches.
Upper Confidence Bound for Trees (UCT)
Upper Confidence Bound for Trees (UCT) is the canonical tree policy used during the selection phase of MCTS to balance exploration and exploitation. For a node (i), the formula for choosing a child (j) is: [ \text{UCT}(j) = \overline{X}_j + c \sqrt{\frac{\ln N_i}{N_j}} ] Where (\overline{X}_j) is the average reward (exploitation term), (N_i) is the parent's visit count, (N_j) is the child's visit count, and (c) is an exploration constant. The term under the square root encourages visiting less-sampled children. UCT is directly derived from solutions to the multi-armed bandit problem and is the foundation upon which progressive widening builds for discrete action spaces.
Continuous Action Space MCTS
Continuous Action Space MCTS refers to the family of adaptations required to apply MCTS to environments where the action space is continuous or extremely large and discrete. The standard UCT formula becomes impractical because enumerating all child nodes for expansion is impossible. Core techniques include:
- Progressive Widening: Dynamically adds new child actions to a node as its visit count increases.
- Hierarchical Action Discretization: Uses a coarse-to-fine strategy, where high-level actions are refined upon selection.
- Parameterized Action Sampling: Employs a learned or heuristic policy to sample promising continuous actions for expansion. These methods are essential for applying MCTS to domains like robotics control, continuous-game AI, and high-dimensional planning.
Rapid Action Value Estimation (RAVE)
Rapid Action Value Estimation (RAVE) is an enhancement to MCTS that accelerates value convergence, particularly in games like Go. It addresses the slow initial learning of the standard AMAF (All Moves As First) principle. RAVE shares simulation statistics across all nodes in the tree where a specific action was taken, not just along the unique path from the root. The RAVE value for an action (a) from a node is combined with its standard UCT value using a weighted average: [ Q_{\text{mix}} = \beta(s, a) \cdot Q_{\text{RAVE}}(a) + (1 - \beta(s, a)) \cdot Q_{\text{UCT}}(a) ] The weight (\beta) decays as the node's visit count increases, trusting the more accurate, path-specific UCT value over time. This is conceptually related to progressive widening in that both techniques modify how action values are aggregated and estimated to improve search efficiency.
Tree Parallelization (MCTS)
Tree Parallelization is a strategy for running MCTS on multi-core systems where multiple threads share and concurrently update a single, global search tree. This contrasts with root parallelization, which uses independent trees. The major challenge is managing contention when threads select the same node. The primary solution is virtual loss:
- When a thread selects a node during traversal, it temporarily adds a virtual loss (e.g., a negative reward) to that node's statistics.
- This artificially lowers the node's UCT score, making other threads less likely to select it, thereby encouraging parallel exploration of different branches.
- The virtual loss is removed and replaced with the true simulation result during backpropagation. Progressive widening must be implemented with thread safety in mind under this paradigm to avoid race conditions when adding new child nodes.
Neural Monte Carlo Tree Search
Neural Monte Carlo Tree Search is the hybrid architecture that integrates deep neural networks with MCTS, as pioneered by DeepMind's AlphaGo, AlphaZero, and MuZero. The neural networks guide and replace traditional MCTS components:
- Policy Network: Provides a prior probability distribution over actions for the expansion phase, replacing uniform random expansion. In AlphaZero, this prior is used in a modified PUCT (Polynomial Upper Confidence Trees) formula.
- Value Network: Predicts the expected outcome from a given state, providing a substitute or complement to the rollout phase, drastically reducing the need for long, random simulations. In this context, progressive widening can be managed by the policy network, which samples from a continuous distribution or provides a ranked list of high-probability actions for expansion, focusing the tree growth on the most promising regions of a vast action space.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
Partnered with leading AI, data, and software stack.
How We Work
Custom AI workflows for your Business
One-fit-all AI don't work for modern businesses. At Inferensys, we aim to understand your business & custom requirements; which we use to define most efficient agentic workflows, the data, and the tools for your business.
01
Review the use case
We understand the task, the users, and where AI can actually help.
Read more02
Pick the right approach
We define what needs search, automation, or product integration.
Read more03
Build the first useful version
We implement the part that proves the value first.
Read more04
Improve from there
We add the checks and visibility needed to keep it useful.
Read moreThe first call is a practical review of your use case and the right next step.
Talk to Us