Beam search is a best-first search algorithm that maintains a fixed-size set, the beam width (k), of the most promising partial solutions at each step of the search tree. It expands only the k best nodes according to a heuristic evaluation function, pruning all others to manage computational cost. This makes it a memory-efficient and time-efficient approximation of breadth-first search, trading optimality for speed in vast state spaces like those encountered in machine translation and sequence generation tasks.
Glossary
Beam Search

What is Beam Search?
Beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set, the beam width, at each level.
The algorithm operates level-by-level, generating all successors for the current beam, scoring them, and selecting the top k for the next iteration. It is fundamentally tied to the exploration-exploitation tradeoff, where a wider beam explores more but costs more. In Tree-of-Thought reasoning and agentic planning, beam search enables parallel exploration of multiple reasoning paths, allowing an autonomous system to evaluate several potential action sequences before committing to a final decision or output.
Key Characteristics of Beam Search
Beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set, defined by the beam width, at each level. It balances the completeness of breadth-first search with the efficiency of greedy search.
Beam Width (k)
The beam width (k) is the algorithm's core hyperparameter, representing the maximum number of candidate sequences retained at each step of the search. It creates a memory and compute budget, trading off between:
- Greedy Search (k=1): Only the single best hypothesis is kept, highly efficient but prone to local optima.
- BFS-like Search (k=∞): All hypotheses are kept, guaranteeing optimality but at prohibitive exponential cost. A typical beam width ranges from 4 to 10 for language generation tasks, providing a practical balance.
Time and Space Efficiency
Beam search achieves polynomial time and space complexity relative to sequence length, unlike the exponential growth of exhaustive search. At each time step t, the algorithm:
- Expands the
kbest hypotheses from stept-1. - Scores the resulting
k * Vcandidates (whereVis vocabulary size). - Prunes back to the top-
kfor the next step. This results in O(k * V * T) time complexity, making it feasible for large vocabularies and long sequences where exhaustive search is impossible.
Pruning Strategy & Hypothesis Selection
The algorithm maintains a beam—a priority queue of the top-k partial sequences ranked by a scoring function (typically log probability). At each iteration:
- Each hypothesis in the beam is expanded by all possible next tokens.
- New scores are computed (cumulative log probability).
- All new hypotheses are pooled and sorted; only the top-
ksurvive. This pruning is aggressive and deterministic, discarding potentially good long-term paths for short-term gains, which is its primary limitation versus optimal algorithms like A*.
Relation to Breadth-First Search
Beam search is a best-first, breadth-limited variant of Breadth-First Search (BFS).
- Similar to BFS: It explores the tree level-by-level.
- Difference from BFS: Instead of keeping all nodes at a level (which grows exponentially), it keeps only the
kmost promising nodes. - Result: It approximates BFS's ability to avoid the depth traps of DFS but within a fixed memory budget. It is not complete (may not find a solution) and not optimal (may not find the best solution) due to this pruning.
Common Applications: Sequence Generation
Beam search is the dominant decoding algorithm for autoregressive sequence generation tasks:
- Machine Translation: Generating the most fluent and accurate translation from a source sentence.
- Text Summarization: Producing a concise summary from a long document.
- Speech Recognition: Decoding the most probable word sequence from acoustic features.
- Code Generation: Generating syntactically correct code from natural language descriptions. In these tasks, the scoring function is usually the log probability assigned by a trained model (e.g., an LLM), and the beam width is tuned for a balance of quality and latency.
Core Limitations and Mitigations
Standard beam search has known limitations, leading to several algorithmic variants:
- Lack of Diversity: The beam can become filled with nearly identical sequences. Mitigation: Use diversity-promoting penalties (e.g., Hamming diversity) or group beam search.
- Length Bias: It favors shorter sequences due to the multiplicative nature of probabilities. Mitigation: Apply length normalization (e.g., dividing score by length).
- Suboptimality: Pruning can discard the globally best path. Mitigation: Increased beam width (costly) or integrated use with top-k/top-p sampling for a stochastic hybrid approach. These trade-offs make beam search a tunable engineering component rather than a purely theoretical optimizer.
How Beam Search Works: Step-by-Step
Beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set, the beam width, at each level.
Beam search operates on a state space represented as a tree, where each node is a partial solution (e.g., a sequence of tokens). The algorithm maintains a beam—a fixed-size set of the most promising nodes at the current depth, defined by a heuristic function or score. At each step, it generates all successors for the nodes in the beam, scores them, and selects only the top-k to form the new beam, pruning all others. This process repeats until a termination condition is met, such as reaching a maximum depth or a complete solution.
The algorithm's efficiency stems from its pruning of low-probability paths, which prevents the exponential growth typical of exhaustive methods like breadth-first search. The beam width parameter directly controls the trade-off between computational cost and solution quality; a wider beam explores more possibilities at the expense of greater memory and time. In sequence generation tasks like machine translation, beam search approximates the most likely output sequence by iteratively extending high-scoring partial hypotheses, balancing the exploration-exploitation tradeoff within a manageable search frontier.
Frequently Asked Questions
Beam search is a core heuristic search algorithm used in AI planning and sequence generation. These questions address its mechanics, applications, and trade-offs for developers and system architects.
Beam search is a heuristic graph search algorithm that explores a tree or graph by expanding only the most promising nodes at each depth level, as defined by a fixed parameter called the beam width (k). It works by maintaining a beam—a limited set of the best partial solutions—at each step of the search. At each level, all nodes in the current beam are expanded, their children are evaluated using a heuristic or scoring function, and only the top-k highest-scoring children are retained for the next beam. This process repeats until a terminal state (e.g., a complete sequence) is found or a maximum depth is reached. Unlike exhaustive searches like Breadth-First Search (BFS), beam search prunes less promising paths to manage computational cost, making it a best-first search variant with a memory budget.
Enabling Efficiency, Speed & Accuracy
Intelligent Analysis, Decision & Execution
We build AI systems for teams that need search across company data, workflow automation across tools, or AI features inside products and internal software.
Talk to Us
Search across company data
Give teams answers from docs, tickets, runbooks, and product data with sources and permissions.
Useful when people spend too long searching or get different answers from different systems.

Automate internal workflows
Use AI to route work, draft outputs, trigger actions, and keep approvals and logs in place.
Useful when repetitive work moves across multiple tools and teams.

Add AI to products and internal tools
Build assistants, guided actions, or decision support into the software your team or customers already use.
Useful when AI needs to be part of the product, not a separate tool.
Related Terms
Beam search is a core heuristic search algorithm. Understanding its relationship to these other search and optimization techniques is essential for designing efficient AI planning and reasoning systems.
Best-First Search
Best-First Search is a graph traversal algorithm that expands the most promising node according to an evaluation function, typically implemented with a priority queue. Unlike beam search, which maintains a fixed-width beam of candidates, best-first search explores a single, globally optimal path at each step, which can require significant memory for the open list.
- Key Mechanism: Uses a heuristic function
f(n)to prioritize node expansion. - Primary Use Case: Pathfinding (e.g., A* algorithm) and optimization problems where finding any optimal path is critical.
- Contrast with Beam Search: Beam search is a memory-constrained variant of best-first search, sacrificing global optimality guarantees for efficiency by pruning all but the top-k (beam width) nodes at each depth level.
Breadth-First Search (BFS)
Breadth-First Search is an uninformed search algorithm that explores a tree or graph by visiting all nodes at the present depth level before moving to nodes at the next depth level. It guarantees finding the shortest path in an unweighted graph but suffers from exponential memory complexity O(b^d), where b is the branching factor and d is the solution depth.
- Key Mechanism: Uses a First-In-First-Out (FIFO) queue.
- Primary Use Case: Exhaustive exploration where solution depth is minimal or shortest-path guarantees are required.
- Contrast with Beam Search: Beam search can be viewed as a memory-bounded approximation of BFS. Instead of keeping all nodes at a level, it retains only the top-k most promising nodes, trading completeness for tractability in large state spaces.
Greedy Search
Greedy Search (or Greedy Best-First Search) is a heuristic algorithm that always expands the node that appears to be closest to the goal, based solely on a heuristic function h(n). It is highly efficient but often incomplete and non-optimal, as it can become stuck in local optima.
- Key Mechanism: Expands the node with the lowest heuristic cost
h(n)to the goal. - Primary Use Case: Problems where computational speed is prioritized over solution quality, or where the heuristic is highly reliable.
- Contrast with Beam Search: Beam search with a width of
k=1is equivalent to greedy search. However, withk>1, beam search maintains multiple hypotheses in parallel, making it more robust to heuristic errors and local optima than a purely greedy approach.
Viterbi Algorithm
The Viterbi Algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (the Viterbi path) in a Hidden Markov Model (HMM). It is mathematically analogous to beam search with a beam width equal to the total number of possible states.
- Key Mechanism: Computes the maximum a posteriori probability estimate for each state sequence by recursively evaluating all paths and retaining only the most probable path to each state.
- Primary Use Case: Sequence decoding in signal processing, speech recognition, and bioinformatics.
- Contrast with Beam Search: The Viterbi algorithm is exact and considers all states. Beam search applied to sequence models (like in neural machine translation) is an approximation that prunes all but the
kbest partial sequences at each time step, vastly improving speed for large vocabularies.
Top-k Sampling
Top-k Sampling is a decoding strategy for autoregressive language models where, at each generation step, the probability distribution over the vocabulary is truncated to only the k most likely tokens, from which one is sampled. This balances diversity and coherence.
- Key Mechanism: Filters the vocabulary to a shortlist of size
kbefore sampling, preventing low-probability, nonsensical tokens. - Primary Use Case: Text generation with large language models to improve output quality.
- Contrast with Beam Search: Both methods use a width parameter
k. However, top-k is a sampling-based method (stochastic) that produces diverse outputs, while beam search is a deterministic search for the highest-scoring overall sequence according to the model's probabilities. Beam search often leads to more fluent but potentially generic and repetitive text.
Monte Carlo Tree Search (MCTS)
Monte Carlo Tree Search is a best-first, heuristic search algorithm for decision processes that combines tree search with random sampling. It builds a search tree asymmetrically, focusing computational resources on the most promising regions of the state space through repeated iterations of selection, expansion, simulation, and backpropagation.
- Key Mechanism: Uses random rollouts (simulations) to estimate the value of leaf nodes, guiding future exploration via the Upper Confidence Bound for Trees (UCT) formula.
- Primary Use Case: Game-playing AI (e.g., AlphaGo, AlphaZero), complex planning, and combinatorial optimization.
- Contrast with Beam Search: MCTS is adaptive and asymmetric, deeply exploring promising lines. Beam search is synchronous and breadth-oriented, exploring all retained branches equally to a fixed depth. MCTS is often more sample-efficient for problems with deep, complex state spaces where heuristic evaluation is difficult.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
Partnered with leading AI, data, and software stack.
How We Work
Custom AI workflows for your Business
One-fit-all AI don't work for modern businesses. At Inferensys, we aim to understand your business & custom requirements; which we use to define most efficient agentic workflows, the data, and the tools for your business.
01
Review the use case
We understand the task, the users, and where AI can actually help.
Read more02
Pick the right approach
We define what needs search, automation, or product integration.
Read more03
Build the first useful version
We implement the part that proves the value first.
Read more04
Improve from there
We add the checks and visibility needed to keep it useful.
Read moreThe first call is a practical review of your use case and the right next step.
Talk to Us