Inferensys

Glossary

Beam Search

Beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set, the beam width, at each level.
Stylish WeWork-like workspace with hot desks and document wall, professional searching through enterprise knowledge base on a mounted ultrawide display, warm industrial pendants overhead.
HEURISTIC SEARCH ALGORITHM

What is Beam Search?

Beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set, the beam width, at each level.

Beam search is a best-first search algorithm that maintains a fixed-size set, the beam width (k), of the most promising partial solutions at each step of the search tree. It expands only the k best nodes according to a heuristic evaluation function, pruning all others to manage computational cost. This makes it a memory-efficient and time-efficient approximation of breadth-first search, trading optimality for speed in vast state spaces like those encountered in machine translation and sequence generation tasks.

The algorithm operates level-by-level, generating all successors for the current beam, scoring them, and selecting the top k for the next iteration. It is fundamentally tied to the exploration-exploitation tradeoff, where a wider beam explores more but costs more. In Tree-of-Thought reasoning and agentic planning, beam search enables parallel exploration of multiple reasoning paths, allowing an autonomous system to evaluate several potential action sequences before committing to a final decision or output.

HEURISTIC SEARCH ALGORITHM

Key Characteristics of Beam Search

Beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set, defined by the beam width, at each level. It balances the completeness of breadth-first search with the efficiency of greedy search.

01

Beam Width (k)

The beam width (k) is the algorithm's core hyperparameter, representing the maximum number of candidate sequences retained at each step of the search. It creates a memory and compute budget, trading off between:

  • Greedy Search (k=1): Only the single best hypothesis is kept, highly efficient but prone to local optima.
  • BFS-like Search (k=∞): All hypotheses are kept, guaranteeing optimality but at prohibitive exponential cost. A typical beam width ranges from 4 to 10 for language generation tasks, providing a practical balance.
02

Time and Space Efficiency

Beam search achieves polynomial time and space complexity relative to sequence length, unlike the exponential growth of exhaustive search. At each time step t, the algorithm:

  • Expands the k best hypotheses from step t-1.
  • Scores the resulting k * V candidates (where V is vocabulary size).
  • Prunes back to the top-k for the next step. This results in O(k * V * T) time complexity, making it feasible for large vocabularies and long sequences where exhaustive search is impossible.
03

Pruning Strategy & Hypothesis Selection

The algorithm maintains a beam—a priority queue of the top-k partial sequences ranked by a scoring function (typically log probability). At each iteration:

  • Each hypothesis in the beam is expanded by all possible next tokens.
  • New scores are computed (cumulative log probability).
  • All new hypotheses are pooled and sorted; only the top-k survive. This pruning is aggressive and deterministic, discarding potentially good long-term paths for short-term gains, which is its primary limitation versus optimal algorithms like A*.
04

Relation to Breadth-First Search

Beam search is a best-first, breadth-limited variant of Breadth-First Search (BFS).

  • Similar to BFS: It explores the tree level-by-level.
  • Difference from BFS: Instead of keeping all nodes at a level (which grows exponentially), it keeps only the k most promising nodes.
  • Result: It approximates BFS's ability to avoid the depth traps of DFS but within a fixed memory budget. It is not complete (may not find a solution) and not optimal (may not find the best solution) due to this pruning.
05

Common Applications: Sequence Generation

Beam search is the dominant decoding algorithm for autoregressive sequence generation tasks:

  • Machine Translation: Generating the most fluent and accurate translation from a source sentence.
  • Text Summarization: Producing a concise summary from a long document.
  • Speech Recognition: Decoding the most probable word sequence from acoustic features.
  • Code Generation: Generating syntactically correct code from natural language descriptions. In these tasks, the scoring function is usually the log probability assigned by a trained model (e.g., an LLM), and the beam width is tuned for a balance of quality and latency.
06

Core Limitations and Mitigations

Standard beam search has known limitations, leading to several algorithmic variants:

  • Lack of Diversity: The beam can become filled with nearly identical sequences. Mitigation: Use diversity-promoting penalties (e.g., Hamming diversity) or group beam search.
  • Length Bias: It favors shorter sequences due to the multiplicative nature of probabilities. Mitigation: Apply length normalization (e.g., dividing score by length).
  • Suboptimality: Pruning can discard the globally best path. Mitigation: Increased beam width (costly) or integrated use with top-k/top-p sampling for a stochastic hybrid approach. These trade-offs make beam search a tunable engineering component rather than a purely theoretical optimizer.
ALGORITHM EXPLANATION

How Beam Search Works: Step-by-Step

Beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set, the beam width, at each level.

Beam search operates on a state space represented as a tree, where each node is a partial solution (e.g., a sequence of tokens). The algorithm maintains a beam—a fixed-size set of the most promising nodes at the current depth, defined by a heuristic function or score. At each step, it generates all successors for the nodes in the beam, scores them, and selects only the top-k to form the new beam, pruning all others. This process repeats until a termination condition is met, such as reaching a maximum depth or a complete solution.

The algorithm's efficiency stems from its pruning of low-probability paths, which prevents the exponential growth typical of exhaustive methods like breadth-first search. The beam width parameter directly controls the trade-off between computational cost and solution quality; a wider beam explores more possibilities at the expense of greater memory and time. In sequence generation tasks like machine translation, beam search approximates the most likely output sequence by iteratively extending high-scoring partial hypotheses, balancing the exploration-exploitation tradeoff within a manageable search frontier.

BEAM SEARCH

Frequently Asked Questions

Beam search is a core heuristic search algorithm used in AI planning and sequence generation. These questions address its mechanics, applications, and trade-offs for developers and system architects.

Beam search is a heuristic graph search algorithm that explores a tree or graph by expanding only the most promising nodes at each depth level, as defined by a fixed parameter called the beam width (k). It works by maintaining a beam—a limited set of the best partial solutions—at each step of the search. At each level, all nodes in the current beam are expanded, their children are evaluated using a heuristic or scoring function, and only the top-k highest-scoring children are retained for the next beam. This process repeats until a terminal state (e.g., a complete sequence) is found or a maximum depth is reached. Unlike exhaustive searches like Breadth-First Search (BFS), beam search prunes less promising paths to manage computational cost, making it a best-first search variant with a memory budget.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.