Inferensys

Glossary

Beam Search

Beam Search is a heuristic search algorithm that explores a graph by expanding only the most promising nodes in a limited set, called the beam width, to reduce memory usage compared to breadth-first search.
Stylish WeWork-like workspace with hot desks and document wall, professional searching through enterprise knowledge base on a mounted ultrawide display, warm industrial pendants overhead.
HEURISTIC SEARCH ALGORITHM

What is Beam Search?

Beam Search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set, called the beam width, to reduce memory usage compared to breadth-first search.

Beam Search is a best-first search algorithm that maintains only a fixed number, k (the beam width), of the most promising nodes at each level of the search tree. Unlike breadth-first search, which expands all nodes at a given depth, Beam Search prunes less promising branches, significantly reducing memory and computational cost. It is not guaranteed to find an optimal solution, as its performance depends heavily on the quality of the heuristic function used to evaluate nodes.

The algorithm is fundamental in sequence generation tasks for large language models, where it is used to decode high-probability text sequences token-by-token. By tracking multiple candidate sequences (the beam), it balances exploration and efficiency, outperforming greedy decoding while being far less computationally intensive than an exhaustive search. Its parameters, beam width and length normalization, are critical for tuning the trade-off between output quality and inference latency.

HEURISTIC SEARCH ALGORITHM

Key Characteristics of Beam Search

Beam Search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set, called the beam width, to reduce memory usage compared to breadth-first search. It is a core technique for balancing exploration and computational efficiency in sequence generation tasks.

01

Beam Width (k)

The beam width (k) is the algorithm's primary hyperparameter, defining the number of most promising partial sequences (or nodes) retained at each step of the search. This creates a memory- and compute-efficient approximation of breadth-first search.

  • k=1: Equivalent to a greedy search, following the single best option at each step.
  • k=n (full breadth): Equivalent to exhaustive breadth-first search, exploring all possibilities but at high computational cost.
  • Typical values: Range from 4 to 10 for tasks like machine translation or text summarization, providing a practical trade-off between quality and speed.
02

Pruning & Candidate Selection

At each decoding step, Beam Search performs a pruning operation. It generates all possible next-step extensions for the k sequences in the current beam, scores them (e.g., using log probabilities), and then selects only the top-k highest-scoring sequences to form the new beam. All other candidates are permanently discarded.

This selective pruning is what differentiates it from best-first search algorithms that maintain a global priority queue, making Beam Search far more memory-efficient for long sequences common in natural language generation.

03

Time & Space Complexity

Beam Search offers a predictable computational profile, making it suitable for production systems.

  • Time Complexity: Approximately O(k * V * T), where k is the beam width, V is the vocabulary size (for language models), and T is the target sequence length. This is a dramatic reduction from the exponential complexity of exploring all possible sequences.
  • Space Complexity: O(k * T), as the algorithm must store the k candidate sequences, each of length up to T. This is its key advantage over breadth-first search, which has a space complexity that grows exponentially with depth.
04

Termination & Output Selection

Beam Search typically terminates when all k candidate sequences in the beam have reached an end-of-sequence token or when a predefined maximum length T is reached. The final output is selected from the completed sequences in the beam.

The standard selection criterion is the sequence with the highest cumulative score, often the sum of log probabilities. However, this can favor shorter sequences. Length normalization (dividing the score by sequence length) is a common technique to correct for this bias and is critical for comparing sequences of different lengths.

05

Comparison to Greedy & BFS

Beam Search occupies a strategic middle ground between other fundamental search strategies.

  • vs. Greedy Decoding (k=1): Greedy search is faster but can commit to a suboptimal early token, leading to locally optimal but globally poor sequences (a greedy trap). Beam Search maintains multiple hypotheses, offering a chance to recover from early mistakes.
  • vs. Breadth-First Search (BFS): BFS is guaranteed to find the optimal sequence but is computationally infeasible for large state spaces (like language model vocabularies). Beam Search provides a bounded approximation of BFS, trading optimality guarantees for tractability.
06

Primary Use Case: Sequence Generation

Beam Search is the de facto standard decoding algorithm for autoregressive sequence generation tasks where quality is prioritized over pure speed. Its core application is guiding large language models (LLMs) and sequence-to-sequence models.

Common Applications:

  • Machine Translation (e.g., Google's Neural Machine Translation systems)
  • Text Summarization
  • Image Captioning
  • Speech Recognition (decoding over phoneme or word sequences)
  • Code Generation

It is less suitable for tasks requiring highly diverse outputs or creative exploration, where sampling-based methods (like top-k or nucleus sampling) are often preferred.

HEURISTIC SEARCH COMPARISON

Beam Search vs. Other Search Algorithms

A feature comparison of Beam Search against other common heuristic and uninformed search algorithms, highlighting trade-offs in memory, optimality, and computational efficiency relevant to agent decision-making.

Feature / MetricBeam SearchBreadth-First Search (BFS)Depth-First Search (DFS)A* Search

Primary Use Case

Pruning large state spaces (e.g., text generation, planning)

Finding shortest path in unweighted graphs, exhaustive exploration

Exploring deep branches, cycle detection, topology sorting

Finding optimal cost path with a known admissible heuristic

Memory Complexity (Worst-Case)

O(b * w)

O(b^d)

O(b * d)

O(b^d)

Optimality Guarantee

No (suboptimal due to beam pruning)

Yes (for unweighted graphs)

No

Yes (with admissible heuristic)

Completeness Guarantee

No (can prune goal)

Yes

No (in infinite spaces)

Yes

Time Complexity (Worst-Case)

O(b * w * d)

O(b^d)

O(b^d)

O(b^d)

Guided by Heuristic?

Key Parameter(s)

Beam Width (w)

None

Depth Limit (optional)

Heuristic Function h(n)

Best For Agentic Systems When...

Generating high-probability sequences under strict memory constraints

Guaranteeing shortest action plans in small, unweighted state spaces

Exploring deep, unbounded reasoning chains where memory is limited

Finding provably optimal action sequences with a reliable cost model

BEAM SEARCH

Frequently Asked Questions

Beam Search is a core heuristic search algorithm for navigating vast decision spaces efficiently. These questions address its mechanics, trade-offs, and practical applications in AI systems.

Beam Search is a heuristic graph search algorithm that explores a state space by expanding only the most promising k nodes at each depth level, where k is the beam width. It operates by maintaining a beam—a limited-size priority queue of the best nodes—rather than the full frontier used in breadth-first search. At each step, it generates all successors for the nodes in the current beam, evaluates them using a scoring function (often a heuristic plus path cost), and selects the top k successors to form the new beam. This process repeats until a goal state is found or a maximum depth is reached. The algorithm provides a memory-efficient compromise between greedy search (beam width of 1) and exhaustive breadth-first search.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.