Inferensys

Glossary

Simulated Annealing

Simulated Annealing is a probabilistic optimization algorithm inspired by the annealing process in metallurgy, which allows occasional moves to worse states to escape local optima, with the probability of accepting worse moves decreasing over time according to a cooling schedule.
Finance analyst reviewing cash flow AI optimization on laptop, charts and projections visible, home office work session.
HEURISTIC SEARCH ALGORITHMS

What is Simulated Annealing?

A probabilistic optimization algorithm inspired by metallurgical annealing, designed to escape local optima by occasionally accepting worse solutions.

Simulated Annealing (SA) is a metaheuristic optimization algorithm that finds an approximate global optimum in a large, complex search space. It is a probabilistic technique inspired by the physical process of annealing in metallurgy, where a material is heated and slowly cooled to reduce defects and minimize its internal energy. The algorithm's core mechanism is its ability to accept moves to worse states with a probability that decreases over time, controlled by a temperature parameter and a defined cooling schedule. This allows it to escape local optima that trap simpler algorithms like Hill Climbing.

The algorithm begins at a high temperature, where it frequently accepts worse solutions to explore the state space broadly. As the temperature cools according to the schedule, it becomes increasingly greedy, converging toward a local—and ideally global—optimum. Key parameters include the initial temperature, cooling rate, and stopping criterion. It is widely applied to NP-hard problems like the traveling salesman, VLSI design, and scheduling, where exhaustive search is infeasible. Within Agentic Cognitive Architectures, SA provides a robust planning and decision-making component for agents navigating environments with deceptive reward landscapes.

HEURISTIC SEARCH ALGORITHMS

Key Features of Simulated Annealing

Simulated Annealing is a probabilistic metaheuristic for global optimization, inspired by the physical annealing process in metallurgy. Its defining features enable it to escape local optima and navigate complex, high-dimensional search spaces.

01

Probabilistic Acceptance of Worse Moves

The core mechanism that differentiates Simulated Annealing from greedy algorithms like Hill Climbing. At each iteration, the algorithm may accept a new candidate solution that is worse than the current one, with a probability given by the Metropolis criterion: P(accept) = exp(-ΔE / T), where ΔE is the change in solution cost (or energy) and T is the current temperature. This stochastic acceptance allows the search to escape local optima by traversing through valleys in the energy landscape.

02

Temperature Parameter & Cooling Schedule

The temperature (T) is a control parameter that dictates the exploration-exploitation balance. A high initial temperature encourages widespread exploration (high probability of accepting worse moves). The cooling schedule is a deterministic function that gradually reduces the temperature over time, slowly shifting the algorithm's behavior from exploration to exploitation. Common schedules include:

  • Geometric Cooling: T_{k+1} = α * T_k, where α is a constant (e.g., 0.95).
  • Linear Cooling: T_{k+1} = T_k - β.
  • Logarithmic Cooling: T_{k+1} = T_0 / log(k), which guarantees convergence to the global optimum but is impractically slow.
03

Asymptotic Convergence Guarantee

Under specific theoretical conditions, Simulated Annealing can be proven to converge to the global optimum with probability 1. This guarantee requires an infinitely slow cooling schedule, such as the logarithmic schedule T(k) ≥ C / log(1 + k), where C is a problem-dependent constant. While impractical for finite-time computation, this property establishes it as a global optimization method, unlike purely local search algorithms. In practice, faster cooling schedules are used to find high-quality solutions efficiently.

04

Neighborhood Structure

The algorithm's performance is heavily dependent on how a neighbor is defined for a given solution. The neighborhood structure is the set of all states reachable from the current state by applying a small, predefined perturbation (e.g., swapping two cities in a TSP tour, flipping a bit in a binary string). A well-designed neighborhood should:

  • Be efficiently computable.
  • Allow the search to traverse the entire state space.
  • Provide a smooth gradient for the algorithm to follow. Poor neighborhood design can render the temperature schedule ineffective.
05

Application to Discrete & Continuous Spaces

While classically applied to combinatorial optimization problems (e.g., Traveling Salesperson Problem, job scheduling, circuit placement), variants exist for continuous domains. For continuous optimization, the neighborhood move is typically a random step in parameter space, often drawn from a Gaussian distribution. The step size can be annealed alongside the temperature. This makes it a versatile tool for optimizing agent policies, neural network weights, or hyperparameters where gradient information is unavailable or unreliable.

06

Connection to Markov Chain Monte Carlo (MCMC)

Simulated Annealing is fundamentally linked to MCMC sampling methods. At a fixed temperature T, the sequence of states visited by the algorithm forms a Markov chain whose stationary distribution is the Boltzmann distribution π(s) ∝ exp(-E(s) / T). This means the algorithm spends more time in low-energy (high-quality) states. Annealing can be viewed as running a series of MCMC chains at progressively lower temperatures, gradually focusing the sampling on the deepest minima. This statistical mechanics perspective provides a robust theoretical foundation.

SIMULATED ANNEALING

Frequently Asked Questions

A probabilistic optimization technique inspired by metallurgy, used to find global optima in complex search spaces by allowing controlled exploration of worse states.

Simulated Annealing is a probabilistic optimization algorithm designed to approximate the global optimum of a given function in a large search space. It works by iteratively proposing random moves to neighboring states. A move to a better state is always accepted, while a move to a worse state is accepted with a probability determined by the Metropolis criterion: P(accept) = exp(-ΔE / T), where ΔE is the change in the objective function (energy) and T is a global temperature parameter. The algorithm starts with a high temperature, allowing frequent exploration of worse states to escape local optima. Over time, the temperature is gradually reduced according to a cooling schedule, making the algorithm increasingly greedy and converging towards a (hopefully global) optimum.

Key components include:

  • State: A candidate solution to the optimization problem.
  • Energy (Cost) Function: The objective function to be minimized.
  • Neighbor Generation Function: Defines how to perturb the current state to create a new candidate.
  • Temperature (T): Controls the probability of accepting worse moves.
  • Cooling Schedule: The rule for decreasing T over time (e.g., T_{k+1} = α * T_k, where α is a decay rate like 0.95).
Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.