Inferensys

Iterative Linear Quadratic Regulator (iLQR)

The Iterative Linear Quadratic Regulator (iLQR) is an efficient trajectory optimization algorithm that iteratively linearizes dynamics and quadratizes cost to compute optimal control updates.
ALGORITHM

What is Iterative Linear Quadratic Regulator (iLQR)?

The Iterative Linear Quadratic Regulator (iLQR) is a trajectory optimization algorithm for solving nonlinear optimal control problems.

The Iterative Linear Quadratic Regulator (iLQR) is an efficient, gradient-based algorithm for computing locally optimal control sequences for nonlinear dynamical systems. It operates by iteratively linearizing the system dynamics and quadratizing the cost function around a nominal trajectory, then solving the resulting Linear Quadratic Regulator (LQR) subproblem to compute a control update. This process repeats until convergence, producing a trajectory that minimizes a defined cost.
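In symbols, the problem described above can be sketched as follows. The notation is an assumption made for illustration and is not fixed by the text: \(\ell\) is the stage cost, \(\ell_f\) the terminal cost, \(f\) the discrete-time dynamics, \(N\) the horizon, and \(\bar{\mathbf{x}}_k, \bar{\mathbf{u}}_k\) the nominal trajectory.

```latex
\min_{\mathbf{u}_0,\dots,\mathbf{u}_{N-1}} \; J = \ell_f(\mathbf{x}_N) + \sum_{k=0}^{N-1} \ell(\mathbf{x}_k, \mathbf{u}_k)
\quad \text{subject to} \quad \mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k), \quad \mathbf{x}_0 \text{ given}

% Linearization around the nominal trajectory, with deviations
% \delta\mathbf{x}_k = \mathbf{x}_k - \bar{\mathbf{x}}_k and \delta\mathbf{u}_k = \mathbf{u}_k - \bar{\mathbf{u}}_k:
\delta\mathbf{x}_{k+1} \approx A_k \, \delta\mathbf{x}_k + B_k \, \delta\mathbf{u}_k,
\qquad A_k = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{(\bar{\mathbf{x}}_k, \bar{\mathbf{u}}_k)},
\quad B_k = \left.\frac{\partial f}{\partial \mathbf{u}}\right|_{(\bar{\mathbf{x}}_k, \bar{\mathbf{u}}_k)}
```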

iLQR is a cornerstone of model-based reinforcement learning (MBRL) and Model Predictive Control (MPC), prized for its fast convergence near an optimum. Its core output is a sequence of feedback gain matrices and feedforward controls, enabling real-time, closed-loop execution. The algorithm is closely related to Differential Dynamic Programming (DDP): iLQR typically ignores the second-order dynamics terms that DDP retains, giving up DDP's guaranteed quadratic convergence rate in exchange for cheaper iterations, which makes it highly practical for real-time robotic and autonomous system applications.

ALGORITHM MECHANICS

Key Features of iLQR

The Iterative Linear Quadratic Regulator (iLQR) is a trajectory optimization algorithm that efficiently computes locally optimal control sequences by iteratively refining a nominal trajectory. Its core features are derived from its iterative, model-based approach to solving nonlinear optimal control problems.

01

Iterative Local Approximation

iLQR operates through successive linear-quadratic approximations. Starting from an initial control sequence (often a rough or even random guess), which is rolled out through the dynamics to give the nominal trajectory, the algorithm iteratively:

  • Linearizes the nonlinear system dynamics around the current nominal trajectory.
  • Quadratizes the cost function (to second order) around the same trajectory.
  • Solves the resulting Linear Quadratic Regulator (LQR) problem to compute a control update.
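  • Applies the update to generate a new, improved nominal trajectory for the next iteration.

This local approximation converts a hard nonlinear problem into a series of tractable linear-quadratic subproblems, each of which is convex as long as the quadratized cost is positive semidefinite or suitably regularized.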
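The linearization step can be sketched with finite differences. This is a minimal illustration, not a reference implementation: the pendulum dynamics, the time step, and the names `pendulum_step` and `linearize` are all assumptions introduced here. In practice the Jacobians often come from analytic derivatives or automatic differentiation instead.

```python
import numpy as np

def pendulum_step(x, u, dt=0.05):
    """Hypothetical example dynamics: Euler-discretized pendulum.
    State x = [angle, angular velocity], control u = [torque]."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-9.81 * np.sin(theta) + u[0])])

def linearize(f, x, u, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    n, m = x.size, u.size
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        A[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        B[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return A, B

# Linearize around one point of an assumed nominal trajectory.
x_nom = np.array([0.3, 0.0])
u_nom = np.array([0.1])
A, B = linearize(pendulum_step, x_nom, u_nom)
```

For a cost that is already quadratic in state and control, the quadratization step is exact: the gradients and Hessians follow directly from the weight matrices.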
02

Differential Dynamic Programming (DDP) Core

iLQR is a specific variant of Differential Dynamic Programming (DDP). The key distinction is that iLQR typically ignores the second-order derivatives of the system dynamics during the backward pass, assuming they have negligible effect on the optimal feedback gains. This approximation:

  • Dramatically reduces computational cost per iteration.
  • Maintains fast convergence near an optimum in practice, though without the quadratic-rate guarantee of full DDP.
  • Makes the algorithm more practical for systems with complex, high-dimensional dynamics where computing full second derivatives is prohibitive.

The forward pass simulates the new trajectory using the updated linear feedback policy.
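The distinction can be made concrete with the standard second-order expansion used in the backward pass. The notation here is assumed for illustration: \(\ell\) is the stage cost, \(f\) the dynamics, \(V'\) the value function at the next time step, and subscripts denote partial derivatives evaluated on the nominal trajectory.

```latex
Q_x    = \ell_x + f_x^\top V'_x
Q_u    = \ell_u + f_u^\top V'_x
Q_{xx} = \ell_{xx} + f_x^\top V'_{xx} f_x + V'_x \cdot f_{xx}
Q_{uu} = \ell_{uu} + f_u^\top V'_{xx} f_u + V'_x \cdot f_{uu}
Q_{ux} = \ell_{ux} + f_u^\top V'_{xx} f_x + V'_x \cdot f_{ux}
```

Full DDP keeps the trailing terms \(V'_x \cdot f_{xx}\), \(V'_x \cdot f_{uu}\), and \(V'_x \cdot f_{ux}\), which require second derivatives of the dynamics (tensor contractions). iLQR sets them to zero, so only the Jacobians \(f_x\) and \(f_u\) are needed.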
03

Forward & Backward Passes

Each iteration consists of two distinct computational sweeps through time:

  1. Forward Pass: Simulates the system forward in time using the current control sequence and the locally optimal feedback law computed in the previous backward pass. This generates a new nominal trajectory (states) and evaluates its total cost.
  2. Backward Pass: Works backwards from the final time step to the first. At each step, it:
    • Computes local quadratic models of the value function (cost-to-go).
    • Solves for the optimal feedforward control adjustment and linear feedback gain matrix for that time step.

This backward-forward structure is fundamental to dynamic programming and enables the incorporation of future cost information into immediate control decisions.
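As a minimal sketch of the two sweeps, the following NumPy code runs one backward pass and one forward pass on a hypothetical double-integrator problem with quadratic cost. The system matrices, cost weights, horizon, and function names are assumptions chosen for illustration. Because these dynamics are already linear, the Jacobians are simply `A` and `B`; a nonlinear problem would recompute them at every time step of the nominal trajectory.

```python
import numpy as np

# Hypothetical problem: double integrator, quadratic cost (all values assumed).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])      # running state cost weight
R = np.array([[0.01]])       # running control cost weight
Qf = np.diag([10.0, 1.0])    # terminal state cost weight
N = 30                       # horizon length

def rollout(x0, us):
    """Simulate the dynamics for a given control sequence."""
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)

def total_cost(xs, us):
    run = sum(0.5 * x @ Q @ x + 0.5 * u @ R @ u for x, u in zip(xs[:-1], us))
    return run + 0.5 * xs[-1] @ Qf @ xs[-1]

def backward_pass(xs, us):
    """Sweep from the final step to the first, returning feedforward terms
    and feedback gains for every time step."""
    Vx, Vxx = Qf @ xs[-1], Qf.copy()
    ls, Ls = [], []
    for k in reversed(range(N)):
        Qx = Q @ xs[k] + A.T @ Vx
        Qu = R @ us[k] + B.T @ Vx
        Qxx = Q + A.T @ Vxx @ A
        Quu = R + B.T @ Vxx @ B
        Qux = B.T @ Vxx @ A
        l = -np.linalg.solve(Quu, Qu)    # feedforward adjustment
        L = -np.linalg.solve(Quu, Qux)   # feedback gain
        Vx = Qx + L.T @ Quu @ l + L.T @ Qu + Qux.T @ l
        Vxx = Qxx + L.T @ Quu @ L + L.T @ Qux + Qux.T @ L
        ls.append(l)
        Ls.append(L)
    return ls[::-1], Ls[::-1]

def forward_pass(xs, us, ls, Ls, alpha=1.0):
    """Roll out the updated policy to obtain a new nominal trajectory."""
    x = xs[0]
    new_xs, new_us = [x], []
    for k in range(N):
        u = us[k] + alpha * ls[k] + Ls[k] @ (x - xs[k])
        x = A @ x + B @ u
        new_us.append(u)
        new_xs.append(x)
    return np.array(new_xs), np.array(new_us)

x0 = np.array([1.0, 0.0])
us = np.zeros((N, 1))                    # initial guess: apply no control
xs = rollout(x0, us)
cost_before = total_cost(xs, us)
ls, Ls = backward_pass(xs, us)
xs_new, us_new = forward_pass(xs, us, ls, Ls)
cost_after = total_cost(xs_new, us_new)
print(cost_before, cost_after)
```

In this linear-quadratic special case a single full step already reaches the optimum, so a second backward pass returns feedforward terms that are numerically zero. Nonlinear problems need repeated iterations.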
04

Locally Optimal Feedback Policy

The output of iLQR is not just an open-loop sequence of actions, but a time-varying linear feedback policy. For each time step \(k\), the algorithm computes:

  • A feedforward term (\(\mathbf{l}_k\)): The nominal control adjustment.
  • A feedback gain matrix (\(\mathbf{L}_k\)): Optimal linear response to state deviations.

The executed control is: \(\mathbf{u}_k = \mathbf{u}_k^{\text{nominal}} + \mathbf{l}_k + \mathbf{L}_k(\mathbf{x}_k - \mathbf{x}_k^{\text{nominal}})\). This policy provides disturbance rejection, making the controller robust to small model errors and perturbations during execution, a significant advantage over pure open-loop trajectory optimization.
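The control law above translates almost directly into code. The sketch below uses made-up numbers for the nominal state, nominal control, feedforward term, and gain; the helper name `ilqr_control` is hypothetical.

```python
import numpy as np

def ilqr_control(x, x_nom, u_nom, l, L):
    """Evaluate u = u_nom + l + L (x - x_nom) for one time step."""
    return u_nom + l + L @ (x - x_nom)

# Illustrative values only (not taken from a real solve).
x_nom = np.array([1.0, 0.0])     # nominal state at step k
u_nom = np.array([0.5])          # nominal control at step k
l = np.array([0.1])              # feedforward term
L = np.array([[-2.0, -0.5]])     # feedback gain matrix

# The measured state deviates from the nominal by 0.1 in the first component,
# so the feedback term corrects the control by about -0.2.
x_measured = np.array([1.1, 0.0])
u = ilqr_control(x_measured, x_nom, u_nom, l, L)
print(u)  # approximately [0.4]
```

When the measured state equals the nominal state, the feedback term vanishes and the control reduces to the nominal control plus the feedforward adjustment.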
05

Line Search & Regularization

To ensure stable convergence, iLQR implementations include mechanisms to control the step size:

  • Line Search: The forward pass tests the new policy with a scaling factor on the control update. It accepts the step only if it produces a sufficient decrease in the total cost (satisfying the Armijo condition).
  • Regularization: Inspired by the Levenberg-Marquardt method, a damping term (\(\mu\)) is added to the Hessian of the value function during the backward pass. This term:
    • Keeps the Hessian used to compute the gains positive definite, so the computed update remains a descent direction even when the local approximation is poor.
    • Is increased if a step is rejected and decreased if a step is accepted, adapting to the local curvature of the problem.

These features make iLQR more robust to poor initializations.
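Both mechanisms can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names, the backtracking schedule, the Armijo constant `c`, and the factor-of-ten damping schedule are all choices made here, and the one-dimensional cost stands in for the total trajectory cost as a function of the step size.

```python
def armijo_line_search(cost_at, cost0, slope, alphas=(1.0, 0.5, 0.25, 0.125), c=1e-4):
    """Return the first step size whose cost satisfies the Armijo condition.

    cost_at(alpha) evaluates the cost after a step scaled by alpha.
    slope is the directional derivative of the cost at alpha = 0
    (negative for a descent direction).
    """
    for alpha in alphas:
        cost = cost_at(alpha)
        if cost <= cost0 + c * alpha * slope:
            return alpha, cost
    return None, cost0           # every candidate step was rejected

def adapt_mu(mu, accepted, factor=10.0, mu_min=1e-6, mu_max=1e6):
    """Decrease damping after an accepted step, increase it after a rejection."""
    if accepted:
        return max(mu / factor, mu_min)
    return min(mu * factor, mu_max)

# Toy cost along the search direction: the full step (alpha = 1) overshoots.
def toy_cost(alpha):
    return (3.0 * alpha - 1.0) ** 2

cost0 = toy_cost(0.0)            # 1.0
slope = -6.0                     # derivative of toy_cost at alpha = 0
alpha, new_cost = armijo_line_search(toy_cost, cost0, slope)

mu = 1e-3
mu = adapt_mu(mu, accepted=alpha is not None)
```

Here the full step raises the cost from 1.0 to 4.0 and is rejected, while the half step lowers it to 0.25 and is accepted, after which the damping term is reduced.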
06

Connection to Optimal Control & RL

iLQR sits at the intersection of classical optimal control and modern model-based reinforcement learning (MBRL):

  • Optimal Control: It solves the finite-horizon, discrete-time nonlinear optimal control problem directly, providing a model-based planning solution. It is closely related to Model Predictive Control (MPC), where iLQR can serve as the underlying optimizer for the MPC's online planning loop.
  • Model-Based RL: In MBRL, iLQR is used as the planning algorithm atop a learned dynamics model. The learned model (transition model) provides the dynamics for linearization, and iLQR computes locally optimal actions under that model. This combination is a cornerstone of sample-efficient RL, as planning with a model requires fewer expensive interactions with the real environment.
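The receding-horizon pattern behind MPC can be sketched as follows. To keep the example short, the planner is a stand-in: a finite-horizon LQR solved by backward Riccati recursion, which is what iLQR reduces to when the dynamics are linear and the cost is quadratic. The model, weights, noise level, and the name `plan_first_gain` are assumptions for illustration. In a nonlinear setting this call would be a full iLQR solve, usually warm-started from the previous plan.

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])    # assumed double-integrator model
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.1]])
Qf = np.diag([10.0, 1.0])

def plan_first_gain(horizon):
    """Stand-in planner: backward Riccati recursion over the horizon.
    Returns the feedback gain for the first step of the plan."""
    P = Qf
    for _ in range(horizon):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ K)
    return K

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
for t in range(100):
    K = plan_first_gain(horizon=20)      # replan from the current state
    u = K @ x                            # apply only the first planned control
    # The "real" system: the model plus a small random disturbance.
    x = A @ x + B @ u + rng.normal(0.0, 0.001, size=2)

print(np.linalg.norm(x))
```

Because this toy model is time-invariant, every replan returns the same gain. With nonlinear dynamics or a learned model, the plan depends on the current state, which is why replanning at each step matters.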
ITERATIVE LINEAR QUADRATIC REGULATOR (ILQR)

Frequently Asked Questions

The Iterative Linear Quadratic Regulator (iLQR) is a foundational algorithm in optimal control and model-based reinforcement learning. These questions address its core mechanics, applications, and relationship to other planning techniques.

The Iterative Linear Quadratic Regulator (iLQR) is a trajectory optimization algorithm that efficiently computes locally optimal control sequences for nonlinear dynamical systems by iteratively solving a series of linear-quadratic approximations. It is a variant of differential dynamic programming (DDP) that uses a second-order expansion of the value function to converge quickly near an optimum. The algorithm works by taking a nominal trajectory (an initial guess of states and controls), linearizing the system dynamics around it, and quadratizing the cost function. It then solves the resulting Linear Quadratic Regulator (LQR) problem to compute a feedforward and feedback control policy that improves the trajectory. This new trajectory becomes the nominal for the next iteration, repeating until convergence. iLQR is prized for its sample efficiency in model-based reinforcement learning (MBRL), as it requires only a differentiable dynamics model and cost function, not millions of environment interactions.

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.