Glossary

Iterative Linear Quadratic Regulator (iLQR)

The Iterative Linear Quadratic Regulator (iLQR) is an efficient trajectory optimization algorithm that iteratively linearizes dynamics and quadratizes cost to compute optimal control updates.

ALGORITHM

What is Iterative Linear Quadratic Regulator (iLQR)?

The Iterative Linear Quadratic Regulator (iLQR) is a trajectory optimization algorithm for solving nonlinear optimal control problems.

The Iterative Linear Quadratic Regulator (iLQR) is an efficient, derivative-based algorithm for computing locally optimal control sequences for nonlinear dynamical systems. It operates by iteratively linearizing the system dynamics and quadratizing the cost function around a nominal trajectory, then solving the resulting Linear Quadratic Regulator (LQR) subproblem to compute a control update. This process repeats until convergence, producing a trajectory that locally minimizes the defined cost.

iLQR is a cornerstone of model-based reinforcement learning (MBRL) and Model Predictive Control (MPC), prized for converging quickly near an optimum. Its core output is a sequence of feedback gain matrices and feedforward controls, enabling real-time, closed-loop execution. The algorithm is closely related to Differential Dynamic Programming (DDP): DDP converges quadratically near an optimum, while iLQR ignores the second-order dynamics terms, giving up that guarantee in exchange for cheaper iterations. In practice it often converges nearly as fast, which makes it highly practical for real-time robotic and autonomous system applications.

ALGORITHM MECHANICS

Key Features of iLQR

The Iterative Linear Quadratic Regulator (iLQR) is a trajectory optimization algorithm that efficiently computes locally optimal control sequences by iteratively refining a nominal trajectory. Its core features are derived from its iterative, model-based approach to solving nonlinear optimal control problems.

Iterative Local Approximation

iLQR operates through successive linear-quadratic approximations. Starting from an initial control sequence (often a simple or random guess), which is rolled out through the dynamics to give the nominal trajectory, the algorithm iteratively:

Linearizes the nonlinear system dynamics around the current nominal trajectory.
Quadratizes the cost function (to second order) around the same trajectory.
Solves the resulting Linear Quadratic Regulator (LQR) problem to compute a control update.
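Applies the update to generate a new, improved nominal trajectory for the next iteration.

This local approximation converts a hard nonlinear problem into a series of tractable linear-quadratic subproblems, which are convex as long as the cost Hessians are positive semidefinite (or are regularized to be so).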
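Written out, the local model used at each time step looks as follows. This is a sketch in commonly used notation that the text above does not fix: \(f\) is the dynamics, \(\ell\) the stage cost, bars mark the nominal trajectory, \(\delta\) marks a deviation from it, and subscripts on \(\ell\) denote partial derivatives evaluated at \((\bar{\mathbf{x}}_k, \bar{\mathbf{u}}_k)\).

```latex
\[
\begin{aligned}
\delta\mathbf{x}_{k+1} &\approx \mathbf{A}_k\,\delta\mathbf{x}_k + \mathbf{B}_k\,\delta\mathbf{u}_k,
\qquad
\mathbf{A}_k = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{(\bar{\mathbf{x}}_k,\bar{\mathbf{u}}_k)},
\quad
\mathbf{B}_k = \left.\frac{\partial f}{\partial \mathbf{u}}\right|_{(\bar{\mathbf{x}}_k,\bar{\mathbf{u}}_k)} \\[4pt]
\ell(\bar{\mathbf{x}}_k + \delta\mathbf{x}_k,\ \bar{\mathbf{u}}_k + \delta\mathbf{u}_k)
&\approx \ell_k
+ \ell_{\mathbf{x}}^{\top}\delta\mathbf{x}_k
+ \ell_{\mathbf{u}}^{\top}\delta\mathbf{u}_k
+ \tfrac{1}{2}\,\delta\mathbf{x}_k^{\top}\ell_{\mathbf{xx}}\,\delta\mathbf{x}_k
+ \delta\mathbf{u}_k^{\top}\ell_{\mathbf{ux}}\,\delta\mathbf{x}_k
+ \tfrac{1}{2}\,\delta\mathbf{u}_k^{\top}\ell_{\mathbf{uu}}\,\delta\mathbf{u}_k
\end{aligned}
\]
```

The first line is the linearized dynamics and the second is the quadratized cost; together they define the LQR subproblem in the deviations \(\delta\mathbf{x}_k\) and \(\delta\mathbf{u}_k\).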

Differential Dynamic Programming (DDP) Core

iLQR is a specific variant of Differential Dynamic Programming (DDP). The key distinction is that iLQR typically ignores the second-order derivatives of the system dynamics during the backward pass, assuming they have negligible effect on the optimal feedback gains. This approximation:

Dramatically reduces computational cost per iteration.
Usually retains fast convergence near an optimum, although DDP's quadratic convergence guarantee no longer applies.
Makes the algorithm more practical for systems with complex, high-dimensional dynamics where computing full second derivatives is prohibitive.

The forward pass simulates the new trajectory using the updated linear feedback policy.
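The dropped terms are easiest to see in the standard second-order expansion used in the backward pass. The notation here is an assumption, not taken from the text above: \(Q\) is the local model of cost-to-go as a function of state and control deviations, \(V'\) is the value function at the next time step, \(\ell\) is the stage cost, and subscripts denote partial derivatives.

```latex
\[
\begin{aligned}
Q_{\mathbf{x}}  &= \ell_{\mathbf{x}} + f_{\mathbf{x}}^{\top} V'_{\mathbf{x}} \\
Q_{\mathbf{u}}  &= \ell_{\mathbf{u}} + f_{\mathbf{u}}^{\top} V'_{\mathbf{x}} \\
Q_{\mathbf{xx}} &= \ell_{\mathbf{xx}} + f_{\mathbf{x}}^{\top} V'_{\mathbf{xx}} f_{\mathbf{x}}
                  + \underbrace{V'_{\mathbf{x}} \cdot f_{\mathbf{xx}}}_{\text{DDP only}} \\
Q_{\mathbf{ux}} &= \ell_{\mathbf{ux}} + f_{\mathbf{u}}^{\top} V'_{\mathbf{xx}} f_{\mathbf{x}}
                  + \underbrace{V'_{\mathbf{x}} \cdot f_{\mathbf{ux}}}_{\text{DDP only}} \\
Q_{\mathbf{uu}} &= \ell_{\mathbf{uu}} + f_{\mathbf{u}}^{\top} V'_{\mathbf{xx}} f_{\mathbf{u}}
                  + \underbrace{V'_{\mathbf{x}} \cdot f_{\mathbf{uu}}}_{\text{DDP only}}
\end{aligned}
\]
```

The underbraced terms contract the value gradient with the second derivatives of the dynamics (a third-order tensor). iLQR sets them to zero, so only the Jacobians \(f_{\mathbf{x}}\) and \(f_{\mathbf{u}}\) are needed.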

Forward & Backward Passes

Each iteration consists of two distinct computational sweeps through time:

Forward Pass: Simulates the system forward in time using the current control sequence and the locally optimal feedback law computed in the previous backward pass. This generates a new nominal trajectory (states) and evaluates its total cost.
Backward Pass: Works backwards from the final time step to the first. At each step, it:
- Computes local quadratic models of the value function (cost-to-go).
- Solves for the optimal feedforward control adjustment and linear feedback gain matrix for that time step.

This backward-forward structure is fundamental to dynamic programming and enables the incorporation of future cost information into immediate control decisions.
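The backward sweep can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `backward_pass`, its argument layout (per-step lists of dynamics Jacobians and cost derivatives), and the optional damping `mu` are all hypothetical choices made for this example.

```python
import numpy as np

def backward_pass(A, B, lx, lu, lxx, luu, lux, Vx, Vxx, mu=0.0):
    """One iLQR backward sweep (second-order dynamics terms ignored).

    A[k], B[k]              : dynamics Jacobians at step k
    lx, lu, lxx, luu, lux   : per-step cost gradients and Hessians
    Vx, Vxx                 : gradient and Hessian of the terminal cost
    mu                      : damping added to Q_uu when solving for the gains
    Returns feedforward terms l[k] and feedback gains L[k].
    """
    N, m = len(A), B[0].shape[1]
    l, L = [None] * N, [None] * N
    for k in reversed(range(N)):
        # Local quadratic model of the cost-to-go at step k.
        Qx = lx[k] + A[k].T @ Vx
        Qu = lu[k] + B[k].T @ Vx
        Qxx = lxx[k] + A[k].T @ Vxx @ A[k]
        Qux = lux[k] + B[k].T @ Vxx @ A[k]
        Quu = luu[k] + B[k].T @ Vxx @ B[k]
        Quu_reg = Quu + mu * np.eye(m)

        # Minimize the quadratic model over the control deviation.
        l[k] = -np.linalg.solve(Quu_reg, Qu)
        L[k] = -np.linalg.solve(Quu_reg, Qux)

        # Propagate the value function model one step back in time.
        Vx = Qx + L[k].T @ Quu @ l[k] + L[k].T @ Qu + Qux.T @ l[k]
        Vxx = Qxx + L[k].T @ Quu @ L[k] + L[k].T @ Qux + Qux.T @ L[k]
        Vxx = 0.5 * (Vxx + Vxx.T)  # keep the Hessian symmetric
    return l, L
```

For a linear system with a quadratic cost and zero gradients, this recursion reduces to the finite-horizon LQR Riccati recursion, which gives a convenient sanity check.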

Locally Optimal Feedback Policy

The output of iLQR is not just an open-loop sequence of actions, but a time-varying linear feedback policy. For each time step \(k\), the algorithm computes:

A feedforward term (\(\mathbf{l}_k\)): The nominal control adjustment.
A feedback gain matrix (\(\mathbf{L}_k\)): Optimal linear response to state deviations.

The executed control is: \(\mathbf{u}_k = \mathbf{u}_k^{\text{nominal}} + \mathbf{l}_k + \mathbf{L}_k(\mathbf{x}_k - \mathbf{x}_k^{\text{nominal}})\). This policy provides disturbance rejection, making the controller robust to small model errors and perturbations during execution, a significant advantage over pure open-loop trajectory optimization.
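Executing this policy amounts to a short rollout loop. The sketch below assumes a dynamics function `f(x, u)` that returns the next state; the function name `forward_pass` and the step-size factor `alpha` on the feedforward term (as used by a line search) are illustrative additions.

```python
import numpy as np

def forward_pass(f, x0, x_nom, u_nom, l, L, alpha=1.0):
    """Roll out u_k = u_nom_k + alpha * l_k + L_k (x_k - x_nom_k)."""
    x = [np.asarray(x0, dtype=float)]
    u = []
    for k in range(len(u_nom)):
        # Feedforward adjustment plus feedback on the deviation from nominal.
        du = alpha * l[k] + L[k] @ (x[k] - x_nom[k])
        u.append(u_nom[k] + du)
        x.append(f(x[k], u[k]))
    return x, u
```

If the rollout starts on the nominal trajectory and the feedforward terms are zero, the feedback term stays zero and the nominal trajectory is reproduced. If the initial state is perturbed, the gains `L[k]` act on the deviation.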

Line Search & Regularization

To ensure stable convergence, iLQR implementations include mechanisms to control the step size:

Line Search: The forward pass tests the new policy with a scaling factor on the control update. It accepts the step only if it produces a sufficient decrease in the total cost (satisfying the Armijo condition).
Regularization: Inspired by the Levenberg-Marquardt method, a damping term (\(\mu\)) is added during the backward pass to the control Hessian \(Q_{uu}\) of the local cost-to-go model (some variants add it to the value-function Hessian instead). This term:
- Keeps \(Q_{uu}\) positive definite, so the computed step is a valid descent direction even when the local approximation is poor.
- Is increased if a step is rejected and decreased if a step is accepted, adapting to the local curvature of the problem.

These features make iLQR more robust to poor initializations.
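The accept/reject logic is easier to follow on a toy problem than inside a full solver. The sketch below is not iLQR: it applies the same two mechanisms (a damped second-order step and a backtracking line search with an Armijo test) to a one-dimensional function. The name `damped_newton`, the factors 2 and 10 for adapting `mu`, and the test function are all arbitrary choices for illustration.

```python
def damped_newton(J, dJ, d2J, u, mu=1.0, iters=100, c=1e-4, tol=1e-8):
    """Minimize J(u) with Levenberg-Marquardt-style damping and backtracking."""
    mu_min = 1e-6
    for _ in range(iters):
        g, h = dJ(u), d2J(u)
        if abs(g) < tol:
            break
        accepted = False
        if h + mu > 0:  # the damped curvature must be positive
            step = -g / (h + mu)
            alpha = 1.0
            while alpha > 1e-8:
                # Armijo test: demand a fraction c of the predicted decrease.
                if J(u + alpha * step) <= J(u) + c * alpha * g * step:
                    accepted = True
                    break
                alpha *= 0.5
        if accepted:
            u += alpha * step
            mu = max(mu / 2.0, mu_min)  # trust the local model more
        else:
            mu *= 10.0                  # trust the local model less
    return u, mu

# Toy objective with negative curvature at the starting point u = 0.5,
# where an undamped Newton step would head towards the local maximum at 0.
J = lambda u: 0.25 * u**4 - u**2
dJ = lambda u: u**3 - 2 * u
d2J = lambda u: 3 * u**2 - 2
u_star, mu_final = damped_newton(J, dJ, d2J, u=0.5)
```

In iLQR the scalar curvature `h` corresponds to the matrix \(Q_{uu}\), the step to the feedforward and feedback terms from the backward pass, and the evaluation of `J` to a forward rollout.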

Connection to Optimal Control & RL

iLQR sits at the intersection of classical optimal control and modern model-based reinforcement learning (MBRL):

Optimal Control: It solves the finite-horizon, discrete-time nonlinear optimal control problem directly, providing a model-based planning solution. It is closely related to Model Predictive Control (MPC), where iLQR can serve as the underlying optimizer for the MPC's online planning loop.
Model-Based RL: In MBRL, iLQR is used as the planning algorithm atop a learned dynamics model. The learned model (transition model) provides the dynamics for linearization, and iLQR computes locally optimal actions under that model. This combination is a cornerstone of sample-efficient RL, as planning with a model requires fewer expensive interactions with the real environment.

ITERATIVE LINEAR QUADRATIC REGULATOR (ILQR)

Frequently Asked Questions

The Iterative Linear Quadratic Regulator (iLQR) is a foundational algorithm in optimal control and model-based reinforcement learning. These questions address its core mechanics, applications, and relationship to other planning techniques.

The Iterative Linear Quadratic Regulator (iLQR) is a trajectory optimization algorithm that efficiently computes locally optimal control sequences for nonlinear dynamical systems by iteratively solving a series of linear-quadratic approximations. It is a variant of Differential Dynamic Programming (DDP) that builds a second-order expansion of the value function but drops the second-order dynamics terms, trading DDP's quadratic convergence for cheaper iterations while usually remaining fast near an optimum.

The algorithm works by taking a nominal trajectory (an initial guess of states and controls), linearizing the system dynamics around it, and quadratizing the cost function. It then solves the resulting Linear Quadratic Regulator (LQR) problem to compute a feedforward and feedback control policy that improves the trajectory. This new trajectory becomes the nominal for the next iteration, repeating until convergence. iLQR is prized for its sample efficiency in model-based reinforcement learning (MBRL), as it requires only a differentiable dynamics model and cost function, not millions of environment interactions.

MODEL-BASED REINFORCEMENT LEARNING

Related Terms

iLQR is a core algorithm within the model-based reinforcement learning paradigm. These related terms define the components, challenges, and alternative methods within this sample-efficient approach to optimal control.

Trajectory Optimization

Trajectory optimization is the broader class of planning algorithms that search for a sequence of actions minimizing a cost function (or maximizing rewards) over a finite horizon, subject to a dynamics model. iLQR is a specific, efficient method within this class.

Key Methods: Shooting methods (optimize actions directly) and collocation methods (optimize both states and actions).
Applications: Used in robotics for motion planning, aerospace for trajectory design, and anywhere a smooth, optimal control sequence is needed.
Contrast with Policy Search: Trajectory optimization finds a specific optimal path, while policy search learns a general function mapping states to actions.

Model Predictive Control (MPC)

Model Predictive Control (MPC) is an online, receding-horizon control strategy that repeatedly solves a finite-horizon trajectory optimization problem (often using iLQR as the solver) and executes only the first action before replanning.

Core Loop: 1) Measure current state, 2) Solve for optimal action sequence over horizon H, 3) Execute first action, 4) Repeat.
Robustness: This feedback mechanism makes MPC robust to model inaccuracies and external disturbances.
Computational Demand: Requires fast, real-time optimization, making efficient solvers like iLQR critical for high-frequency control (e.g., autonomous driving, drone flight).
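The core loop can be sketched as follows. To keep the example short and self-contained, the planner here is a finite-horizon LQR on a scalar linear model, used as a stand-in for an iLQR solver; the names `lqr_plan` and `run_mpc` and all numeric values are hypothetical. The "real" system deliberately differs from the planner's model to show the effect of replanning.

```python
def lqr_plan(x, a, b, q, r, horizon):
    """Stand-in planner: finite-horizon LQR for the scalar model x' = a*x + b*u.

    Returns the planned action sequence starting from state x.
    """
    # Backward Riccati recursion, from the end of the horizon to the start.
    p = q
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # gains[0] now applies at the first step

    # Roll the model forward to turn the gains into an action sequence.
    actions = []
    for k in gains:
        u = -k * x
        actions.append(u)
        x = a * x + b * u
    return actions

def run_mpc(x0, steps, true_a=1.3, model_a=1.2, b=1.0, horizon=10):
    """Receding-horizon loop: measure, plan, apply the first action, repeat."""
    x = x0
    history = [x]
    for _ in range(steps):
        plan = lqr_plan(x, model_a, b, q=1.0, r=1.0, horizon=horizon)
        u = plan[0]             # execute only the first planned action
        x = true_a * x + b * u  # the real system differs from the model
        history.append(x)
    return history
```

Because the plan is recomputed from the measured state at every step, the mismatch between `true_a` and `model_a` does not accumulate the way it would if the whole planned sequence were executed open loop.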

Dynamics Model (Transition Model)

A dynamics model (or transition model) is a learned or known function f(s, a) that predicts the next state s' given the current state s and action a. It is the foundational component for any model-based algorithm, including iLQR.

Types: Can be analytic (physics-based equations) or learned (neural networks, Gaussian processes).
iLQR Requirement: iLQR requires the model to be differentiable to compute the linear approximations (Jacobians) of the dynamics around a trajectory.
Model Error: Inaccuracies in the dynamics model are a primary source of failure, leading to the compounding error problem in long-horizon planning.
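When analytic derivatives are not available, the Jacobians can be approximated numerically. The sketch below uses central finite differences; `linearize` and the `pendulum` model are hypothetical names and an assumed example system. For learned models, automatic differentiation is the more common choice.

```python
import numpy as np

def linearize(f, x, u, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    n, m = x.size, u.size
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

def pendulum(x, u, dt=0.05):
    """Assumed example model: Euler-discretized pendulum, x = [angle, rate]."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-9.81 * np.sin(theta) + u[0])])

A, B = linearize(pendulum, [0.0, 0.0], [0.0])
```

Each call costs 2(n + m) model evaluations per time step, which is one reason differentiable models are preferred when the state dimension is large.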

Linear Quadratic Regulator (LQR)

The Linear Quadratic Regulator (LQR) is the foundational, non-iterative optimal control algorithm for linear dynamics and quadratic cost functions. iLQR generalizes LQR to nonlinear systems.

Assumptions: LQR assumes dynamics are of the form s' = A*s + B*a and cost is s^T Q s + a^T R a.
Solution: Provides an optimal linear feedback control policy a = -K*s via the Riccati equations.
iLQR Connection: Each iteration of iLQR solves a local LQR problem around the current trajectory, using linearized dynamics and a quadratic approximation of the cost.
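For reference, the finite-horizon, discrete-time Riccati recursion behind this solution can be written in the notation used above. The horizon \(N\) and the terminal weight \(Q_f\) are assumptions added for this sketch.

```latex
\[
\begin{aligned}
P_N &= Q_f \\
K_k &= \left(R + B^{\top} P_{k+1} B\right)^{-1} B^{\top} P_{k+1} A \\
P_k &= Q + A^{\top} P_{k+1} A - A^{\top} P_{k+1} B\, K_k \\
a_k &= -K_k\, s_k
\end{aligned}
\qquad k = N-1, \dots, 0
\]
```

The recursion runs backward in time from the terminal step, and the cost-to-go from state \(s\) at step \(k\) is \(s^{\top} P_k\, s\). In iLQR, \(A\) and \(B\) become the time-varying Jacobians of the nonlinear dynamics, and extra linear terms appear because the expansion point is generally not an optimum.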

Differential Dynamic Programming (DDP)

Differential Dynamic Programming (DDP) is a closely related trajectory optimization algorithm that, like iLQR, builds a local quadratic model of the cost and value function along a nominal trajectory. The primary difference lies in how the dynamics are expanded.

Second-Order Dynamics: DDP includes second-order derivatives (Hessians) of the dynamics in its approximation, while iLQR typically ignores them for speed.
Theoretical Basis: Both are derived from the principle of optimality and dynamic programming.
Performance: DDP can converge in fewer iterations but with higher per-iteration cost. iLQR is often preferred for its better computational trade-off.

Shooting Method vs. Collocation

These are the two main numerical approaches to trajectory optimization, defining what variables are optimized. iLQR is a shooting method.

Shooting Method (iLQR): Optimizes over the action sequence only. The state trajectory is implicitly defined by integrating the dynamics model forward from an initial state. This is efficient but can be numerically unstable for long horizons.
Collocation Method: Optimizes over both the state and action sequences simultaneously, adding the dynamics model as a constraint to the optimization problem. This is more stable for stiff systems but results in a much larger optimization problem.
Choice: iLQR's shooting approach is favored when fast, gradient-based optimization of controls is paramount.
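The difference in decision variables can be made concrete with a small sketch. It is simplified on purpose: the second function uses the discrete-time model directly as the constraint rather than a polynomial collocation scheme, and the double-integrator model and all function names are assumptions for illustration.

```python
import numpy as np

def f(x, u, dt=0.1):
    """Assumed dynamics: discretized double integrator, x = [position, velocity]."""
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def shooting_rollout(x0, controls):
    """Shooting: only the controls are free; states follow from the model."""
    states = [np.asarray(x0, dtype=float)]
    for u in controls:
        states.append(f(states[-1], u))
    return np.array(states)

def dynamics_defects(states, controls):
    """Simultaneous methods: states are free variables too, and the solver
    must drive these dynamics residuals ("defects") to zero."""
    return np.array([states[k + 1] - f(states[k], controls[k])
                     for k in range(len(controls))])

controls = [1.0, 0.0, -1.0]
rolled_out = shooting_rollout([0.0, 0.0], controls)
defects_rollout = dynamics_defects(rolled_out, controls)        # consistent by construction
defects_guess = dynamics_defects(np.zeros((4, 2)), controls)    # arbitrary state guess
```

A rolled-out trajectory always satisfies the dynamics, so a shooting method never has to handle these constraints explicitly. A simultaneous method may start from a state guess that violates them and relies on the optimizer to close the gaps.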

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
