Inferensys

Glossary

Trajectory Optimization

Trajectory optimization is a planning method that searches for a sequence of actions minimizing a cost function over a finite horizon according to a dynamics model.
PLANNING & CONTROL

What is Trajectory Optimization?

Trajectory optimization is a core planning method in model-based reinforcement learning and control theory.

Trajectory optimization is a planning method that searches for a sequence of actions (a trajectory) that minimizes a defined cost function or maximizes cumulative reward over a finite time horizon, subject to a model of the system's dynamics. It treats planning as a numerical optimization problem: finding the lowest-cost path from an initial state toward a goal state according to the model's predictions. It is a fundamental technique in model-based reinforcement learning (MBRL) and optimal control for tasks such as robotics and autonomous systems.
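
Written out in one common notation, the problem looks like this. The symbols are assumptions chosen for illustration: state x_t, action u_t, dynamics model f, stage cost c, terminal cost c_H, and horizon H.

```latex
\begin{aligned}
\min_{u_0,\dots,u_{H-1}} \quad & J = \sum_{t=0}^{H-1} c(x_t, u_t) + c_H(x_H) \\
\text{subject to} \quad & x_{t+1} = f(x_t, u_t), \qquad t = 0, \dots, H-1, \\
& x_0 = x_{\text{init}}.
\end{aligned}
```

Maximizing a cumulative reward \sum_t r(x_t, u_t) is the same problem with c = -r.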

The process typically involves an internal dynamics model, learned or known, that predicts state transitions. Solvers such as the Iterative Linear Quadratic Regulator (iLQR) iteratively refine a candidate trajectory, while Model Predictive Control (MPC) wraps such a solver in a loop that re-solves the problem at every control step. Gradient-based optimizers differentiate the total cost through the simulated future to adjust the actions, balancing immediate costs against long-term outcomes. Because candidate plans are evaluated by "imagining" their outcomes in the model rather than trying them in the real world, planning can be sample-efficient. The flip side is that performance depends heavily on model accuracy, since prediction errors compound over the horizon.
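
The gradient computation described above can be sketched on a toy problem. This is a minimal single-shooting example under assumed settings: a hypothetical 1-D model x_{t+1} = x_t + dt * u_t, a quadratic cost, and plain gradient descent. One backward (adjoint) pass through the imagined rollout yields the gradient of the total cost with respect to every action. All function names and weights are illustrative, not any library's API.

```python
def rollout(x0, actions, dt):
    """Simulate the assumed model x_{t+1} = x_t + dt * u_t; returns x_0..x_H."""
    xs = [x0]
    for u in actions:
        xs.append(xs[-1] + dt * u)
    return xs


def cost_and_grad(x0, actions, goal, dt, q=1.0, r=0.1, q_final=10.0):
    """Total cost of the imagined rollout and its gradient w.r.t. each action."""
    xs = rollout(x0, actions, dt)
    horizon = len(actions)
    cost = sum(q * (xs[t] - goal) ** 2 + r * actions[t] ** 2 for t in range(horizon))
    cost += q_final * (xs[horizon] - goal) ** 2
    # Backward (adjoint) pass: lam holds dJ/dx_{t+1} while step t is processed.
    lam = 2.0 * q_final * (xs[horizon] - goal)
    grad = [0.0] * horizon
    for t in reversed(range(horizon)):
        grad[t] = 2.0 * r * actions[t] + lam * dt   # d x_{t+1} / d u_t = dt
        lam = 2.0 * q * (xs[t] - goal) + lam        # d x_{t+1} / d x_t = 1
    return cost, grad


def optimize(x0, goal, horizon=20, dt=0.1, lr=0.1, iters=500):
    """Plain gradient descent on the whole action sequence (no line search)."""
    actions = [0.0] * horizon
    for _ in range(iters):
        _, grad = cost_and_grad(x0, actions, goal, dt)
        actions = [u - lr * g for u, g in zip(actions, grad)]
    return actions, cost_and_grad(x0, actions, goal, dt)[0]


actions, final_cost = optimize(x0=0.0, goal=1.0)
print(f"final cost {final_cost:.3f}, end state {rollout(0.0, actions, 0.1)[-1]:.3f}")
```

The step size is kept small because plain gradient descent on the action sequence diverges if it exceeds the curvature of the cost; solvers such as iLQR avoid hand-tuning this by using second-order information.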

PLANNING METHOD

Core Characteristics of Trajectory Optimization

Trajectory optimization is a planning method that searches for a sequence of actions that minimizes a cost function (or maximizes rewards) over a finite horizon according to a dynamics model, often using gradient-based methods like iLQR.

01

Finite-Horizon Planning

Trajectory optimization solves for an optimal sequence of actions over a defined, finite number of future time steps, known as the planning horizon. This contrasts with the infinite-horizon formulations common in policy optimization. Choosing the horizon length involves a trade-off: a longer horizon enables better long-term planning but increases computational cost and susceptibility to model error.
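
To see why a longer horizon amplifies model error, the sketch below rolls out a hypothetical true system and a slightly wrong model under the same action sequence and compares their final states. The coefficients (1.0 for the true system, 1.02 for the "learned" model) are illustrative assumptions, not measurements.

```python
def rollout_final_state(x0, actions, a):
    """Roll out the linear model x_{t+1} = a * x_t + u_t and return x_H."""
    x = x0
    for u in actions:
        x = a * x + u
    return x


TRUE_A = 1.0    # hypothetical true dynamics coefficient
MODEL_A = 1.02  # hypothetical learned model that is off by 2%


def prediction_error(horizon, x0=1.0, u=0.1):
    """Gap between model and true final state after `horizon` identical actions."""
    actions = [u] * horizon
    model_x = rollout_final_state(x0, actions, MODEL_A)
    true_x = rollout_final_state(x0, actions, TRUE_A)
    return abs(model_x - true_x)


for h in (1, 5, 20, 50):
    print(h, round(prediction_error(h), 3))
```

The gap grows with the horizon because each step's error feeds into the next predicted state. Each rollout also costs time proportional to the horizon, which is the computational side of the same trade-off.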

02

Model-Based Foundation

The method is fundamentally reliant on a dynamics model (or transition model) that predicts the next state given the current state and action. This model can be:

  • Analytical: Derived from first principles (e.g., physics equations for a robot arm).
  • Learned: A neural network trained on interaction data, as in Model-Based Reinforcement Learning (MBRL).

Planning occurs within this internal simulation, enabling sample-efficient evaluation of candidate action sequences without real-world trial-and-error.

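
As a minimal sketch of the "learned" option, the snippet below fits a linear model x_{t+1} ≈ a * x_t + b * u_t to logged transitions by least squares. A linear fit stands in for the neural network only to keep the example small. The "analytical" coefficients 0.9 and 0.5, the noise level, and all function names are assumptions.

```python
import random


def analytical_model(x, u):
    """Hypothetical first-principles model, also used here as the 'real' system."""
    return 0.9 * x + 0.5 * u


def collect_transitions(n, seed=0, noise=0.01):
    """Log (x, u, x_next) tuples from random interaction with the system."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append((x, u, analytical_model(x, u) + rng.gauss(0.0, noise)))
    return data


def fit_linear_model(transitions):
    """Least-squares fit of x_next ≈ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in transitions)
    sxu = sum(x * u for x, u, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    sxy = sum(x * y for x, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


a_hat, b_hat = fit_linear_model(collect_transitions(500))


def learned_model(x, u):
    """Drop-in replacement for analytical_model inside a planner."""
    return a_hat * x + b_hat * u
```

Either model exposes the same interface (state and action in, next state out), so the planner does not need to know which kind it is using.
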
03

Cost Function Minimization

The core objective is to find the action trajectory that minimizes a scalar cost function (or equivalently, maximizes cumulative reward). This function encodes the task goals, such as:

  • Reaching a target state with minimal error.
  • Minimizing control effort or energy consumption.
  • Avoiding obstacles or unsafe states via penalty terms.

The optimizer's job is to navigate the high-dimensional space of possible trajectories to find the one with the lowest total cost.

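
One way to encode the goals above as a single scalar is a weighted sum of terms, one per goal. The function names, the weights, and the quadratic soft-penalty form for the obstacle are illustrative assumptions. Real tasks tune these weights, and hard safety limits are often imposed as explicit constraints instead of penalties.

```python
import math


def stage_cost(x, u, target, obstacle, w_track=1.0, w_effort=0.1, w_obs=50.0, safe_radius=0.5):
    """Hypothetical 2-D stage cost: target tracking + control effort + obstacle penalty."""
    tracking = w_track * ((x[0] - target[0]) ** 2 + (x[1] - target[1]) ** 2)
    effort = w_effort * (u[0] ** 2 + u[1] ** 2)
    dist = math.hypot(x[0] - obstacle[0], x[1] - obstacle[1])
    # Soft obstacle penalty: zero outside safe_radius, growing quadratically inside it.
    penalty = w_obs * max(0.0, safe_radius - dist) ** 2
    return tracking + effort + penalty


def trajectory_cost(states, actions, target, obstacle):
    """Scalar cost of a whole trajectory: the sum of its per-step stage costs."""
    return sum(stage_cost(x, u, target, obstacle) for x, u in zip(states, actions))
```
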
04

Gradient-Based Solvers

Efficient solvers leverage gradient information from the dynamics and cost models. The most prominent algorithm is the Iterative Linear Quadratic Regulator (iLQR) and its stochastic variant, iLQG. These methods:

  1. Iteratively linearize the dynamics around a current trajectory guess.
  2. Quadratize the cost function.
  3. Solve the resulting LQR problem efficiently via dynamic programming.
  4. Update the trajectory and repeat.

This provides fast convergence to a locally optimal solution.

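
The four steps can be sketched for a scalar system, where every matrix in the iLQR recursion reduces to a number. This is a minimal illustration under assumed settings: a hypothetical model x_{t+1} = x_t + dt * (sin x_t + u_t), quadratic costs, a simple backtracking line search, and no regularization. It is not a production solver. With vector states, the divisions become linear solves, and Q_uu usually needs regularization to stay positive definite.

```python
import math

DT, GOAL = 0.1, 2.0
Q, R, QF = 1.0, 0.01, 100.0  # hypothetical stage, effort, and terminal weights


def dynamics(x, u):
    # Hypothetical scalar nonlinear model: x_{t+1} = x_t + dt * (sin(x_t) + u_t)
    return x + DT * (math.sin(x) + u)


def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(dynamics(xs[-1], u))
    return xs


def total_cost(xs, us):
    stage = sum(Q * (x - GOAL) ** 2 + R * u ** 2 for x, u in zip(xs[:-1], us))
    return stage + QF * (xs[-1] - GOAL) ** 2


def ilqr(x0, us, iters=50, tol=1e-8):
    xs = rollout(x0, us)
    cost = total_cost(xs, us)
    for _ in range(iters):
        # Steps 1-3: linearize f, quadratize the cost, and run the LQR backward pass.
        v_x, v_xx = 2 * QF * (xs[-1] - GOAL), 2 * QF
        ks, Ks = [0.0] * len(us), [0.0] * len(us)
        for t in reversed(range(len(us))):
            f_x, f_u = 1 + DT * math.cos(xs[t]), DT
            q_x = 2 * Q * (xs[t] - GOAL) + f_x * v_x
            q_u = 2 * R * us[t] + f_u * v_x
            q_xx = 2 * Q + f_x * f_x * v_xx
            q_uu = 2 * R + f_u * f_u * v_xx
            q_ux = f_u * v_xx * f_x
            ks[t], Ks[t] = -q_u / q_uu, -q_ux / q_uu  # feedforward and feedback terms
            v_x = q_x - q_ux * q_u / q_uu
            v_xx = q_xx - q_ux * q_ux / q_uu
        # Step 4: forward pass with a backtracking line search on the feedforward term.
        alpha, improved = 1.0, False
        while alpha > 1e-4:
            new_xs, new_us = [x0], []
            for t in range(len(us)):
                u = us[t] + alpha * ks[t] + Ks[t] * (new_xs[t] - xs[t])
                new_us.append(u)
                new_xs.append(dynamics(new_xs[t], u))
            new_cost = total_cost(new_xs, new_us)
            if new_cost < cost:
                improved = True
                break
            alpha *= 0.5
        if not improved:
            break
        converged = cost - new_cost < tol
        xs, us, cost = new_xs, new_us, new_cost
        if converged:
            break
    return xs, us, cost


xs, us, cost = ilqr(0.0, [0.0] * 30)
print(f"cost {cost:.3f}, final state {xs[-1]:.3f}")
```

The line search only accepts steps that lower the cost, so each iteration can only improve the trajectory. Like any local method, the result depends on the initial action guess.
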
05

Online Replanning (MPC)

In practice, trajectory optimization is often deployed within a Model Predictive Control (MPC) loop. At each control cycle:

  1. The current state is observed.
  2. A new optimal trajectory is computed from this state.
  3. Only the first action of the planned sequence is executed.
  4. The process repeats at the next time step.

This receding-horizon control provides robustness to model inaccuracies and unexpected disturbances.

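
The loop above can be sketched as follows. To keep the example short and deterministic, the planner is a brute-force search over a small, hypothetical discrete action set with a 3-step horizon, not iLQR. The "real" system adds a drift that the planning model does not know about. All names and constants are assumptions for illustration.

```python
import itertools

ACTIONS = (-2.0, -1.0, 0.0, 1.0, 2.0)  # hypothetical discretized action set


def model(x, u):
    # Planning model (assumed): x_{t+1} = x_t + 0.1 * u_t
    return x + 0.1 * u


def true_system(x, u):
    # Hypothetical real plant: the model plus an unmodeled drift of -0.05 per step
    return x + 0.1 * u - 0.05


def plan(x, goal, horizon=3):
    """Exhaustive search over every action sequence of length `horizon` under the model."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(ACTIONS, repeat=horizon):
        xp, cost = x, 0.0
        for u in seq:
            xp = model(xp, u)
            cost += (xp - goal) ** 2 + 0.01 * u * u
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq


def run_mpc(x0, goal, steps=30):
    x = x0
    for _ in range(steps):
        seq = plan(x, goal)         # steps 1-2: observe x, compute a plan from it
        x = true_system(x, seq[0])  # step 3: execute only the first action
    return x                        # step 4 is the loop itself


print(round(run_mpc(0.0, 1.0), 3))
```

Because the model ignores the drift, the state settles slightly below the goal rather than exactly on it. However, replanning from each measured state keeps the error bounded instead of letting it accumulate step after step.
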
06

Contrast with Policy Search

Trajectory optimization is a planning method, distinct from policy search in reinforcement learning. Key differences:

  • Output: Trajectory optimization outputs a specific action sequence for a given start state. Policy search learns a function (policy) mapping any state to an action.
  • Online Computation: Planning is computationally intensive at runtime. A trained policy offers cheap, constant-time action selection.
  • Use Case: Planning suits problems with an accurate model and varying conditions or goals (e.g., a robot arm reaching for different objects). Policies suit settings where actions must be selected quickly and the task stays the same.

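
The difference in output and runtime cost can be illustrated with a closed-form stand-in for a learned policy: the infinite-horizon LQR feedback law for a hypothetical scalar linear system. This is not policy search, because the gain is computed from the model rather than learned from data. It only shows the shape of a policy once obtained: a function that maps any state to an action in constant time, with no optimization at run time. The system and weights are assumptions.

```python
def lqr_gain(a, b, q, r, iters=1000):
    """Scalar infinite-horizon LQR gain via fixed-point Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


A, B = 1.1, 0.1  # hypothetical unstable system x' = A*x + B*u
K = lqr_gain(A, B, q=1.0, r=0.01)


def policy(x):
    """Closed-loop policy: one multiplication per call, defined for every state."""
    return -K * x


x = 1.0
for _ in range(50):
    x = A * x + B * policy(x)  # no planning at run time
print(f"K = {K:.3f}, state after 50 steps = {x:.2e}")
```

A trajectory optimizer would instead return a list of actions for one particular start state and have to re-solve for any other.
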
MODEL-BASED REINFORCEMENT LEARNING

How Trajectory Optimization Works

Trajectory optimization is a core planning technique in model-based reinforcement learning (MBRL) and control theory, where an agent uses an internal model to search for the best sequence of actions.

Trajectory optimization is a planning method that searches for a sequence of actions (a trajectory) that minimizes a defined cost function—or maximizes cumulative rewards—over a finite future horizon. It operates by leveraging a dynamics model (also called a transition model) to predict how actions will influence future states. The process formulates a constrained optimization problem, where the goal is to find the optimal action sequence subject to the constraints imposed by the model's predicted state transitions. This is distinct from policy optimization, as it plans open-loop sequences rather than learning a closed-loop policy function.
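
The open-loop versus closed-loop distinction matters as soon as the model is imperfect. The sketch below assumes a hypothetical integrator model x_{t+1} = x_t + 0.1 * u_t and a real system with an extra drift of -0.05 per step. The open-loop run executes a plan computed once: the minimum-effort sequence that reaches the goal exactly under the model. The closed-loop run recomputes that plan from each observed state over the remaining steps, a shrinking-horizon variant of replanning.

```python
def min_effort_plan(x, goal, steps_left):
    """Equal actions minimize sum(u^2) while reaching `goal` exactly under the model."""
    return [(goal - x) / (0.1 * steps_left)] * steps_left


def true_step(x, u):
    # Hypothetical real system: the model x + 0.1*u plus an unmodeled drift of -0.05
    return x + 0.1 * u - 0.05


def open_loop(x0, goal, horizon=10):
    x = x0
    for u in min_effort_plan(x0, goal, horizon):  # plan once, never look at x again
        x = true_step(x, u)
    return x


def closed_loop(x0, goal, horizon=10):
    x = x0
    for t in range(horizon):
        u = min_effort_plan(x, goal, horizon - t)[0]  # replan from the observed state
        x = true_step(x, u)
    return x


print(round(open_loop(0.0, 1.0), 3), round(closed_loop(0.0, 1.0), 3))
```

Under the model, both plans reach the goal of 1.0. On the drifting system, the open-loop run ends at 0.5, while the replanning run ends at 0.95, because each new plan absorbs the drift observed so far.
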

Algorithms like the Iterative Linear Quadratic Regulator (iLQR) solve this problem efficiently by iteratively linearizing the dynamics and quadratizing the cost around a nominal trajectory to compute optimal control updates. In Model Predictive Control (MPC), a form of online trajectory optimization, only the first action of the optimized sequence is executed before the agent replans from the new state, providing robustness to model inaccuracies. This lets agents plan multi-step behavior, such as physical control tasks, by simulating outcomes with an internal model of their environment.

TRAJECTORY OPTIMIZATION

Frequently Asked Questions

A technical FAQ on trajectory optimization, a core planning method in model-based reinforcement learning and robotics that searches for optimal action sequences.

Trajectory optimization is a planning method that searches for a sequence of actions (a trajectory) that minimizes a specified cost function, or maximizes cumulative reward, over a finite time horizon according to a dynamics model. It works by treating the search for an optimal control sequence as a numerical optimization problem. Given an initial state and a model of how the world evolves (the transition model), the algorithm iteratively adjusts the proposed action sequence to reduce the total predicted cost, often using efficient gradient-based methods like the Iterative Linear Quadratic Regulator (iLQR) or shooting/collocation techniques. The output is an open-loop plan of optimal actions, which may be executed directly or used within a Model Predictive Control (MPC) framework for closed-loop control.


About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.