Inferensys

Glossary

Successor Representation

The successor representation is a predictive state encoding in reinforcement learning that factors the value function into a reward-independent successor matrix and a state-reward vector.
REINFORCEMENT LEARNING

What is Successor Representation?

The successor representation (SR) is a predictive state representation in reinforcement learning (RL) that encodes the expected future occupancy of states. It decomposes the traditional value function into two components: a successor matrix, which is independent of reward and captures the dynamics of the environment, and a state-reward vector. This factorization allows an agent to rapidly re-evaluate states when rewards change, without relearning the environment's transition model.

The SR provides a middle ground between model-free and model-based RL. Unlike model-free methods, which learn a single value estimate tied to one reward function, the SR learns a predictive representation of future state occupancy. Unlike full model-based planning, it avoids expensive online simulation. This makes it efficient for corrective action planning: when goals or rewards are updated, the agent can recompute state values under its current policy with a single matrix-vector product and then improve the policy from those values, enabling flexible and sample-efficient adaptation.

CORRECTIVE ACTION PLANNING

Key Features of Successor Representation

The successor representation (SR) is a predictive state representation that decomposes value learning into a reward-independent model of future state occupancy and a reward function. This decomposition provides unique computational advantages for planning and generalization.

01

Decomposition of Value

The core innovation of the successor representation is its factorization of the value function V(s). It separates the problem into two components:

  • Successor Matrix M(s, s'): A reward-independent model predicting the expected discounted future occupancy of state s' when starting from state s and following a fixed policy π. This is defined as M(s, s') = E_π[∑_{t=0}^{∞} γ^t I(s_t = s') | s_0 = s], where γ ∈ [0, 1) is the discount factor and I(·) is the indicator function.
  • State-Reward Vector R(s'): The expected immediate reward for being in each state.

The value is then computed as V(s) = ∑_{s'} M(s, s') R(s'). This separation allows the agent to rapidly re-evaluate its policy if rewards change, without relearning the dynamics.
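
The identity V(s) = ∑_{s'} M(s, s') R(s') follows directly from the definition of M. A short derivation, assuming a finite state space, bounded rewards that depend only on the current state, and γ < 1 so that the sum and the expectation can be exchanged:

```latex
\begin{aligned}
V(s) &= \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\middle|\, s_0 = s\right] \\
     &= \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} \sum_{s'} \mathbb{I}(s_t = s')\, R(s') \,\middle|\, s_0 = s\right] \\
     &= \sum_{s'} \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{I}(s_t = s') \,\middle|\, s_0 = s\right] R(s') \\
     &= \sum_{s'} M(s, s')\, R(s').
\end{aligned}
```

The reward appears only in the last factor, which is why M can be kept fixed when R changes.
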
02

Generalization Across Rewards

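Because the successor matrix M is independent of the reward function, it enables powerful transfer learning. Once an agent has learned M for a given policy in an environment, it can compute that policy's value function for any new reward function R' with a single matrix-vector product using the same matrix: V'(s) = M(s, ·) · R'. Because M is tied to the policy it was learned under, V' evaluates that policy on the new reward; it is not automatically the optimal value function for R'.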

  • This is critical for corrective action planning, where the 'cost' of an error defines a new, often sparse, reward signal. The agent can immediately re-plan without additional environment interaction.
  • This property makes SR highly sample-efficient for multi-task learning and rapid adaptation to new goals or constraints.
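
As a minimal sketch of this reuse, assume a hypothetical 4-state chain whose transition matrix T under the fixed policy is known, so that M can be taken from the closed form (I - γ T)^{-1} rather than learned. The example evaluates two reward vectors with the same M and compares the results with ordinary iterative policy evaluation. The chain, the reward values, and the helper name `policy_evaluation` are illustrative assumptions.

```python
import numpy as np

gamma = 0.9

# Hypothetical 4-state chain under a fixed policy (rows sum to 1).
# State 3 is absorbing; the other states mostly move one step right.
T = np.array([
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
])

# Successor matrix for this policy, computed once.
M = np.linalg.inv(np.eye(4) - gamma * T)


def policy_evaluation(T, R, gamma, iters=2000):
    """Baseline: iterate the Bellman expectation backup V <- R + gamma * T V."""
    V = np.zeros(len(R))
    for _ in range(iters):
        V = R + gamma * T @ V
    return V


R_old = np.array([0.0, 0.0, 0.0, 1.0])   # original goal reward
R_new = np.array([0.0, -1.0, 0.0, 1.0])  # state 1 now carries an error cost

V_old = M @ R_old  # one matrix-vector product per reward vector
V_new = M @ R_new

print(np.round(V_old, 3))
print(np.round(V_new, 3))
```

Both value vectors describe the same policy; choosing better actions for the new reward would still require a policy-improvement step on top of V_new.
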
03

Connection to Temporal Context

The successor representation formalizes the concept of temporal proximity between states. The entry M(s, s') represents how 'close' state s' is in the agent's future, discounted by time.

  • It provides a predictive map of the environment under a given policy.
  • This map can be seen as a generalization of adjacency in a graph, weighted by the policy and discounting. A state s' that is typically reached soon after s has a high value M(s, s'). The relation is directional, so M(s, s') and M(s', s) can differ.
  • This structure is foundational for model-based planning without a full transition model, as it directly encodes long-term consequences.
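
To make the "predictive map" concrete, here is a small sketch on a hypothetical 7-state corridor. It assumes the transition matrix is known and compares an unbiased walk with a policy that prefers moving right. The function name `corridor_sr` and all parameter values are illustrative.

```python
import numpy as np


def corridor_sr(p_right, n=7, gamma=0.9):
    """SR of a walk on an n-state corridor; bumping into a wall keeps the state."""
    T = np.zeros((n, n))
    for s in range(n):
        T[s, min(s + 1, n - 1)] += p_right
        T[s, max(s - 1, 0)] += 1.0 - p_right
    return np.linalg.inv(np.eye(n) - gamma * T)


M_uniform = corridor_sr(p_right=0.5)  # unbiased policy
M_biased = corridor_sr(p_right=0.8)   # policy that prefers moving right

# Row 3 is the predictive map seen from the middle of the corridor.
print(np.round(M_uniform[3], 2))
print(np.round(M_biased[3], 2))
```

Under the unbiased walk, row 3 falls off symmetrically with distance from state 3. Under the right-biased policy the same row puts more weight on states to the right, which shows that the map depends on the policy as well as on the layout.
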
04

Eigen-Decomposition & Fast Planning

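The successor matrix can be eigen-decomposed, revealing the underlying temporal structure of the environment. Because M is a function of the transition matrix T, the two share eigenvectors, and each eigenvalue λ of T corresponds to the eigenvalue 1 / (1 - γ λ) of M. This decomposition enables very fast planning computations.

  • The SR can be expressed in closed form as M = (I - γ T)^{-1} = ∑_{t=0}^{∞} γ^t T^t, where T is the state-to-state transition matrix under the policy and γ < 1.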
  • Using this formulation, computing the new value for a changed reward reduces to a simple linear equation solve or matrix-vector multiplication, bypassing iterative dynamic programming.
  • This makes it highly suitable for real-time re-planning in agents that must adjust their execution path after detecting an error.
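
The closed form and its eigen-decomposition can be checked numerically. The sketch below assumes a hypothetical random walk on a 6-state ring, chosen because its transition matrix is symmetric, so `np.linalg.eigh` applies and the eigenvectors are orthonormal. For a general, non-symmetric T one would need `np.linalg.eig` and possibly complex eigenvalues.

```python
import numpy as np

n, gamma = 6, 0.95

# Hypothetical policy dynamics: unbiased random walk on a ring (symmetric T).
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

# Closed form of the SR.
M = np.linalg.inv(np.eye(n) - gamma * T)

# Same matrix rebuilt from the eigen-decomposition of T:
# eigenvectors are shared, eigenvalues map lam -> 1 / (1 - gamma * lam).
lam, Q = np.linalg.eigh(T)
M_eig = Q @ np.diag(1.0 / (1.0 - gamma * lam)) @ Q.T

# Re-evaluating a changed reward: either one linear solve without forming M,
# or one matrix-vector product when M is already available.
R_new = np.zeros(n)
R_new[2] = 1.0
V_solve = np.linalg.solve(np.eye(n) - gamma * T, R_new)
V_matvec = M @ R_new

print(np.allclose(M, M_eig), np.allclose(V_solve, V_matvec))
```

The linear solve is the better choice when only one new reward vector needs to be evaluated; storing M pays off when many reward vectors are evaluated against the same policy.
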
05

Bridging Model-Based and Model-Free RL

The successor representation occupies a unique middle ground between model-free and model-based reinforcement learning.

  • Like model-free methods: It can be learned directly from experience (via TD learning) without explicitly modeling transition probabilities T(s'|s,a).
  • Like model-based methods: It supports flexible prediction and rapid re-evaluation when goals/rewards change, a key feature for planning.
  • This hybrid nature is ideal for autonomous agents that need the low computational cost of model-free learning but the flexible re-planning capability of a model to correct errors.
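
The TD-learning route mentioned above can be sketched in tabular form. This is a minimal illustration rather than a reference implementation: the function names `learn_sr_td` and `ring_step`, the 5-state ring environment, and the step size and step count are all assumptions. The learner only sees sampled transitions; the transition matrix is built afterwards purely to check the estimate.

```python
import numpy as np


def learn_sr_td(sample_next, n_states, gamma=0.9, alpha=0.01,
                steps=100_000, start=0):
    """Tabular TD(0) estimate of the SR from a stream of transitions."""
    M = np.zeros((n_states, n_states))
    one_hot = np.eye(n_states)
    s = start
    for _ in range(steps):
        s_next = sample_next(s)
        # TD error: visit s now, plus discounted occupancy predicted from s_next.
        td_error = one_hot[s] + gamma * M[s_next] - M[s]
        M[s] += alpha * td_error
        s = s_next
    return M


# Hypothetical environment: unbiased random walk on a 5-state ring.
n = 5
rng = np.random.default_rng(0)


def ring_step(s):
    step = 1 if rng.random() < 0.5 else -1
    return (s + step) % n


M_td = learn_sr_td(ring_step, n)

# Reference values from the closed form, used here only to check the estimate.
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5
M_exact = np.linalg.inv(np.eye(n) - 0.9 * T)

print(np.abs(M_td - M_exact).max())
```

The update has the same shape as TD(0) value learning, with the scalar reward replaced by a one-hot vector for the current state. With a constant step size the estimate keeps fluctuating around the exact matrix instead of converging to it exactly.
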
06

Successor Features

A powerful extension is successor features, which generalize the SR to linear function approximation and feature spaces.

  • Instead of a matrix over states M(s, s'), successor features ψ(s) are vectors where each component is the expected discounted future sum of one feature φ_i (e.g., 'has key', 'near door') along the agent's trajectory from s.
  • The value is then V(s) = ψ(s) · w, where w is a weight vector representing the reward associated with each feature. This assumes the reward is (approximately) linear in the features, R(s) ≈ φ(s) · w.
  • This allows for generalization across both states and tasks in high-dimensional spaces, making it practical for complex environments where agents must formulate corrective plans based on abstract features.
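
A small tabular sketch of successor features, assuming the policy's transition matrix is known so that ψ can be computed in closed form as M Φ (in practice ψ would usually be learned with function approximation). The feature matrix `Phi`, the two weight vectors, and the ring dynamics are hypothetical.

```python
import numpy as np

gamma = 0.9
n = 5

# Hypothetical policy dynamics: unbiased random walk on a 5-state ring.
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = T[s, (s + 1) % n] = 0.5

# Hypothetical binary features per state:
# column 0 = "has key", column 1 = "near door".
Phi = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
])

# Successor features: expected discounted sum of each feature from every state.
M = np.linalg.inv(np.eye(n) - gamma * T)
Psi = M @ Phi  # shape (n_states, n_features)

# Two tasks that differ only in their reward weights w, with R = Phi @ w.
w_reach_door = np.array([0.0, 1.0])
w_avoid_key = np.array([-1.0, 0.5])

V_door = Psi @ w_reach_door
V_avoid = Psi @ w_avoid_key

print(np.round(V_door, 3))
print(np.round(V_avoid, 3))
```

Switching tasks changes only w, so Psi is computed once and reused, and its size grows with the number of features rather than the number of states squared.
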
REPRESENTATION COMPARISON

Successor Representation vs. Other RL Representations

A technical comparison of how the Successor Representation decomposes value prediction versus other core representations in reinforcement learning.

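Core Mathematical Object

  • Successor Representation (SR): Successor Matrix M(s, s')
  • Model-Based (Dynamics Model): Transition Function T(s'|s,a)
  • Model-Free (Value Function): Value Function V(s) or Q(s,a)

Primary Output

  • Successor Representation (SR): Expected future state occupancy
  • Model-Based (Dynamics Model): Predicted next state and reward
  • Model-Free (Value Function): Expected cumulative return

Reward Dependency

  • Successor Representation (SR): Decoupled (M is reward-independent)
  • Model-Based (Dynamics Model): Tightly coupled (reward is part of model)
  • Model-Free (Value Function): Tightly coupled (value is reward-dependent)

Generalization to New Rewards

  • Successor Representation (SR): Yes (reuses M with the new R)
  • Model-Based (Dynamics Model): Yes (re-plans with the learned model)
  • Model-Free (Value Function): No (values must be re-learned)

Sample Efficiency for Planning

  • Successor Representation (SR): High (reuses M for new R(s))
  • Model-Based (Dynamics Model): Medium (requires learning T & R)
  • Model-Free (Value Function): Low (requires re-learning for new R(s))

Supports Zero-Shot Revaluation

  • Successor Representation (SR): Yes (V' = M R' without new experience)
  • Model-Based (Dynamics Model): Yes (by re-planning with the model)
  • Model-Free (Value Function): No

Temporal Abstraction

  • Successor Representation (SR): Implicit via discounted occupancy
  • Model-Based (Dynamics Model): Explicit via multi-step rollouts
  • Model-Free (Value Function): Explicit via Bellman equation

Common Algorithmic Use Case

  • Successor Representation (SR): Successor Features, Generalized Policy Evaluation
  • Model-Based (Dynamics Model): Dyna, Monte Carlo Tree Search (MCTS)
  • Model-Free (Value Function): Q-Learning, Policy Gradient, TD Learning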

SUCCESSOR REPRESENTATION

Frequently Asked Questions

The successor representation is a predictive state representation in reinforcement learning that factors the value function into a reward-independent successor matrix and a state-reward vector. It provides a fundamental bridge between model-based and model-free learning.

The successor representation (SR) is a predictive state representation in reinforcement learning that encodes the expected future occupancy of states. It decomposes the value of a state into two components: the expected discounted future occupancy of every state, including the starting state itself (the successor matrix), and the immediate rewards associated with those states. This provides a middle ground between model-based planning, which requires a full model of the environment's transition dynamics, and model-free methods like Q-learning, which learn values directly without explicit dynamics.

Formally, the successor matrix M(s, s') for a given policy π represents the expected discounted number of times the agent will visit state s' when starting from state s, counting the visit at time 0 when s' = s. The value function V(s) is then the dot product of row s of this matrix with the reward vector R: V(s) = ∑_{s'} M(s, s') R(s'). This separation allows for rapid recomputation of values if rewards change, without relearning the environment's dynamics.

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.