The successor representation (SR) is a predictive state representation in reinforcement learning (RL) that encodes the expected future occupancy of states. It decomposes the traditional value function into two components: a successor matrix, which is independent of reward and captures the environment's dynamics under the agent's policy, and a state-reward vector. This factorization allows an agent to rapidly re-evaluate states when rewards change, without relearning the environment's transition model.
Successor Representation

What is the Successor Representation?
A predictive state representation that factors the value function into a reward-independent successor matrix and a state-reward vector.
The SR provides a middle ground between model-free and model-based RL. Unlike model-free methods, which learn a single reward-dependent value estimate, the SR learns a predictive representation of future state occupancy. Unlike full model-based planning, it avoids expensive online simulations. This makes it efficient for corrective action planning. When goals or rewards are updated, an agent can quickly re-evaluate its current policy and use those values to improve it, enabling flexible and sample-efficient adaptation.
Key Features of Successor Representation
The successor representation (SR) is a predictive state representation that decomposes value learning into a reward-independent model of future state occupancy and a reward function. This decomposition provides unique computational advantages for planning and generalization.
Decomposition of Value
The core innovation of the successor representation is its factorization of the value function V(s). It separates the problem into two components:
- Successor Matrix M(s, s'): A reward-independent model predicting the expected discounted future occupancy of state s' when starting from state s and following policy π. It is defined as M(s, s') = E_π[∑_{t=0}^{∞} γ^t I(s_t = s') | s_0 = s], where γ is the discount factor and I(·) is the indicator function.
- State-Reward Vector R(s'): The expected immediate reward for being in each state.
The value is then computed as V(s) = ∑_{s'} M(s, s') R(s'). This separation allows the agent to rapidly re-evaluate a policy when rewards change, without relearning the dynamics.
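The following Python is a minimal sketch of this decomposition. It builds M in closed form as (I − γT)^{-1}, where T is the state-to-state transition matrix under a fixed policy, and then recovers V as M·R. The three-state chain, γ = 0.9, and the helper names `invert`, `successor_matrix`, and `value` are illustrative assumptions. Plain lists keep the example dependency-free.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    # Augment [A | I]; reducing the left half to I turns the right half into A^-1.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def successor_matrix(T, gamma):
    """M = (I - gamma * T)^-1 for a policy's state-transition matrix T."""
    n = len(T)
    A = [[(1.0 if i == j else 0.0) - gamma * T[i][j] for j in range(n)]
         for i in range(n)]
    return invert(A)

def value(M, R):
    """V(s) = sum over s' of M(s, s') * R(s')."""
    return [sum(m * r for m, r in zip(row, R)) for row in M]

# Hypothetical 3-state chain under a fixed policy: 0 -> 1 -> 2, state 2 loops.
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
gamma = 0.9
M = successor_matrix(T, gamma)
R = [0.0, 0.0, 1.0]          # reward only in state 2
V = value(M, R)
print([round(v, 3) for v in V])  # [8.1, 9.0, 10.0]
```

Row 0 of M is [1, 0.9, 8.1]. That is one visit to state 0 now, γ discounted visits to state 1, and γ²/(1 − γ) discounted visits to the absorbing state 2.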
Generalization Across Rewards
Because the successor matrix M is independent of the reward function, it enables powerful transfer learning. Once an agent has learned M for a given policy in an environment, it can compute that policy's value function for any new reward function R' with a single matrix-vector product: V'(s) = M(s, ·) · R'.
- This is critical for corrective action planning, where the 'cost' of an error defines a new, often sparse, reward signal. The agent can immediately re-evaluate its behavior under that signal without additional environment interaction.
- This property makes SR highly sample-efficient for multi-task learning and rapid adaptation to new goals or constraints.
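The reuse of M can be checked numerically. This minimal sketch uses a hypothetical four-state ring where the policy either stays put or steps forward with equal probability. M is computed once by iterating the SR's own Bellman equation, M ← I + γTM, and then combined with two different reward vectors. Each result is compared against ordinary iterative policy evaluation, V ← R + γTV, which must be re-run for every reward. The ring, the rewards, and all helper names are illustrative assumptions.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def successor_matrix(T, gamma, iters=1000):
    """Fixed-point iteration on the SR Bellman equation M <- I + gamma * T M."""
    n = len(T)
    I = identity(n)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        M = [[I[i][j] + gamma * sum(T[i][k] * M[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return M

def value_from_sr(M, R):
    """V = M R: one matrix-vector product per reward vector."""
    return [sum(m * r for m, r in zip(row, R)) for row in M]

def policy_evaluation(T, R, gamma, iters=1000):
    """Baseline: iterate V <- R + gamma * T V from scratch for a given R."""
    n = len(T)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[i] + gamma * sum(T[i][k] * V[k] for k in range(n)) for i in range(n)]
    return V

# Hypothetical 4-state ring: stay with prob 0.5, move to the next state with prob 0.5.
n, gamma = 4, 0.9
T = [[0.5 if j in (i, (i + 1) % n) else 0.0 for j in range(n)] for i in range(n)]
M = successor_matrix(T, gamma)       # computed once, reward-independent

R_old = [1.0, 0.0, 0.0, 0.0]         # original goal in state 0
R_new = [0.0, 0.0, 1.0, 0.0]         # goal moved to state 2
V_old = value_from_sr(M, R_old)
V_new = value_from_sr(M, R_new)      # no new dynamics learning needed
```

Only the cheap product M·R is repeated when the goal moves. Note that the values remain those of the same policy; choosing a better policy for the new goal is a separate improvement step.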
Connection to Temporal Context
The successor representation formalizes the concept of temporal proximity between states. The entry M(s, s') represents how 'close' state s' is in the agent's future, discounted by time.
- It provides a predictive map of the environment under a given policy.
- This map can be seen as a generalization of adjacency in a graph, weighted by the policy and discounting. If s' is usually reached soon after s, M(s, s') is high. The relation is directional: M(s, s') and M(s', s) generally differ.
- This structure supports planning-like computation without a full transition model, because it directly encodes long-term consequences.
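To make the temporal-proximity reading concrete, this sketch approximates M by truncating the series ∑_t γ^t T^t on a hypothetical one-way corridor (0 → 1 → 2 → 3 → 4, with state 4 absorbing). Nearby successors get larger entries than distant ones. Because the corridor cannot be walked backward, M(1, 0) stays at zero while M(0, 1) does not. The corridor, γ = 0.9, and the truncation horizon are assumptions chosen for illustration.

```python
def sr_by_series(T, gamma, horizon=500):
    """Approximate M = sum_{t < horizon} gamma^t T^t (truncated Neumann series)."""
    n = len(T)
    M = [[0.0] * n for _ in range(n)]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    for t in range(horizon):
        w = gamma ** t
        for i in range(n):
            for j in range(n):
                M[i][j] += w * P[i][j]
        # Advance P from T^t to T^(t+1).
        P = [[sum(P[i][k] * T[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return M

# Hypothetical one-way corridor: each state moves to the next; state 4 loops.
n = 5
T = [[1.0 if j == min(i + 1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
M = sr_by_series(T, gamma=0.9)

print(round(M[0][1], 3), round(M[0][3], 3))  # 0.9 0.729: nearer successors score higher
print(M[1][0])                               # 0.0: state 0 never follows state 1
```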
Eigen-Decomposition & Fast Planning
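The successor matrix can be eigen-decomposed, revealing the underlying temporal structure of the environment. Every eigenvector v of the policy's transition matrix T with eigenvalue λ is also an eigenvector of M, with eigenvalue 1/(1 − γλ). This shared structure enables fast planning computations.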
- The SR can be expressed as M = (I - γ T)^{-1}, where T is the transition matrix under the policy.
- Using this formulation, computing the new value for a changed reward reduces to a simple linear equation solve or matrix-vector multiplication, bypassing iterative dynamic programming.
- This makes it highly suitable for real-time re-planning in agents that must adjust their execution path after detecting an error.
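The following sketch illustrates both points on a hypothetical two-state policy that always switches state: T = [[0, 1], [1, 0]], whose eigenvalues are +1 and −1. It computes values by solving (I − γT)V = R directly, without forming M. It then rebuilds the same values from the eigen-decomposition, using the fact that M scales each eigenvector by 1/(1 − γλ). The `solve` helper is a small Gaussian elimination written for this example, not a library routine.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

gamma = 0.9
T = [[0.0, 1.0], [1.0, 0.0]]                   # hypothetical "always switch" policy
A = [[(1.0 if i == j else 0.0) - gamma * T[i][j] for j in range(2)] for i in range(2)]

# Direct route: V = M R via one linear solve, without forming M.
R = [0.0, 1.0]
V = solve(A, R)

# Eigen route: R = 0.5 * (1, 1) - 0.5 * (1, -1).
# (1, 1) has eigenvalue +1 under T, so M scales it by 1 / (1 - gamma).
# (1, -1) has eigenvalue -1 under T, so M scales it by 1 / (1 + gamma).
V_eig = [0.5 * u / (1 - gamma) - 0.5 * v / (1 + gamma)
         for u, v in zip([1.0, 1.0], [1.0, -1.0])]
```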
Bridging Model-Based and Model-Free RL
The successor representation occupies a unique middle ground between model-free and model-based reinforcement learning.
- Like model-free methods: It can be learned directly from experience (via TD learning) without explicitly modeling transition probabilities T(s'|s,a).
- Like model-based methods: It supports flexible prediction and rapid re-evaluation when goals/rewards change, a key feature for planning.
- This hybrid nature is ideal for autonomous agents that need the sample efficiency of model-free learning but the flexible re-planning capability of a model to correct errors.
Successor Features
A powerful extension is successor features, which generalize the SR to linear function approximation and feature spaces.
- Instead of a matrix over states M(s, s'), successor features ψ(s) are vectors where each component is the expected discounted future sum of one feature φ_i (for binary features such as 'has key' or 'near door', this is the feature's discounted future occupancy).
- The value is then V(s) = ψ(s) · w, where w is a weight vector representing the reward associated with each feature.
- This allows for generalization across both states and tasks in high-dimensional spaces, making it practical for complex environments where agents must formulate corrective plans based on abstract features.
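This minimal tabular sketch of successor features assumes a hypothetical three-state chain and two binary features standing in for 'has key' and 'near door'. ψ is obtained by iterating its Bellman equation, ψ(s) = φ(s) + γ ∑_{s'} T(s, s') ψ(s'). Switching tasks then only means switching w. Practical systems learn ψ with function approximation rather than iterating a known T; the known T here is an assumption made to keep the example small.

```python
def successor_features(T, Phi, gamma, iters=1000):
    """Iterate psi <- Phi + gamma * T psi; row s of psi is the vector psi(s)."""
    n, d = len(Phi), len(Phi[0])
    psi = [[0.0] * d for _ in range(n)]
    for _ in range(iters):
        psi = [[Phi[s][k] + gamma * sum(T[s][s2] * psi[s2][k] for s2 in range(n))
                for k in range(d)] for s in range(n)]
    return psi

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical chain 0 -> 1 -> 2 (state 2 loops) under a fixed policy.
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
# Hypothetical binary features per state: [has_key, near_door].
Phi = [[0.0, 0.0],
       [1.0, 0.0],
       [1.0, 1.0]]
gamma = 0.9
psi = successor_features(T, Phi, gamma)   # reward-independent, computed once

w_key = [1.0, 0.0]     # task A: only holding the key is rewarded
w_door = [0.0, 1.0]    # task B: only being near the door is rewarded
V_key = [dot(psi[s], w_key) for s in range(3)]
V_door = [dot(psi[s], w_door) for s in range(3)]
```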
Successor Representation vs. Other RL Representations
A technical comparison of how the Successor Representation decomposes value prediction versus other core representations in reinforcement learning.
| Representation Feature | Successor Representation (SR) | Model-Based (Dynamics Model) | Model-Free (Value Function) |
|---|---|---|---|
| Core Mathematical Object | Successor Matrix M(s, s') | Transition Function T(s'\|s,a) | Value Function V(s) or Q(s,a) |
| Primary Output | Expected future state occupancy | Predicted next state and reward | Expected cumulative return |
| Reward Dependency | Decoupled (M is reward-independent) | Tightly coupled (reward is part of model) | Tightly coupled (value is reward-dependent) |
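| Generalization to New Rewards | Yes (recombine M with the new R) | Yes, by re-planning with the learned model | No (value must be re-learned) |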
| Sample Efficiency for Planning | High (reuses M for new R(s)) | Medium (requires learning T & R) | Low (requires re-learning for new R(s)) |
| Supports Zero-Shot Revaluation | Yes, for the policy under which M was learned | Yes, via planning (at computational cost) | No |
| Temporal Abstraction | Implicit via discounted occupancy | Explicit via multi-step rollouts | Explicit via Bellman equation |
| Common Algorithmic Use Case | Successor Features, Generalized Policy Evaluation | Dyna, Monte Carlo Tree Search (MCTS) | Q-Learning, Policy Gradient, TD Learning |
Frequently Asked Questions
The successor representation is a predictive state representation in reinforcement learning that factors the value function into a reward-independent successor matrix and a state-reward vector. It provides a fundamental bridge between model-based and model-free learning.
The successor representation (SR) is a predictive state representation in reinforcement learning that encodes the expected future occupancy of states, factoring the value function into a reward-independent successor matrix and a state-reward vector. The value of a state combines two components. The first is the expected discounted future occupancy of every state, including the starting state itself (the successor matrix). The second is the immediate reward associated with each of those states. This provides a middle ground between model-based planning, which requires a full model of the environment's transition dynamics, and model-free methods like Q-learning, which learn values directly without explicit dynamics.
Formally, the successor matrix M(s, s') for a given policy π represents the expected discounted number of times the agent will visit state s' in the future, starting from state s. The value function V(s) can then be computed as the dot product of this matrix row and the reward vector R: V(s) = ∑_{s'} M(s, s') R(s'). This separation allows values to be recomputed rapidly when rewards change, without relearning the environment's dynamics.
Related Terms
The successor representation is a core concept in reinforcement learning and planning. These related terms define the mathematical frameworks, algorithms, and representations that enable agents to predict future states and learn optimal corrective policies.
Markov Decision Process (MDP)
The foundational mathematical framework for modeling sequential decision-making. An MDP is defined by a tuple (S, A, P, R, γ), where:
- S is a set of states.
- A is a set of actions.
- P(s' | s, a) is the state transition probability function.
- R(s, a, s') is the reward function.
- γ is a discount factor.
The successor representation is derived directly from the transition dynamics P of an MDP, encoding the expected future occupancy of states under a given policy.
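The link from an MDP to the SR runs through the policy. The state-to-state matrix is T^π(s, s') = ∑_a π(a|s) P(s'|s, a), and M = (I − γT^π)^{-1}. This sketch computes both for a hypothetical two-state, two-action MDP stored in plain dictionaries; the states, action names, and probabilities are illustrative assumptions. Every row of T^π sums to 1, so every row of M should sum to 1/(1 − γ), which gives a quick sanity check.

```python
# Hypothetical MDP dynamics: P[s][a][s'] = P(s' | s, a).
P = {
    0: {"stay": [1.0, 0.0], "go": [0.2, 0.8]},
    1: {"stay": [0.0, 1.0], "go": [0.9, 0.1]},
}
# Hypothetical stochastic policy: pi[s][a] = pi(a | s).
pi = {
    0: {"stay": 0.5, "go": 0.5},
    1: {"stay": 1.0, "go": 0.0},
}
gamma = 0.9

def policy_transition_matrix(P, pi):
    """T_pi(s, s') = sum over a of pi(a | s) * P(s' | s, a)."""
    states = sorted(P)
    return [[sum(pi[s][a] * P[s][a][s2] for a in P[s]) for s2 in states]
            for s in states]

def inverse_2x2(A):
    """Closed-form inverse of a 2x2 matrix (enough for this tiny example)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

T = policy_transition_matrix(P, pi)      # [[0.6, 0.4], [0.0, 1.0]]
A = [[(1.0 if i == j else 0.0) - gamma * T[i][j] for j in range(2)] for i in range(2)]
M = inverse_2x2(A)                       # successor matrix under pi
row_sums = [sum(row) for row in M]       # each should equal 1 / (1 - gamma) = 10
```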
Temporal Difference (TD) Learning
A core family of algorithms for learning value functions in reinforcement learning. TD methods update estimates based on the difference between consecutive predictions (the TD error). The successor representation can be learned efficiently using TD algorithms, as its update rule is a form of TD learning where the "reward" is a state-indicator vector. This allows agents to learn long-term predictions without requiring model-based planning from scratch at each step.
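This minimal tabular sketch of that update assumes a hypothetical deterministic three-state chain (0 → 1 → 2, state 2 absorbing) and a constant step size α. After each observed transition (s, s'), row s of M moves toward the target 1_s + γM(s', ·). Here the one-hot indicator 1_s plays the role of the reward. No transition probabilities are stored. With enough episodes the rows approach the closed-form values of (I − γT)^{-1}, which for this chain are [1, 0.9, 8.1], [0, 1, 9], and [0, 0, 10].

```python
def td_sr_update(M, s, s_next, alpha, gamma):
    """TD(0) update of row s of the successor matrix after observing s -> s_next."""
    for j in range(len(M)):
        indicator = 1.0 if j == s else 0.0          # the SR's vector-valued "reward"
        td_error = indicator + gamma * M[s_next][j] - M[s][j]
        M[s][j] += alpha * td_error

def step(s):
    """Hypothetical environment under a fixed policy: 0 -> 1 -> 2 -> 2 -> ..."""
    return min(s + 1, 2)

n, gamma, alpha = 3, 0.9, 0.1
M = [[0.0] * n for _ in range(n)]
for _ in range(2000):                  # episodes
    s = 0
    for _ in range(20):                # steps per episode
        s_next = step(s)
        td_sr_update(M, s, s_next, alpha, gamma)
        s = s_next
```

With stochastic transitions the same update applies unchanged. The estimates then fluctuate around the fixed point unless α is decayed.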
Model-Based Reinforcement Learning
An RL paradigm where the agent learns an explicit model of the environment's dynamics (transition function P) and reward function (R). The successor representation sits between model-based and model-free approaches. It is model-free in that it learns a predictive representation directly from experience, but it is model-like because this representation (the successor matrix) can be rapidly recombined with new reward functions for flexible planning, a key advantage of model-based methods.
Eigenoptions
A set of intrinsic options (temporally extended actions) derived from the eigenvectors of the state space's graph Laplacian. Under certain conditions these coincide with the eigenvectors of the successor representation, which allows eigenoptions to be discovered from the SR. Eigenoptions provide a mathematical framework for discovering proto-skills that facilitate exploration. They follow the eigenvectors that vary most slowly across the state space, guiding an agent between distant regions so that it covers the environment efficiently. This demonstrates how the SR's representation of state relationships can be decomposed to drive autonomous skill discovery for corrective navigation.
Successor Features
A direct generalization of the successor representation for continuous state spaces or linear function approximation. Successor features ψ(s, a) represent the expected discounted sum of feature vectors φ following a state-action pair. The value function can then be computed as a simple linear product: Q(s, a) = ψ(s, a) · w, where w is a weight vector for the features. This decoupling enables rapid adaptation to new reward functions (new w) without relearning the dynamics. It also underlies generalized policy improvement, in which an agent acts greedily with respect to the best value among several previously learned policies under the new w.
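This minimal sketch shows generalized policy improvement (GPI) on top of successor features. It assumes two previously learned policies whose features ψ^{π_i}(s, a) at a single state are given as hypothetical numbers, with two features and two actions. For a new weight vector w, GPI evaluates every stored policy with Q^{π_i}(s, a) = ψ^{π_i}(s, a) · w. It then picks argmax_a max_i Q^{π_i}(s, a), so switching tasks requires no new learning of ψ.

```python
def gpi_action(psis, w):
    """Generalized policy improvement: argmax over a of max over i of psi_i(s, a) . w."""
    n_actions = len(psis[0])

    def best_q(a):
        # Best value of action a across all stored policies under the new task w.
        return max(sum(p * wk for p, wk in zip(psi[a], w)) for psi in psis)

    return max(range(n_actions), key=best_q)

# Hypothetical successor features at one state s: psis[i][a] = psi^{pi_i}(s, a),
# with two features (e.g. 'reach key', 'reach door').
psis = [
    [[5.0, 0.5], [1.0, 0.2]],   # pi_0: previously learned for a key-seeking task
    [[0.3, 1.0], [0.4, 6.0]],   # pi_1: previously learned for a door-seeking task
]
print(gpi_action(psis, [1.0, 0.0]))  # → 0 (only the key is rewarded)
print(gpi_action(psis, [0.0, 1.0]))  # → 1 (only the door is rewarded)
```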
Generalized Value Functions (GVFs)
A formalism within the Horde architecture for learning answers to many predictive questions about the future in parallel. A GVF asks: "What is the expected discounted sum of some cumulant (signal) if I follow a certain policy?" The successor representation is a specific, foundational GVF where the cumulant is a state-indicator signal. This framework positions the SR as a building block within a scalable ecosystem of predictive knowledge that an agent can maintain for comprehensive world modeling and planning.
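The GVF view can be sketched as one generic prediction routine parameterized by a cumulant. Here a hypothetical `gvf` function evaluates G(s) = c(s) + γ ∑_{s'} T(s, s') G(s') for a fixed policy by fixed-point iteration. This model-based shortcut only keeps the sketch short; Horde learns such predictions online with TD methods. Using the indicator of a state s* as the cumulant yields column s* of the SR, and using the reward as the cumulant yields V. The chain, rewards, and function name are illustrative assumptions.

```python
def gvf(T, cumulant, gamma, iters=1000):
    """Expected discounted sum of cumulant(s) under the policy with transitions T."""
    n = len(T)
    G = [0.0] * n
    for _ in range(iters):
        G = [cumulant(s) + gamma * sum(T[s][k] * G[k] for k in range(n))
             for s in range(n)]
    return G

T = [[0.0, 1.0, 0.0],    # hypothetical chain 0 -> 1 -> 2, state 2 loops
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
gamma = 0.9
R = [0.0, 0.5, 1.0]      # hypothetical state rewards

# One GVF per target state s_star, with cumulant I(s = s_star).
sr_columns = [gvf(T, lambda s, t=s_star: 1.0 if s == t else 0.0, gamma)
              for s_star in range(3)]
M = [[sr_columns[j][i] for j in range(3)] for i in range(3)]   # M[s][s_star]

# A reward-cumulant GVF is the ordinary value function, and it equals M R.
V = gvf(T, lambda s: R[s], gamma)
V_from_sr = [sum(M[i][j] * R[j] for j in range(3)) for i in range(3)]
```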

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.