Successor Representation

Successor representation is a predictive state encoding in reinforcement learning that factors the value function into a reward-independent successor matrix and a state-reward vector.

REINFORCEMENT LEARNING

What is Successor Representation?

A predictive state representation that factors the value function into a reward-independent successor matrix and a state-reward vector.

The successor representation (SR) is a predictive state representation in reinforcement learning (RL) that encodes the expected future occupancy of states. It decomposes the value function into two components: a successor matrix, which is independent of reward and captures how the environment's dynamics unfold under the agent's policy, and a state-reward vector. This factorization allows an agent to rapidly re-evaluate states when rewards change, without relearning the successor matrix.

The SR provides a middle ground between model-free and model-based RL. Unlike model-free methods, which learn a single value estimate tied to one reward function, the SR learns a predictive representation of which states tend to follow which. Unlike full model-based planning, it avoids expensive online simulations. This makes it efficient for corrective action planning: when goals or rewards are updated, the agent can recompute the value of its current policy immediately and use those values as the starting point for policy improvement, enabling flexible and sample-efficient adaptation.

CORRECTIVE ACTION PLANNING

Key Features of Successor Representation

The successor representation (SR) is a predictive state representation that decomposes value learning into a reward-independent model of future state occupancy and a reward function. This decomposition provides unique computational advantages for planning and generalization.

Decomposition of Value

The core innovation of the successor representation is its factorization of the value function V(s). It separates the problem into two components:

Successor Matrix M(s, s'): A reward-independent model predicting the expected discounted future occupancy of state s' starting from state s under the current policy. It is defined as M(s, s') = E[∑_{t=0}^{∞} γ^t I(s_t = s') | s_0 = s], where γ is the discount factor and I(·) is the indicator function.
State-Reward Vector R(s'): The expected immediate reward for being in each state.
The value is then computed as V(s) = ∑_{s'} M(s, s') R(s'). This separation allows the agent to rapidly re-evaluate its policy if rewards change, without relearning the successor matrix.
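
As a minimal sketch of this decomposition, the Python snippet below builds M for a small chain by truncating the discounted sum in the definition, then recovers V with one matrix-vector product. The three-state transition matrix T, the reward vector, γ = 0.9, and the 500-step horizon are illustrative assumptions, not values from any particular environment.

```python
import numpy as np

gamma = 0.9  # discount factor (assumed value for illustration)

# Hypothetical 3-state chain under a fixed policy: T[s, s'] = P(s_{t+1}=s' | s_t=s).
T = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],   # state 2 is absorbing
])
R = np.array([0.0, 0.0, 1.0])  # state-reward vector R(s')

# M(s, s') = sum_t gamma^t * P(s_t = s' | s_0 = s), truncated at a finite horizon.
M = np.zeros_like(T)
occupancy = np.eye(3)          # t = 0: the agent occupies its start state
for t in range(500):
    M += (gamma ** t) * occupancy
    occupancy = occupancy @ T  # state distribution one step later

V = M @ R                      # V(s) = sum_{s'} M(s, s') R(s')
```

Each row of M sums to approximately 1 / (1 - γ), because the discounted occupancies over all states account for every future time step.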

Generalization Across Rewards

Because the successor matrix M is independent of the reward function, it enables powerful transfer learning. Once an agent has learned M for a given policy in an environment, it can compute that policy's value function for any new reward function R' with a single matrix-vector product using the same matrix: V'(s) = M(s, ·) · R'.

This is critical for corrective action planning, where the 'cost' of an error defines a new, often sparse, reward signal. Once the new reward vector is known, the agent can re-evaluate every state without additional environment interaction.
This property makes SR highly sample-efficient for multi-task learning and rapid adaptation to new goals or constraints.
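
The reuse of M across reward functions can be sketched as follows. The four-state ring, both reward vectors, and the use of a closed-form inverse to obtain M (valid for a finite state space and a fixed policy) are assumptions made for illustration; in practice M would typically be learned from experience.

```python
import numpy as np

gamma = 0.9
# Hypothetical 4-state random walk on a ring under a fixed policy.
T = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
])
M = np.linalg.inv(np.eye(4) - gamma * T)  # obtained once, then reused

R_old = np.array([0.0, 0.0, 0.0, 1.0])    # original task: reward in state 3
R_new = np.array([0.0, -1.0, 0.0, 0.0])   # new sparse penalty in state 1

V_old = M @ R_old
V_new = M @ R_new   # no new environment interaction; M is unchanged
```

Only the reward vector changes between the two evaluations. Both value functions describe the same policy, so improving the policy for the new reward remains a separate step.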

Connection to Temporal Context

The successor representation formalizes the concept of temporal proximity between states. The entry M(s, s') represents how 'close' state s' is in the agent's future, discounted by time.

It provides a predictive map of the environment under a given policy.
This map can be seen as a generalization of adjacency in a graph, weighted by the policy and discounting. A state s' that is typically reached soon after s has a high value of M(s, s'). The relation is directional: M(s, s') and M(s', s) generally differ.
This structure is foundational for model-based planning without a full transition model, as it directly encodes long-term consequences.
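
To make the idea of discounted proximity concrete, the sketch below compares one-step adjacency with a row of the SR. The seven-state ring and the unbiased random-walk policy are assumptions chosen for illustration.

```python
import numpy as np

n, gamma = 7, 0.9   # ring size and discount factor (illustrative values)

# Hypothetical policy: an unbiased random walk on a ring of n states.
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

M = np.linalg.inv(np.eye(n) - gamma * T)

one_step = T[0]   # plain adjacency: nonzero only for the two neighbours of state 0
sr_row = M[0]     # discounted expected visits to every state, starting from state 0
```

In this example, state 2 is not adjacent to state 0, yet its SR entry is positive, and the entries of sr_row shrink as the distance around the ring from state 0 grows.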

Eigen-Decomposition & Fast Planning

The successor matrix can be eigen-decomposed, revealing the underlying temporal structure of the environment. Together with its closed form, this reduces re-evaluation after a reward change to standard linear algebra.

For a finite state space, the SR can be expressed as M = (I - γ T)^{-1}, where T is the transition matrix under the policy. M therefore shares its eigenvectors with T: an eigenvalue λ of T corresponds to the eigenvalue 1 / (1 - γ λ) of M.
Using this formulation, computing the new value of the policy for a changed reward reduces to a linear equation solve or a matrix-vector multiplication, bypassing iterative dynamic programming.
This makes it highly suitable for real-time re-planning in agents that must adjust their execution path after detecting an error.
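
The closed form and its eigen-decomposition can be checked numerically. The sketch below assumes a symmetric random walk on a six-state ring, so that T is symmetric and numpy.linalg.eigh applies; a general transition matrix is not symmetric and may have complex eigenvalues, which this example does not handle.

```python
import numpy as np

n, gamma = 6, 0.95   # illustrative values
T = np.zeros((n, n))
for s in range(n):   # symmetric random walk on a ring
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

# T is symmetric here, so eigh returns real eigenvalues and orthonormal eigenvectors.
lam, U = np.linalg.eigh(T)
# M has the same eigenvectors as T, with eigenvalues 1 / (1 - gamma * lambda).
M_eig = U @ np.diag(1.0 / (1.0 - gamma * lam)) @ U.T

R_new = np.zeros(n)
R_new[3] = 1.0       # hypothetical changed reward
# Re-evaluation without forming M explicitly: solve (I - gamma * T) V = R.
V_solve = np.linalg.solve(np.eye(n) - gamma * T, R_new)
V_eig = M_eig @ R_new
```

Solving the linear system is usually preferable to forming the inverse when only one value function is needed, while a stored M or its eigen-decomposition pays off when many reward vectors must be evaluated.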

Bridging Model-Based and Model-Free RL

The successor representation occupies a unique middle ground between model-free and model-based reinforcement learning.

Like model-free methods: It can be learned directly from experience (via TD learning) without explicitly modeling transition probabilities T(s'|s,a).
Like model-based methods: It supports flexible prediction and rapid re-evaluation when goals/rewards change, a key feature for planning.
This hybrid nature is ideal for autonomous agents that need the sample efficiency of model-free learning but the flexible re-planning capability of a model to correct errors.
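
A minimal sketch of learning the SR from experience with TD(0) is shown below. It assumes a deterministic four-state ring so that the result is exactly reproducible; the step size, the number of steps, and the environment are illustrative choices. In a stochastic environment with a constant step size, the estimate would fluctuate around the same fixed point rather than settle on it.

```python
import numpy as np

n, gamma, alpha = 4, 0.9, 0.1   # states, discount, step size (assumed values)
M = np.zeros((n, n))            # SR estimate, one row per state

s = 0
for _ in range(20_000):
    s_next = (s + 1) % n        # deterministic ring: the policy always steps forward
    onehot = np.eye(n)[s]       # state-indicator signal for occupying s
    td_error = onehot + gamma * M[s_next] - M[s]
    M[s] += alpha * td_error    # TD(0) update applied to the whole row of s
    s = s_next

# Closed form for comparison only; the learner above never uses T.
T = np.roll(np.eye(n), 1, axis=1)
M_exact = np.linalg.inv(np.eye(n) - gamma * T)
```

The update only touches sampled transitions (s, s_next), which is what makes the method model-free, yet the learned M can later be combined with any reward vector.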

Successor Features

A powerful extension is successor features, which generalize the SR to linear function approximation and feature spaces.

Instead of a matrix over states M(s, s'), successor features ψ(s) are vectors where each component corresponds to the expected discounted future occupancy of a feature (e.g., 'has key', 'near door').
The value is then V(s) = ψ(s) · w, where w is a weight vector representing the reward associated with each feature.
This allows for generalization across both states and tasks in high-dimensional spaces, making it practical for complex environments where agents must formulate corrective plans based on abstract features.
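
In the tabular case, successor features can be sketched as the SR applied to a feature matrix. The three-state chain, the two binary features, and the weight vector below are hypothetical, and the reward is assumed to be exactly linear in the features.

```python
import numpy as np

gamma = 0.9
T = np.array([           # hypothetical 3-state chain under a fixed policy
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
# Hypothetical binary features per state: columns are ('has key', 'near door').
Phi = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [1.0, 1.0],
])
w = np.array([0.2, 1.0])   # reward weight per feature, so R(s) = Phi[s] @ w

M = np.linalg.inv(np.eye(3) - gamma * T)
Psi = M @ Phi              # psi(s): expected discounted sum of future features
V = Psi @ w                # V(s) = psi(s) . w
```

With function approximation, ψ would be learned directly (for example by TD learning on the feature vectors) rather than computed from M, but the final linear readout is the same.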

REPRESENTATION COMPARISON

Successor Representation vs. Other RL Representations

A technical comparison of how the Successor Representation decomposes value prediction versus other core representations in reinforcement learning.

Representation Feature	Successor Representation (SR)	Model-Based (Dynamics Model)	Model-Free (Value Function)
Core Mathematical Object	Successor Matrix M(s, s')	Transition Function T(s'|s,a)	Value Function V(s) or Q(s,a)
Primary Output	Expected future state occupancy	Predicted next state and reward	Expected cumulative return
Reward Dependency	Decoupled (M is reward-independent)	Tightly coupled (reward is part of model)	Tightly coupled (value is reward-dependent)
Generalization to New Rewards
Sample Efficiency for Planning	High (reuses M for new R(s))	Medium (requires learning T & R)	Low (requires re-learning for new R(s))
Supports Zero-Shot Revaluation
Temporal Abstraction	Implicit via discounted occupancy	Explicit via multi-step rollouts	Explicit via Bellman equation
Common Algorithmic Use Case	Successor Features, Generalized Policy Evaluation	Dyna, Monte Carlo Tree Search (MCTS)	Q-Learning, Policy Gradient, TD Learning

SUCCESSOR REPRESENTATION

Frequently Asked Questions

The successor representation is a predictive state representation in reinforcement learning that factors the value function into a reward-independent successor matrix and a state-reward vector. It provides a fundamental bridge between model-based and model-free learning.

The successor representation (SR) is a predictive state representation in reinforcement learning that encodes the expected future occupancy of states. It decomposes the value of a state into two components: the expected discounted future occupancy of every state, including the starting state itself (the successor matrix), and the immediate rewards associated with those states (the state-reward vector). This provides a middle ground between model-based planning, which requires a full model of the environment's transition dynamics, and model-free methods like Q-learning, which learn values directly without explicit dynamics.

Formally, the successor matrix M(s, s') for a given policy π represents the expected discounted number of times the agent will visit state s' in the future, starting from state s. The value function V(s) can then be computed as the dot product of row s of this matrix and the reward vector R: V(s) = ∑_{s'} M(s, s') R(s'). This separation allows for rapid recomputation of values if rewards change, without relearning the successor matrix.

CORRECTIVE ACTION PLANNING

Related Terms

The successor representation is a core concept in reinforcement learning and planning. These related terms define the mathematical frameworks, algorithms, and representations that enable agents to predict future states and learn optimal corrective policies.

Markov Decision Process (MDP)

The foundational mathematical framework for modeling sequential decision-making. An MDP is defined by a tuple (S, A, P, R, γ), where:

S is a set of states.
A is a set of actions.
P(s' | s, a) is the state transition probability function.
R(s, a, s') is the reward function.
γ is a discount factor.
The successor representation is derived directly from the transition dynamics P of an MDP, encoding the expected future occupancy of states under a given policy.
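
The step from an MDP to the SR can be sketched in two lines: average the transition function over the policy's action probabilities, then apply the closed form for a finite state space. The transition tensor P, the uniform policy pi, and γ = 0.9 below are hypothetical.

```python
import numpy as np

gamma = 0.9
n_states, n_actions = 3, 2

# Hypothetical transition tensor P[s, a, s'] = P(s' | s, a).
P = np.zeros((n_states, n_actions, n_states))
P[:, 0, :] = np.eye(n_states)                       # action 0: stay in place
P[:, 1, :] = np.roll(np.eye(n_states), 1, axis=1)   # action 1: move to the next state

# Hypothetical stochastic policy pi[s, a] = pi(a | s), uniform over actions.
pi = np.full((n_states, n_actions), 0.5)

# T_pi[s, s'] = sum_a pi(a | s) * P(s' | s, a)
T_pi = np.einsum("sa,sat->st", pi, P)
M = np.linalg.inv(np.eye(n_states) - gamma * T_pi)
```

Changing the policy changes T_pi and therefore M, which is why the SR is always defined relative to a specific policy.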

Temporal Difference (TD) Learning

A core family of algorithms for learning value functions in reinforcement learning. TD methods update estimates based on the difference between consecutive predictions (the TD error). The successor representation can be learned efficiently using TD algorithms, as its update rule is a form of TD learning where the "reward" is a state-indicator vector. This allows agents to learn long-term predictions without requiring model-based planning from scratch at each step.
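
One common way to write this update, for an observed transition from s_t to s_{t+1}, is given below. The estimate symbol \hat{M} and the step size α are notation introduced here for illustration.

```latex
\delta_t(s') = \mathbb{I}(s_t = s') + \gamma\, \hat{M}(s_{t+1}, s') - \hat{M}(s_t, s'),
\qquad
\hat{M}(s_t, s') \leftarrow \hat{M}(s_t, s') + \alpha\, \delta_t(s') \quad \text{for all } s'
```

Compared with the TD update for a value function, the scalar reward is replaced by the indicator I(s_t = s'), so one transition updates an entire row of the matrix.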

Model-Based Reinforcement Learning

An RL paradigm where the agent learns an explicit model of the environment's dynamics (transition function P) and reward function (R). The successor representation sits between model-based and model-free approaches. It is model-free in that it learns a predictive representation directly from experience, but it is model-like because this representation (the successor matrix) can be rapidly recombined with new reward functions for flexible planning, a key advantage of model-based methods.

Eigenoptions

A set of intrinsic options (temporally extended actions) derived from the eigenvectors of the successor representation, which are closely related to the eigenvectors of the graph Laplacian of the state-transition graph. Eigenoptions provide a mathematical framework for discovering proto-skills that facilitate exploration. They correspond to directions of slowest mixing in the state space, guiding an agent to cover the environment efficiently. This demonstrates how the SR's representation of state relationships can be decomposed to drive autonomous skill discovery for corrective navigation.

Successor Features

A direct generalization of the successor representation for continuous state spaces or linear function approximation. Successor features ψ(s, a) represent the expected discounted sum of feature vectors φ following a state-action pair. The value function can then be computed as a linear product: Q(s, a) = ψ(s, a) · w, where w is a weight vector for the features. This decoupling enables generalized policy improvement: given the successor features of several known policies and the weights w of a new reward function, the agent acts greedily with respect to the best of their value estimates, adapting to the new task without relearning the successor features.
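
Generalized policy improvement over a small set of policies can be sketched as below. The successor-feature array psi and the weight vector w_new are made-up numbers for two policies, three states, two actions, and two features; in practice they would be learned.

```python
import numpy as np

# Hypothetical successor features: psi[i, s, a] is the expected discounted
# feature sum after taking action a in state s and then following policy i.
psi = np.array([
    [[[1.0, 0.0], [0.5, 0.5]],
     [[2.0, 0.0], [1.0, 1.0]],
     [[0.0, 0.0], [0.0, 3.0]]],
    [[[0.0, 2.0], [0.2, 0.2]],
     [[0.0, 1.0], [3.0, 0.0]],
     [[1.0, 1.0], [0.0, 0.5]]],
])
w_new = np.array([1.0, -1.0])       # weights of a new task (assumed known)

Q = psi @ w_new                     # Q[i, s, a] = psi_i(s, a) . w_new
Q_max = Q.max(axis=0)               # best known policy for each (s, a)
gpi_action = Q_max.argmax(axis=1)   # greedy action per state under that maximum
```

No successor features are re-estimated for the new task; only the dot product with w_new and the two maximizations are recomputed.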

Generalized Value Functions (GVFs)

A formalism, introduced with the Horde architecture, for learning the answers to many predictive questions about the future in parallel. A GVF asks: "What is the expected discounted sum of some cumulant (signal) if I follow a certain policy?" The successor representation is a specific, foundational GVF where the cumulant is a state-indicator signal. This framework positions the SR as a building block within a scalable ecosystem of predictive knowledge that an agent can maintain for comprehensive world modeling and planning.
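
The relationship between a GVF and the SR can be sketched for a finite state space. The helper gvf below is a hypothetical function that evaluates a cumulant exactly by solving a linear system; it stands in for the parallel TD learners of an architecture like Horde and is not an implementation of it. The transition matrix and the cumulant c are illustrative.

```python
import numpy as np

def gvf(T, cumulant, gamma):
    """Expected discounted sum of `cumulant` under the policy with transition matrix T."""
    n = T.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * T, cumulant)

gamma = 0.9
T = np.array([[0.5, 0.5, 0.0],    # hypothetical 3-state policy dynamics
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

# One GVF per state-indicator cumulant yields one column of the SR.
M = np.column_stack([gvf(T, np.eye(3)[:, j], gamma) for j in range(3)])

# Any other state-based cumulant is a weighted combination of those columns.
c = np.array([1.0, -2.0, 0.5])
answer = M @ c
```

The last line shows why the SR is a useful basis: once the indicator GVFs are known, the answer for any cumulant defined on states follows without further learning.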

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
