The Bellman equation is a recursive mathematical relationship that expresses the value of a state (or state-action pair) as the sum of the immediate reward and the expected discounted value of the successor state. Formally, for a Markov Decision Process (MDP), the optimal value function V*(s) satisfies V*(s) = max_a [ R(s,a) + γ Σ_s' P(s'|s,a) V*(s') ], where γ ∈ [0,1) is the discount factor and P(s'|s,a) is the transition probability. This decomposition is fundamental to dynamic programming and enables the efficient computation of optimal policies through iterative methods such as value iteration and policy iteration.
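As a concrete sketch, value iteration repeatedly applies the Bellman optimality update until the value function stops changing. The two-state MDP below (its states, transition probabilities, and rewards are invented purely for illustration) is a minimal example, not a canonical benchmark:

```python
# Value iteration on a hypothetical 2-state MDP (illustrative data only).
# P[s][a] = list of (probability, next_state) pairs; R[s][a] = immediate reward.
P = {
    0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}
gamma = 0.9  # discount factor

V = {s: 0.0 for s in P}  # initialize V(s) = 0 for all states
for _ in range(1000):
    # Bellman optimality update: V(s) <- max_a [ R(s,a) + γ Σ_s' P(s'|s,a) V(s') ]
    V_new = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            for a in P[s]
        )
        for s in P
    }
    if max(abs(V_new[s] - V[s]) for s in P) < 1e-10:
        V = V_new
        break  # converged: V is (numerically) a fixed point of the update
    V = V_new

print(V)
```

Because the update is a γ-contraction in the max norm, the iterates converge geometrically to the unique fixed point V*, regardless of the initial guess; the optimal policy is then recovered by acting greedily with respect to V*.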
