The Bellman equation is a recursive decomposition of a value function in sequential decision-making: it expresses the value of a state (or state-action pair) as the immediate reward plus the discounted expected value of the successor state. Formally, for a Markov Decision Process (MDP) with reward function R(s,a), transition probabilities P(s'|s,a), and discount factor γ (typically 0 ≤ γ < 1), the optimal value function V*(s) satisfies V*(s) = max_a [ R(s,a) + γ Σ_s' P(s'|s,a) V*(s') ]. This recursion is the cornerstone of dynamic programming and underlies most reinforcement learning (RL) algorithms: for γ < 1 the right-hand side is a contraction, so applying it repeatedly to any initial estimate converges to V*, and an optimal policy is obtained by choosing, in each state, an action that attains the maximum.
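As a rough illustration of the recursion, the following Python sketch applies the right-hand side of the equation repeatedly (value iteration) to a small tabular MDP. The two-state MDP, the nested-list layout of `P` and `R`, and the names `q_value`, `value_iteration`, and `greedy_policy` are assumptions made for this example only; they are not part of any particular library.

```python
# Minimal value-iteration sketch for a tabular MDP (illustrative only).
# Assumed layout: P[s][a][s2] = P(s2 | s, a), R[s][a] = immediate reward.

def q_value(P, R, gamma, V, s, a):
    """One-step lookahead: R(s,a) + gamma * sum_s' P(s'|s,a) V(s')."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman optimality backup until the values stop changing."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = [
            max(q_value(P, R, gamma, V, s, a) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(new_V[s] - V[s]) for s in range(n_states))
        V = new_V
        if delta < tol:
            break
    return V

def greedy_policy(P, R, gamma, V):
    """Pick, in each state, an action that maximizes the one-step lookahead."""
    return [
        max(range(len(P[s])), key=lambda a: q_value(P, R, gamma, V, s, a))
        for s in range(len(P))
    ]

# Hypothetical two-state, two-action MDP with deterministic transitions.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]
gamma = 0.5

V_star = value_iteration(P, R, gamma)
policy = greedy_policy(P, R, gamma, V_star)
print(V_star, policy)
```

For this toy MDP the fixed point can be checked by hand: staying in state 1 forever yields 2 / (1 − 0.5) = 4, and moving from state 0 to state 1 yields 1 + 0.5 · 4 = 3, so the iteration should approach V*(0) = 3 and V*(1) = 4, with the greedy policy choosing action 1 in state 0 and action 0 in state 1.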




