A Markov Decision Process (MDP) is a formal mathematical framework for modeling sequential decision-making problems where outcomes are partly random and partly under the control of a decision-maker. It is defined by the tuple (S, A, P, R, γ), representing the state space, action space, state transition probability function, reward function, and discount factor. The defining Markov property ensures the future state depends only on the current state and action, not the full history.




