A Markov Decision Process (MDP) is a discrete-time stochastic control process that provides a formal model for decision-making in situations where outcomes are partly random and partly under the control of a decision-maker. It is defined by a tuple (S, A, P, R, γ), where S is a set of states, A is a set of actions, P(s′ | s, a) gives the probability of transitioning to state s′ after taking action a in state s, R is a reward function, and γ ∈ [0, 1) is a discount factor that weights future rewards. The core Markov property ensures that the distribution over the next state and reward depends only on the current state and action, not on the full history of the process.
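As a minimal sketch of how the tuple (S, A, P, R, γ) can be represented and solved, the hypothetical two-state, two-action example below (all transition probabilities and rewards are illustrative, not from the text) runs value iteration, whose Bellman backup relies exactly on the Markov property: each update looks only at the current state and action.

```python
# Hypothetical MDP: 2 states, 2 actions. P[s][a] is a list of
# (next_state, probability) pairs; R[s][a] is the expected immediate
# reward for taking action a in state s. All numbers are illustrative.
S = [0, 1]
A = [0, 1]
gamma = 0.9  # discount factor, gamma in [0, 1)

P = {
    0: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.4), (1, 0.6)], 1: [(1, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 5.0, 1: 0.0},
}

def value_iteration(S, A, P, R, gamma, tol=1e-8):
    """Compute optimal state values by repeated Bellman backups."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # The Markov property: the backup for state s needs only
            # (s, a), never the history of how we reached s.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in A
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once values have converged
            return V

V = value_iteration(S, A, P, R, gamma)
```

The returned values satisfy the Bellman optimality equation, V(s) = max_a [R(s, a) + γ Σ_s′ P(s′ | s, a) V(s′)], to within the tolerance.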
