In reinforcement learning (RL) and automated planning, a policy is a mapping from states (or belief states) to actions that defines an agent's strategy for selecting actions; the goal is typically to find a policy that maximizes the expected cumulative reward. In Markov Decision Processes (MDPs), a deterministic policy is a function π: S → A, with π(s) = a, while in Partially Observable MDPs (POMDPs), where the agent cannot observe the state directly, the policy maps belief states (probability distributions over states) to actions. Policies can be deterministic, specifying a single action per state, or stochastic, defining a probability distribution π(a | s) over actions for each state.
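
The difference between deterministic and stochastic policies can be illustrated with a minimal Python sketch. The two-state problem, the state and action names, and the probabilities are all hypothetical and chosen only for illustration; they are not part of any standard benchmark or library.

```python
import random

# Hypothetical two-state, two-action problem; all names are illustrative.
STATES = ["low_battery", "high_battery"]
ACTIONS = ["recharge", "search"]

# Deterministic policy: exactly one action per state, pi(s) = a.
deterministic_policy = {
    "low_battery": "recharge",
    "high_battery": "search",
}

# Stochastic policy: a probability distribution pi(a | s) for each state.
stochastic_policy = {
    "low_battery": {"recharge": 0.9, "search": 0.1},
    "high_battery": {"recharge": 0.2, "search": 0.8},
}


def act_deterministic(state):
    """Return the single action the deterministic policy assigns to state."""
    return deterministic_policy[state]


def act_stochastic(state, rng):
    """Sample an action from the stochastic policy's distribution for state."""
    dist = stochastic_policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling `act_deterministic("low_battery")` always returns `"recharge"`, whereas `act_stochastic("low_battery", random.Random())` returns `"recharge"` most of the time and `"search"` occasionally. A POMDP policy would have the same shape, except that its input would be a belief state (a distribution over `STATES`) rather than a single state.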
