Q-Learning is a model-free, off-policy reinforcement learning algorithm that learns an optimal action-selection policy by iteratively estimating the optimal action-value function (Q-function). This function, which satisfies the Bellman optimality equation, gives the expected cumulative (typically discounted) future reward for taking a given action in a given state and following the optimal policy thereafter. The algorithm requires no model of the environment's dynamics; it learns from sampled experience instead.
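
The iterative estimate described above can be illustrated with a minimal tabular sketch. It assumes the standard update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], with learning rate α and discount factor γ. The function names, the dictionary-based Q-table, and the five-state chain environment are hypothetical choices made for illustration, not part of any particular library.

```python
import random


def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new estimate.

    `q` is a dict mapping (state, action) pairs to values; missing entries
    are treated as 0.0 (an assumed initialisation).
    """
    # Off-policy target: bootstrap from the greedy (max-valued) action in the
    # next state, regardless of what the behaviour policy does next.
    if done:
        best_next = 0.0  # no future reward after a terminal transition
    else:
        best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]


def train_chain(episodes=200, n_states=5, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values on a hypothetical 1-D chain from sampled transitions.

    The agent starts in state 0 and receives reward 1.0 on reaching the last
    state. Actions are drawn uniformly at random, so the transition rule below
    is only sampled from, never consulted by the learner as a model.
    """
    rng = random.Random(seed)
    q = {}
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.randrange(2)  # 0 = left, 1 = right
            next_state = max(state - 1, 0) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            q_learning_update(q, state, action, reward, next_state, 2,
                              alpha, gamma, done)
            state = next_state
    return q
```

Calling `train_chain()` returns the learned table, from which a greedy policy can be read by picking the highest-valued action in each state. Because the behaviour policy here is purely random while the update target uses the maximum over next actions, the sketch also shows what off-policy means in practice.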




