Experience replay is a reinforcement learning technique in which an agent stores its past experiences—each a tuple of (state, action, reward, next state)—in a finite buffer called a replay buffer. During training, batches of these experiences are sampled uniformly at random and used to update the agent's policy or value function. This random sampling breaks the strong temporal correlations inherent in sequential on-policy learning, which stabilizes training; and because each stored experience can be reused in many updates, it also improves sample efficiency.
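A minimal sketch of such a buffer in Python (the class name `ReplayBuffer` and its method names are illustrative, not from any particular library) might look like this, using a `deque` with a fixed capacity so that the oldest experiences are evicted once the buffer is full:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # deque with maxlen automatically discards the oldest
        # experience when a new one is pushed into a full buffer
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling across the whole buffer breaks
        # the temporal correlations of consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Usage: store transitions during interaction, then sample
# minibatches for gradient updates
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1)
print(len(buf))           # capacity caps the buffer at 3 transitions
batch = buf.sample(2)
print(len(batch))         # a minibatch of 2 sampled transitions
```

Because sampling is uniform, every stored transition is equally likely to appear in a minibatch, and a single transition can be drawn in many different updates over its lifetime in the buffer.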
