Experience replay is a data storage and sampling mechanism in reinforcement learning in which an agent's past experiences, each a tuple of (state, action, reward, next state, done flag), are stored in a fixed-size buffer called a replay buffer or replay memory. Once the buffer is full, the oldest experiences are typically overwritten by new ones. During training, mini-batches of experiences are randomly sampled from this buffer to update the agent's policy or value networks. Random sampling breaks the temporal correlations between consecutive transitions in sequentially collected data, which helps stabilize the training of deep neural networks and reduces the tendency to forget earlier experiences. Because the sampled transitions were generated by earlier versions of the policy, experience replay is normally used with off-policy algorithms.
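
The store-and-sample cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `Transition`, `ReplayBuffer`, `push`, and `sample` are hypothetical, the buffer is assumed to evict its oldest entries first, and sampling is assumed to be uniform without replacement.

```python
import random
from collections import deque
from typing import Any, List, NamedTuple


class Transition(NamedTuple):
    """One stored experience (hypothetical name, fields follow the text)."""
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Fixed-size buffer; assumes first-in, first-out eviction when full."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest item on overflow.
        self._storage: deque = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self._storage)

    def push(self, state, action, reward, next_state, done) -> None:
        """Append one transition, evicting the oldest if at capacity."""
        self._storage.append(
            Transition(state, action, reward, next_state, done)
        )

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a mini-batch uniformly at random, without replacement.

        Raises ValueError if batch_size exceeds the number stored.
        """
        # Copy to a list so random.sample gets O(1) indexing.
        return random.sample(list(self._storage), batch_size)


# Usage: push five transitions into a buffer that holds three.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=(t == 4))

batch = buffer.sample(batch_size=2)  # two distinct transitions, random order
```

After the loop only the three most recent transitions remain, so the sampled batch is drawn from those alone. In practice a training loop usually waits until the buffer holds at least `batch_size` transitions before calling `sample`.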




