Reward normalization is a preprocessing technique in reinforcement learning in which the scalar rewards an agent receives are rescaled or standardized, typically by subtracting a running mean and dividing by a running standard deviation, so that they follow a consistent statistical distribution. This stabilization helps prevent exploding gradients and keeps the effective learning rate consistent across environments whose reward magnitudes vary widely and unpredictably. It is a foundational engineering practice in algorithms such as Proximal Policy Optimization (PPO) and Deep Q-Networks (DQN), where the closely related technique of reward clipping serves a similar purpose.
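The running-statistics approach described above can be sketched as follows. This is a minimal illustration, not any particular library's implementation: the class and function names (`RunningMeanStd`, `normalize_rewards`) and the `epsilon` initialization are assumptions chosen for the example, and the update rule is Welford-style parallel variance merging, a common choice in PPO-style codebases.

```python
import numpy as np

class RunningMeanStd:
    """Tracks a running mean and variance of observed rewards.

    Uses a parallel (Welford-style) merge so batches of any size can be
    folded into the running statistics without storing past rewards.
    """
    def __init__(self, epsilon=1e-8):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon  # tiny prior count avoids division by zero

    def update(self, x):
        batch = np.asarray(x, dtype=np.float64)
        batch_mean = batch.mean()
        batch_var = batch.var()
        batch_count = batch.size

        # Merge batch statistics into the running statistics.
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

def normalize_rewards(rewards, stats, epsilon=1e-8):
    """Update running stats with a batch of rewards, then standardize it."""
    stats.update(rewards)
    return (np.asarray(rewards) - stats.mean) / np.sqrt(stats.var + epsilon)

# Usage sketch: rewards drawn with mean 5 and std 10 come out
# approximately zero-mean and unit-variance after normalization.
rng = np.random.default_rng(0)
stats = RunningMeanStd()
normalized = normalize_rewards(rng.normal(5.0, 10.0, size=1000), stats)
```

Because the statistics are running estimates, early batches are normalized with noisy mean and variance; in practice the estimates stabilize quickly as more rewards are observed. Some PPO implementations instead normalize by the running standard deviation of discounted returns without subtracting a mean, so as not to change the sign of rewards.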
