Distributional reinforcement learning models the return distribution, that is, the full probability distribution of cumulative future rewards, rather than only its expectation (the value function). The distribution captures the aleatoric uncertainty inherent in an environment, such as sensor noise or stochastic dynamics, which is critical for robust robotics. Learning it can give an algorithm a richer representation and a more stable learning signal than a single scalar value estimate, because two actions with the same expected return may still differ sharply in spread or in the likelihood of a bad outcome.
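
The difference between a return distribution and its expectation can be made concrete with a small Monte Carlo sketch. It is an illustration only, not any particular distributional RL algorithm: the two reward processes (`safe_reward`, `risky_reward`), the discount factor of 0.9, the 20-step horizon, and the 5% tail statistic are all assumptions chosen for the example.

```python
import random
import statistics


def sample_return(reward_sampler, rng, gamma=0.9, horizon=20):
    """Sample one discounted return: sum of gamma**t * r_t over `horizon` steps."""
    return sum((gamma ** t) * reward_sampler(rng) for t in range(horizon))


def safe_reward(rng):
    # Hypothetical action A: a deterministic reward of 1.0 at every step.
    return 1.0


def risky_reward(rng):
    # Hypothetical action B: 0.0 or 2.0 with equal probability.
    # Same mean reward (1.0) as action A, but nonzero variance.
    return rng.choice((0.0, 2.0))


def empirical_returns(reward_sampler, n_samples=5000, seed=0):
    """Approximate the return distribution by sampling many returns."""
    rng = random.Random(seed)
    return [sample_return(reward_sampler, rng) for _ in range(n_samples)]


def summarize(returns):
    """Reduce a sample of returns to a few statistics of the distribution."""
    ordered = sorted(returns)
    tail = ordered[: max(1, len(ordered) // 20)]  # worst 5% of outcomes
    return {
        "mean": statistics.fmean(returns),    # what a value function estimates
        "std": statistics.pstdev(returns),    # spread of the return distribution
        "cvar_5": statistics.fmean(tail),     # average of the worst 5% of returns
    }


summary = {
    "safe": summarize(empirical_returns(safe_reward)),
    "risky": summarize(empirical_returns(risky_reward)),
}

for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Both hypothetical actions have the same expected return, so an agent that tracks only the value function cannot tell them apart. The sampled distributions differ in spread and in their lower tail, which is the kind of information a distributional method retains.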




