Pessimistic exploration, often framed as conservative model-based reinforcement learning, is a strategy in which an agent's policy is constrained to avoid exploiting regions of the state space where its learned dynamics model has high predictive uncertainty. This approach prioritizes robustness and safety over aggressive reward-seeking, making it particularly valuable for offline reinforcement learning and real-world applications where trial-and-error exploration is costly or dangerous. The agent typically uses uncertainty estimates from its model, often derived from the disagreement among an ensemble of learned models, to penalize or restrict actions that lead to uncertain future states.
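The uncertainty-penalized reward described above can be sketched in a few lines. The snippet below is a minimal illustration, not a full algorithm: it assumes an ensemble of dynamics models has already produced next-state predictions for a candidate action, and subtracts a penalty proportional to the ensemble's disagreement (in the spirit of methods like MOPO). The function name `penalized_reward` and the penalty weight `lam` are hypothetical choices for this sketch.

```python
import numpy as np

def penalized_reward(reward, ensemble_next_states, lam=1.0):
    """Return a pessimistic reward: the raw reward minus a penalty
    proportional to the ensemble's disagreement about the next state.

    reward:               scalar reward predicted for (state, action)
    ensemble_next_states: array of shape (n_models, state_dim), one
                          next-state prediction per ensemble member
    lam:                  penalty weight (hypothetical hyperparameter)
    """
    # Disagreement measured as the norm of the per-dimension standard
    # deviation across ensemble members; zero when all models agree.
    uncertainty = np.linalg.norm(ensemble_next_states.std(axis=0))
    return reward - lam * uncertainty

# When the ensemble agrees, no penalty is applied.
agree = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
r_safe = penalized_reward(5.0, agree)

# When the ensemble disagrees, the reward is reduced, steering the
# policy away from poorly modeled regions of the state space.
disagree = np.array([[1.0, 2.0], [3.0, -1.0], [0.0, 4.0]])
r_risky = penalized_reward(5.0, disagree)
```

In a full pipeline this penalized reward would replace the model's raw reward prediction during policy optimization, so the planner or policy-gradient update naturally avoids high-uncertainty actions.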
