Calibration error quantifies the discrepancy between a machine learning model's predicted confidence scores and the true empirical frequencies of outcomes. For a perfectly calibrated classifier, among all instances assigned a given confidence (e.g., 80%), the fraction predicted correctly matches that confidence. High calibration error indicates overconfidence or underconfidence, either of which can lead to poor downstream decisions in autonomous systems that act on probabilistic thresholds.
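One common way to quantify this is the binned Expected Calibration Error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and empirical accuracy is averaged across bins, weighted by bin size. A minimal sketch (the function name and the choice of 10 equal-width bins are illustrative assumptions, not part of the original text):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted average of |accuracy - confidence| per bin.

    confidences: predicted confidence for the predicted class, in [0, 1].
    correct: 1 if the prediction was correct, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Half-open bins (lo, hi]; the first bin also includes 0.
        mask = (confidences > lo) & (confidences <= hi)
        if i == 0:
            mask |= confidences == 0.0
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# A model that says 80% and is right 8 times out of 10 is well calibrated:
expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)   # 0.0
# One that says 90% but is right only half the time is overconfident:
expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)   # 0.4
```

ECE is a summary statistic: it can hide offsetting over- and underconfidence in different bins, so reliability diagrams are often inspected alongside it.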
