Calibration via Platt

Calibration via Platt, or Platt scaling, is a parametric post-hoc method that fits a logistic regression model to a classifier's raw outputs to produce calibrated probability estimates.
MODEL CALIBRATION TECHNIQUES

What is Calibration via Platt?

Calibration via Platt is a common shorthand for applying Platt scaling, a logistic regression-based method, to transform a classifier's scores into calibrated probabilities.

Calibration via Platt, formally known as Platt scaling, is a parametric post-hoc calibration method for binary classifiers. It fits a logistic regression model (a sigmoid function of the score) to the raw, uncalibrated scores output by a model, such as SVM margins or neural-network logits. The learned transformation maps these scores to probability estimates intended to match the observed frequency of the positive outcome. The method requires a held-out calibration set, distinct from training and test data, to fit its two parameters (slope and intercept).

The technique is named for its inventor, John Platt, who introduced it for calibrating Support Vector Machine outputs. Its primary advantages are simplicity and a low risk of overfitting, thanks to its minimal two-parameter model. However, it assumes that the positive-class probability is a sigmoid function of the raw score (equivalently, that the log-odds are linear in the score), which may not hold for all classifiers. For multi-class calibration, the method is usually applied per class within a One-vs-Rest (OvR) framework. It is a foundational technique, often compared with temperature scaling (simpler) and isotonic regression (more flexible, non-parametric).

CALIBRATION METHOD

Key Characteristics of Platt Scaling

Platt scaling is a parametric, post-hoc calibration technique that applies logistic regression to a classifier's outputs to produce calibrated probability estimates.

01

Parametric Logistic Mapping

Platt scaling fits a logistic regression model with two parameters (a scaling weight and a bias term) to map the classifier's raw scores to calibrated probabilities. The transformation is defined as: P(y=1 | s) = 1 / (1 + exp(A * s + B)), where s is the classifier's score, A is the scaling weight, and B is the bias term. With this sign convention, A is typically negative, so that higher scores map to higher probabilities. The model assumes that the positive-class probability is a sigmoid function of the score (equivalently, that the log-odds are linear in s), which is often a reasonable approximation for discriminative models such as SVMs and neural networks.
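The mapping above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names, the plain gradient-descent optimizer, the learning rate, and the toy data are all assumptions made for the example.

```python
import math

def platt_probability(score, A, B):
    """P(y=1 | s) = 1 / (1 + exp(A * s + B)), evaluated without overflow."""
    z = A * score + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores, labels, lr=0.5, steps=2000):
    """Fit A and B by gradient descent on the mean negative log-likelihood.

    The optimizer and its settings are illustrative assumptions; any
    logistic regression solver could be used instead.
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_A = 0.0
        grad_B = 0.0
        for s, y in zip(scores, labels):
            p = platt_probability(s, A, B)
            # With z = A*s + B, the derivative of the NLL w.r.t. z is (y - p).
            grad_A += (y - p) * s
            grad_B += (y - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B

# Hypothetical calibration data: raw scores and true binary labels.
demo_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
demo_labels = [0, 0, 0, 1, 0, 1, 1, 1]
A_fit, B_fit = fit_platt(demo_scores, demo_labels)
print("A =", A_fit, "B =", B_fit)
print("P(y=1 | s=1.0) =", platt_probability(1.0, A_fit, B_fit))
```

On data where higher scores indicate the positive class, the fitted A comes out negative, which matches the sign convention of the formula.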

02

Requires a Held-Out Calibration Set

The logistic regression parameters (A, B) are not learned during the original model training. They are estimated using a separate calibration set—a held-out dataset not used for training or final testing. This set should be representative of the target distribution. The method minimizes the negative log-likelihood on this set, treating the true class labels as the target for the logistic regression. Using the training data for calibration would lead to overfitting and unreliable probability estimates.
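As a rough sketch of the data handling described here, the snippet below carves out disjoint training, calibration, and test indices and defines the negative log-likelihood that the calibrator minimizes on the calibration portion. The function names, the split fractions, and the clipping constant are assumptions chosen for illustration.

```python
import math
import random

def three_way_split(n_samples, calib_fraction=0.25, test_fraction=0.25, seed=0):
    """Return disjoint index lists (train, calibration, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_calib = int(n_samples * calib_fraction)
    n_test = int(n_samples * test_fraction)
    calib_idx = indices[:n_calib]
    test_idx = indices[n_calib:n_calib + n_test]
    train_idx = indices[n_calib + n_test:]
    return train_idx, calib_idx, test_idx

def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean NLL of binary labels under predicted positive-class probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

train_idx, calib_idx, test_idx = three_way_split(100)
print(len(train_idx), len(calib_idx), len(test_idx))
```

The base classifier would be trained on the training indices only, the Platt parameters fitted on scores from the calibration indices, and the final calibration quality measured on the test indices.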

03

Primarily for Binary Classification

The standard Platt scaling formulation is designed for binary classification. It calibrates the scores for the positive class. For multi-class problems, the common extension is the One-vs-Rest (OvR) strategy: calibrate each class against all others independently, then normalize the resulting probabilities across classes (e.g., by dividing each by their sum) so that they add up to one. However, this requires fitting one calibrator per class and does not guarantee well-calibrated multi-class probabilities; for neural networks, a dedicated multi-class method such as temperature scaling is often preferred.
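A minimal sketch of the One-vs-Rest extension, assuming each class already has its own fitted pair (A, B): every class score is passed through its own sigmoid, and the results are renormalized by their sum. The function names and the parameter values below are hypothetical.

```python
import math

def sigmoid_calibrate(score, A, B):
    """Binary Platt mapping: 1 / (1 + exp(A * s + B))."""
    z = A * score + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def ovr_platt_probabilities(class_scores, params):
    """Calibrate one sample's per-class scores, then normalize to sum to 1.

    class_scores: one raw score per class.
    params: one (A, B) pair per class, each fitted one-vs-rest.
    """
    raw = [sigmoid_calibrate(s, A, B) for s, (A, B) in zip(class_scores, params)]
    total = sum(raw)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(raw)] * len(raw)
    return [p / total for p in raw]

# Hypothetical per-class parameters for a three-class problem.
example_params = [(-1.0, 0.0), (-1.0, 0.0), (-1.0, 0.0)]
print(ovr_platt_probabilities([2.0, -1.0, 0.0], example_params))
```

Because each calibrator is fitted independently, the normalization step restores a valid distribution but does not by itself guarantee that the joint multi-class probabilities are well calibrated.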

04

Post-Hoc and Model-Agnostic

Platt scaling is a post-hoc method, meaning it is applied after a model is fully trained, without modifying its internal parameters. It is also model-agnostic; it works on the scores output by any classifier, including Support Vector Machines (where it was originally developed), boosted trees, and neural networks. This decoupling allows calibration to be treated as a separate pipeline step, facilitating integration into existing MLOps workflows.

05

Risk of Overfitting on Small Calibration Sets

Although its two-parameter form limits overfitting, Platt scaling can still fit noise rather than the true miscalibration pattern when the calibration set is very small (the comparison table in this entry suggests roughly 100 or more samples) or contains few examples of one class. Switching to a more flexible, non-parametric method such as isotonic regression does not help in that regime, because isotonic regression generally needs more calibration data, not less; a larger calibration set or a more constrained method is the safer choice. The reliability of Platt scaling depends heavily on the size and quality of the calibration data.

06

Comparison to Temperature Scaling

For neural networks, temperature scaling is a simpler, more constrained special case of Platt scaling. Temperature scaling uses a single parameter T (temperature) to scale all logits: logits_scaled = logits / T. Platt scaling, with its two parameters (A, B), is more flexible. However, this flexibility can be a disadvantage for neural nets, as the extra degree of freedom can lead to overfitting on the calibration set, whereas temperature scaling's single parameter often provides more reliable calibration with modern deep learning models.
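To make the single-parameter idea concrete, the sketch below divides logits by T before the softmax and picks T by a simple grid search over the mean negative log-likelihood on a calibration set. The grid, the function names, and the toy logits are assumptions for illustration; practical implementations usually optimize T with a gradient-based or line-search method.

```python
import math

def log_softmax(logits, T=1.0):
    """Log of softmax(logits / T), using the log-sum-exp trick for stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]

def softmax(logits, T=1.0):
    return [math.exp(v) for v in log_softmax(logits, T)]

def mean_nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true classes at temperature T."""
    total = 0.0
    for row, y in zip(logit_rows, labels):
        total -= log_softmax(row, T)[y]
    return total / len(labels)

def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the temperature with the lowest calibration-set NLL (grid search)."""
    if candidates is None:
        candidates = [0.05 * k for k in range(1, 201)]  # 0.05 up to 10.0
    return min(candidates, key=lambda T: mean_nll(logit_rows, labels, T))

# Hypothetical overconfident logits: 3 of 5 predictions are correct.
demo_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0],
               [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
demo_classes = [0, 1, 2, 1, 0]
T_fit = fit_temperature(demo_logits, demo_classes)
print("T =", T_fit)
print("calibrated:", softmax(demo_logits[0], T_fit))
```

Dividing by a positive T never changes which class has the largest probability, so accuracy is unaffected; only the confidence is softened (T > 1) or sharpened (T < 1).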

COMPARISON

Platt Scaling vs. Other Calibration Methods

A feature comparison of common post-hoc calibration techniques for binary and multi-class classifiers.

| Feature / Metric | Platt Scaling (Sigmoid Calibration) | Temperature Scaling | Isotonic Regression |
| --- | --- | --- | --- |
| Method Type | Parametric | Parametric | Non-Parametric |
| Underlying Model | Logistic Regression | Single Scalar (Temperature) | Piecewise Constant, Non-Decreasing Function |
| Primary Use Case | Binary Classification | Multi-Class Classification | Binary & Multi-Class (1-vs-All) |
| Data Efficiency | Requires ~100+ samples | Requires ~100+ samples | Requires ~1000+ samples |
| Risk of Overfitting | Low (2 parameters) | Very Low (1 parameter) | Medium (flexible function) |
| Output Guarantee | Monotonic transformation | Monotonic transformation | Monotonic transformation |
| Computational Cost | Low (convex optimization) | Very Low (scalar optimization) | Medium (pair-adjacent violators algorithm) |
| Handles Multi-Modal Distributions | | | |
| Common Evaluation Metric | Brier Score, ECE, NLL | Brier Score, ECE, NLL | Brier Score, ECE, NLL |
| Typical Performance (ECE Reduction) | 30-70% | 20-60% | 40-80% |
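Two of the evaluation metrics named in the comparison, the Brier score and expected calibration error (ECE), can be sketched as follows for the binary case. ECE has several variants; this one bins the positive-class probability into equal-width bins, and the bin count and function names are assumptions for the example.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: weighted mean gap between mean predicted probability
    and observed positive rate, over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_p - positive_rate)
    return ece

# Hypothetical predictions before calibration.
demo_probs = [0.95, 0.9, 0.85, 0.2, 0.1, 0.9]
demo_truth = [1, 0, 1, 0, 0, 0]
print("Brier:", brier_score(demo_probs, demo_truth))
print("ECE:", expected_calibration_error(demo_probs, demo_truth))
```

Computing both metrics on the same held-out test set before and after calibration gives a direct way to check whether a given method actually helped.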

PLATT SCALING

Frequently Asked Questions

Platt scaling, commonly referred to as calibration via Platt, is a foundational technique in machine learning for ensuring a classifier's confidence scores are trustworthy. This FAQ addresses its core mechanics, applications, and relationship to other calibration methods.

Platt scaling is a parametric, post-hoc calibration method that transforms a binary classifier's raw output scores into calibrated probability estimates by fitting a logistic regression model. The technique takes the original model's scores on a held-out calibration set and learns two parameters: a scaling factor and a bias term. These parameters adjust each score s via the logistic sigmoid function, σ(a * s + b) = 1 / (1 + exp(-(a * s + b))), which is the same mapping as P(y=1 | s) = 1 / (1 + exp(A * s + B)) with a = -A and b = -B. The resulting probabilities are intended to match the observed frequency of the positive class. The method assumes that the positive-class probability is a sigmoid function of the score, which is often a reasonable approximation for outputs from models like Support Vector Machines (SVMs) and neural networks.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.