Uncertainty sampling is an active learning query strategy where the next data point selected for human labeling is the one the current model is most uncertain about predicting. The core assumption is that labeling and learning from these high-uncertainty instances will be most efficient for improving model performance. Common uncertainty measures include predictive entropy, least confidence, and margin sampling, each quantifying different aspects of the model's indecision over possible output classes.
Uncertainty Sampling

What is Uncertainty Sampling?
Uncertainty sampling is a core query strategy in active learning that systematically identifies the most informative data points for a model to learn from next.
This method is aimed at epistemic uncertainty: the model's reducible ignorance caused by a lack of training data in specific regions of the feature space. In practice, single-model measures such as entropy capture total predictive uncertainty, which mixes epistemic and aleatoric components. By iteratively querying uncertain points, the strategy aims to build a more robust decision boundary with fewer labeled examples than random sampling. It is fundamentally linked to confidence scoring and uncertainty quantification (UQ), because it relies on the model's ability to produce a meaningful probability distribution over its predictions to guide the sampling process.
Key Uncertainty Measures
In active learning, uncertainty sampling selects data points for labeling based on the model's lack of confidence. The specific measure of uncertainty determines which samples are considered most informative for improving the model.
Least Confidence
The least confidence strategy selects the instance where the model's predicted probability for the most likely class is lowest. It is defined as:
- Formula: \(1 - P(\hat{y} | x)\), where \(\hat{y} = \arg\max_y P(y | x)\).
- Intuition: The model is least sure about its top prediction.
- Example: For a 3-class problem, if predicted probabilities are [0.5, 0.3, 0.2], the least confidence score is \(1 - 0.5 = 0.5\).
- Use Case: Simple and computationally efficient, often used as a baseline in classification tasks.
Margin Sampling
Margin sampling selects instances where the difference between the predicted probabilities of the top two most likely classes is smallest. It is defined as:
- Formula: \(P(\hat{y}_1 | x) - P(\hat{y}_2 | x)\), where \(\hat{y}_1\) and \(\hat{y}_2\) are the first and second most probable classes.
- Intuition: A small margin indicates the model is struggling to discriminate between the two best candidates.
- Example: For probabilities [0.5, 0.45, 0.05], the margin is \(0.5 - 0.45 = 0.05\), indicating high uncertainty.
- Advantage: Can be more informative than least confidence because it compares the top two predictions instead of looking at the top probability alone.
Entropy
Entropy uses Shannon entropy over the entire predicted probability distribution as the uncertainty measure. It is defined as:
- Formula: \(-\sum_{i=1}^{C} P(y_i | x) \log P(y_i | x)\), where \(C\) is the number of classes.
- Intuition: High entropy indicates a flat, uniform distribution where the model is highly uncertain. Low entropy indicates a peaked, confident distribution.
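- Example: With the natural logarithm, the maximum entropy for a 3-class problem, reached by the uniform distribution [1/3, 1/3, 1/3], is \(\log 3 \approx 1.099\).
- Property: Unlike least confidence and margin, which use only the top one or two probabilities, entropy uses the full predictive distribution, making it the most information-theoretic of the three measures.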
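The three measures above can be compared side by side with a minimal, dependency-free sketch. It assumes the model's softmax outputs are already available as plain Python lists; the pool of three probability vectors is made up for illustration. Note the sign convention: least confidence and entropy grow with uncertainty, whereas a smaller margin means more uncertainty, so the margin is negated before picking the maximum.

```python
import math

def least_confidence(probs):
    """1 - P(y_hat | x); higher means the top prediction is less certain."""
    return 1.0 - max(probs)

def margin(probs):
    """P(y_1 | x) - P(y_2 | x); a SMALLER margin means MORE uncertainty."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2

def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool, score):
    """Index of the pool item with the highest uncertainty score."""
    return max(range(len(pool)), key=lambda i: score(pool[i]))

# Worked examples from the text.
print(round(least_confidence([0.5, 0.3, 0.2]), 4))  # 0.5
print(round(margin([0.5, 0.45, 0.05]), 4))           # 0.05
print(round(entropy([1/3, 1/3, 1/3]), 4))            # 1.0986

# Hypothetical pool of softmax outputs from a 3-class model.
pool = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.60, 0.30, 0.10]]
print(select_most_uncertain(pool, least_confidence))      # 1
print(select_most_uncertain(pool, lambda p: -margin(p)))  # 1 (negated: small margin = uncertain)
print(select_most_uncertain(pool, entropy))               # 1
```

On this small pool all three measures agree; on larger pools they can rank points differently, because each looks at a different part of the distribution.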
Bayesian Active Learning by Disagreement (BALD)
BALD is used with Bayesian models to select the points whose labels would be most informative about the model parameters. It does this by maximizing the mutual information between the prediction and the model parameters.
- Intuition: It seeks samples where the model's many possible explanations (parameter settings) disagree the most.
- Implementation: Often approximated using Monte Carlo Dropout, where multiple stochastic forward passes are performed. The score is then estimated as the entropy of the averaged prediction minus the average entropy of the individual passes; disagreement between passes indicates epistemic uncertainty.
- Formula: \(I[y, \omega | x, D] = H[y | x, D] - \mathbb{E}_{p(\omega | D)}\big[H[y | x, \omega]\big]\), where \(\omega\) represents the model parameters and \(D\) the labeled training data.
- Use Case: Particularly effective for capturing epistemic uncertainty and for deep learning models.
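A minimal sketch of the BALD estimate from Monte Carlo samples follows. It assumes the T stochastic forward passes (for example, with dropout left active at inference) have already been collected as probability vectors; the network itself is not shown. The two toy inputs contrast a point the model finds genuinely ambiguous with one where the passes disagree.

```python
import math

def entropy(probs):
    """Shannon entropy in nats."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def bald_score(mc_probs):
    """BALD estimate from T stochastic forward passes.

    mc_probs: list of T probability vectors (one per pass, e.g. from MC Dropout),
    each of length C. Returns H[mean prediction] - mean(H[each prediction]).
    """
    T, C = len(mc_probs), len(mc_probs[0])
    mean_probs = [sum(sample[c] for sample in mc_probs) / T for c in range(C)]
    total_uncertainty = entropy(mean_probs)                    # H[y | x, D]
    expected_entropy = sum(entropy(s) for s in mc_probs) / T   # E_w[H[y | x, w]]
    return total_uncertainty - expected_entropy

# Passes agree on a 50/50 prediction: high entropy, but no disagreement.
print(round(bald_score([[0.5, 0.5], [0.5, 0.5]]), 4))  # 0.0
# Passes are individually confident but contradict each other.
print(round(bald_score([[1.0, 0.0], [0.0, 1.0]]), 4))  # 0.6931
```

Plain entropy would rate both inputs as equally uncertain (\(\log 2\)); BALD gives zero to the first because that uncertainty is not reducible by learning the parameters better.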
Query-by-Committee (QBC)
Query-by-Committee (QBC) maintains a committee (ensemble) of models and selects data points where the committee members disagree the most in their predictions.
- Core Mechanism: Measures disagreement using vote entropy or the average Kullback-Leibler (KL) divergence between each member's predictive distribution and the committee consensus (the mean distribution).
- Vote Entropy: \(-\sum_{y} \frac{V(y)}{K} \log \frac{V(y)}{K}\), where \(V(y)\) is the number of committee votes for class \(y\) and \(K\) is the committee size.
- Intuition: High disagreement signifies the instance lies in a region of the feature space where the learned models have inconsistent hypotheses.
- Advantage: Naturally captures model (epistemic) uncertainty through ensemble diversity.
- Challenge: Requires training and maintaining multiple models.
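Both disagreement measures can be sketched in a few lines, assuming each committee member's hard vote (for vote entropy) or probability vector (for KL-to-consensus) is already available. The committee size is called K here; the labels and probabilities are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """votes: predicted label from each of the K committee members."""
    K = len(votes)
    return sum(-(v / K) * math.log(v / K) for v in Counter(votes).values())

def kl_divergence(p, q):
    # Terms with p_i = 0 contribute 0; the consensus below is > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kl_to_consensus(member_probs):
    """Average KL divergence of each member's distribution from the committee mean."""
    K, C = len(member_probs), len(member_probs[0])
    consensus = [sum(m[c] for m in member_probs) / K for c in range(C)]
    return sum(kl_divergence(m, consensus) for m in member_probs) / K

print(round(vote_entropy(["cat", "cat", "cat", "cat"]), 4))       # 0.0 (full agreement)
print(round(vote_entropy(["cat", "dog", "cat", "dog"]), 4))       # 0.6931 (even split)
print(round(mean_kl_to_consensus([[0.9, 0.1], [0.1, 0.9]]), 4))  # 0.3681
```

Vote entropy only sees hard labels, so two members voting "cat" at 0.51 and 0.99 look identical; the KL variant uses the full distributions and can separate such cases.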
Expected Model Change
Expected model change strategies select the instance that is expected to cause the greatest change to the current model parameters if its label were known and the model were retrained.
- Intuition: The most informative sample is the one that would most significantly alter the model's decision boundary.
- Approximation: Often measured by the expected gradient length (EGL): for each possible label, compute the norm of the gradient of the loss with respect to the model parameters, weight it by the model's predicted probability for that label, and choose the sample with the largest weighted sum.
- Formula (Gradient-based): \(\arg\max_x \sum_{y} P(y | x)\, \lVert \nabla_\theta L(x, y; \theta) \rVert\).
- Use Case: More computationally intensive than probabilistic measures but can be very sample-efficient, directly targeting model improvement.
- Consideration: Requires differentiable models and efficient gradient computation.
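For a softmax-regression (multinomial logistic) model with cross-entropy loss, the expected gradient length has a closed form, which makes for a compact sketch. The assumptions are loud here: a linear model with weight matrix W and no bias term, and a hypothetical two-class, two-feature W and pool. For deep networks, the per-label gradient norms would instead come from backpropagation.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_gradient_length(W, x):
    """EGL for a softmax-regression model with weight matrix W (C rows, d columns).

    With cross-entropy loss, the gradient w.r.t. W for label y is
    (p - onehot(y)) x^T, whose Frobenius norm is ||p - onehot(y)|| * ||x||.
    """
    p = softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    egl = 0.0
    for y in range(len(p)):
        residual = math.sqrt(sum((p[c] - (1.0 if c == y else 0.0)) ** 2
                                 for c in range(len(p))))
        egl += p[y] * residual * x_norm  # weight each label's gradient norm by P(y|x)
    return egl

def select_by_egl(W, pool):
    return max(range(len(pool)), key=lambda i: expected_gradient_length(W, pool[i]))

W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-class, 2-feature model
pool = [[0.0, 0.0], [3.0, 0.0], [3.0, 3.0]]
print([round(expected_gradient_length(W, x), 3) for x in pool])  # [0.0, 0.383, 3.0]
print(select_by_egl(W, pool))                                    # 2
```

The point [3, 3] wins because the model is split 50/50 on it and its large feature norm means any label would move the weights a lot; the origin produces no gradient at all.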
Uncertainty Sampling vs. Other Active Learning Strategies
A comparison of primary active learning query strategies based on their selection criteria, computational cost, and typical use cases.
| Strategy / Feature | Uncertainty Sampling | Query-by-Committee | Expected Model Change | Diversity Sampling |
|---|---|---|---|---|
| Primary Selection Criterion | Model's predictive uncertainty (e.g., entropy, margin) | Disagreement among an ensemble of models | Expected gradient length or impact on model parameters | Representativeness or diversity relative to unlabeled pool |
| Uncertainty Type Targeted | Primarily predictive (aleatoric/epistemic mix) | Primarily epistemic (model disagreement) | Epistemic (model change) | Data distribution (representativeness) |
| Computational Overhead | Low (single model inference) | High (multiple model inferences) | Very High (requires gradient computation) | Medium to High (requires similarity/distance matrix) |
| Handles Redundant Data | No | No | No | Yes |
| Requires Model Ensemble | No | Yes | No | No |
| Typical Initialization | Random sampling | Random sampling | Random sampling | Clustering-based sampling |
| Best For | Rapid accuracy gain with minimal labeling | Exploring model decision boundaries, reducing variance | Maximizing per-sample learning signal | Building a representative training set, covering data manifold |
| Key Weakness | Can select outliers or ambiguous examples | High computational cost, sensitive to committee diversity | Extremely computationally expensive | May select easy, non-informative samples |
Practical Applications and Use Cases
Uncertainty sampling is a core active learning strategy for efficiently building high-quality datasets. It is deployed across industries to reduce labeling costs and improve model performance by strategically querying the most informative data points.
Medical Image Annotation
In medical imaging (e.g., MRI, CT scans), expert radiologist time is expensive and scarce. Uncertainty sampling can substantially reduce labeling costs by prioritizing ambiguous cases for expert review.
- Key Application: Identifying tumor boundaries or rare pathologies where model predictions are least certain.
- Process: A pre-trained model segments scans; samples with the highest predictive entropy or lowest confidence score in their segmentation mask are sent for annotation.
- Impact: Can reduce required expert labels by 50-70% to achieve diagnostic-grade model performance, accelerating the development of computer-aided diagnosis (CAD) systems.
Autonomous Vehicle Perception
Self-driving systems require models robust to countless 'edge cases' (e.g., obscured pedestrians, unusual vehicles). Uncertainty sampling identifies these critical scenarios from vast unlabeled driving data.
- Key Application: Curating training data for object detection and semantic segmentation models.
- Process: The perception model processes hours of real-world footage; frames where detection confidence scores are low or epistemic uncertainty (via Monte Carlo Dropout) is high are flagged.
- Impact: Ensures the training dataset is enriched with challenging, safety-critical scenarios, improving model reliability and supporting out-of-distribution (OOD) detection for safer deployment.
Content Moderation & Sentiment Analysis
Moderating user-generated content or analyzing nuanced sentiment at scale requires understanding ambiguous language (sarcasm, cultural context). Uncertainty sampling finds these hard-to-classify examples.
- Key Application: Building and refining classifiers for hate speech, toxicity, or complex sentiment (e.g., 'mixed feelings').
- Process: A baseline model scores social media posts or reviews; texts with predictions near the decision boundary (e.g., ~0.5 probability for binary hate speech) are queued for human moderators.
- Impact: Creates a more balanced and challenging training set, improving model performance on subtle cases and reducing false positives and false negatives in production.
Document Processing & Entity Recognition
Processing legal contracts, invoices, or medical records involves extracting entities (names, dates, amounts) from diverse, unstructured formats. Uncertainty sampling targets documents where extraction is most error-prone.
- Key Application: Training named entity recognition (NER) and optical character recognition (OCR) models for domain-specific documents.
- Process: An initial model extracts entities; documents with low-confidence spans or high disagreement in deep ensemble predictions are selected for manual verification.
- Impact: Efficiently improves model accuracy on complex document layouts and rare entity types, which is foundational for Retrieval-Augmented Generation (RAG) systems that rely on accurate information extraction.
Scientific Discovery & Material Design
In fields like drug discovery or battery chemistry, experiments are costly. Uncertainty sampling guides which compound or material to synthesize and test next, acting as a Bayesian optimization heuristic.
- Key Application: Virtual screening of molecular libraries or optimizing chemical formulations.
- Process: A Bayesian Neural Network (BNN) predicts a property (e.g., drug efficacy, conductivity) and its associated uncertainty. Candidates with high epistemic uncertainty (model is unsure) or high predicted performance are prioritized for lab testing.
- Impact: Aims to maximize the information gained per experiment, accelerating the iterative design-test cycle and potentially reducing R&D costs substantially.
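The process above prioritizes candidates with high uncertainty or high predicted performance. One standard way to combine the two, swapped in here for illustration and not taken from the source, is an upper-confidence-bound (UCB) score: predicted mean plus kappa times predicted standard deviation. The sketch assumes the BNN (or any model with predictive uncertainty) has already produced a mean and standard deviation per candidate; `kappa` and the candidate values are hypothetical.

```python
def ucb_scores(means, stds, kappa=2.0):
    """Upper confidence bound: predicted property + kappa * predictive std."""
    return [m + kappa * s for m, s in zip(means, stds)]

def next_candidate(means, stds, kappa=2.0):
    scores = ucb_scores(means, stds, kappa)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical BNN outputs (mean, std) for three candidate compounds.
means = [0.80, 0.60, 0.50]
stds = [0.01, 0.20, 0.05]
print(next_candidate(means, stds, kappa=2.0))  # 1: uncertain candidate wins (explore)
print(next_candidate(means, stds, kappa=0.0))  # 0: best predicted mean wins (exploit)
```

Setting `kappa` to zero reduces this to greedy selection on the prediction, while a very large `kappa` approaches pure uncertainty sampling.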
Interactive Machine Learning & Human-in-the-Loop
Uncertainty sampling is the engine behind interactive ML tools, where a model and a human expert collaborate in real-time to label data or make decisions.
- Key Application: Tools for data scientists to iteratively train models and for annotators to efficiently label complex datasets.
- Process: The model continuously retrains on newly labeled samples and immediately queries the next most uncertain point from the pool, creating a tight feedback loop.
- Impact: Enables rapid prototyping of models with minimal initial data. It directly operationalizes confidence scoring for outputs by using the scores to drive the annotation workflow itself.
Frequently Asked Questions
Uncertainty sampling is a core active learning strategy for efficiently building machine learning datasets. These FAQs address its mechanisms, practical implementation, and relationship to broader confidence scoring concepts.
Uncertainty sampling is an active learning query strategy where a machine learning model selects the next data point for human labeling based on its own uncertainty about the prediction for that point. It works by deploying a partially trained model on a pool of unlabeled data, calculating an uncertainty measure (like predictive entropy or margin) for each sample, and requesting a label for the sample where the model is most uncertain. This label is then added to the training set, and the model is retrained, creating a feedback loop that prioritizes informative data.
Core Mechanism:
- Train Initial Model: A model is trained on a small, initial labeled dataset.
- Score Unlabeled Pool: The model predicts on a large pool of unlabeled data.
- Calculate Uncertainty: An uncertainty metric is computed for each prediction.
- Query for Label: The sample with the highest uncertainty is sent to a human oracle for labeling.
- Retrain: The new labeled sample is added to the training set, and the model is retrained.
This cycle aims to maximize learning efficiency by minimizing the number of expensive labels needed to achieve a target performance level.
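The five steps above can be sketched end to end without any ML library, under loud assumptions: the "model" is a hypothetical one-dimensional threshold classifier (threshold halfway between the largest negative and the smallest positive labeled point, with a sigmoid on top to produce probabilities), the oracle is a hypothetical function standing in for the human annotator, and predictive entropy is the uncertainty measure. In a real system, the training step would fit an actual classifier.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_entropy(p):
    return sum(-q * math.log(q) for q in (p, 1.0 - p) if q > 0)

def train_threshold_model(labeled):
    """Hypothetical 1-D model: threshold halfway between the largest
    negative and the smallest positive labeled point."""
    neg = max(x for x, y in labeled if y == 0)
    pos = min(x for x, y in labeled if y == 1)
    return (neg + pos) / 2.0

def predict_proba(threshold, x, scale=1.0):
    # P(y = 1 | x) rises smoothly through 0.5 at the threshold.
    return sigmoid((x - threshold) / scale)

def uncertainty_sampling_loop(pool, labeled, oracle, n_queries):
    pool, labeled, queried = list(pool), list(labeled), []
    for _ in range(n_queries):
        if not pool:
            break
        threshold = train_threshold_model(labeled)             # train / retrain
        scores = [binary_entropy(predict_proba(threshold, x))  # score the pool
                  for x in pool]
        i = max(range(len(pool)), key=scores.__getitem__)      # most uncertain sample
        x = pool.pop(i)
        labeled.append((x, oracle(x)))                          # query the oracle, grow the set
        queried.append(x)
    return queried, train_threshold_model(labeled)

oracle = lambda x: int(x >= 37)  # hidden ground truth, unknown to the model
queried, threshold = uncertainty_sampling_loop(
    pool=range(1, 64), labeled=[(0, 0), (64, 1)], oracle=oracle, n_queries=6)
print(queried, threshold)  # [32, 48, 40, 36, 38, 37] 36.5
```

With a 1-D threshold model, uncertainty sampling reduces to a binary search for the boundary, which is why six queries are enough here to locate it between 36 and 37; random sampling would typically need far more labels for the same precision.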
Related Terms
Uncertainty sampling operates within a broader ecosystem of techniques for measuring and managing model confidence. These related concepts define the mathematical frameworks, evaluation metrics, and complementary strategies used to quantify and act upon predictive uncertainty.
Uncertainty Quantification (UQ)
Uncertainty Quantification (UQ) is the overarching field of machine learning concerned with measuring and interpreting the different types of uncertainty inherent in a model's predictions. It provides the theoretical foundation for uncertainty sampling. Key distinctions include:
- Aleatoric Uncertainty: Irreducible noise inherent in the data (e.g., sensor error, label ambiguity).
- Epistemic Uncertainty: Reducible uncertainty from a lack of model knowledge, often due to limited training data. UQ methods, such as Bayesian Neural Networks or Deep Ensembles, generate the uncertainty estimates that active learning strategies like uncertainty sampling utilize to select informative data points for labeling.
Selective Classification
Selective Classification, also known as classification with a rejection option, is a paradigm where a model is allowed to abstain from making a prediction on inputs where its confidence is below a chosen threshold. It is a direct application of confidence scores for risk mitigation.
- In production, a model using selective classification will only output predictions it deems reliable, passing low-confidence cases to a human expert or a fallback system.
- This creates a risk-coverage trade-off: raising the confidence threshold typically improves accuracy on the accepted predictions (lower risk) but reduces the fraction of samples on which a prediction is made (lower coverage). Uncertainty sampling can be used to acquire labels specifically for these abstained, high-uncertainty regions.
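The risk-coverage trade-off can be traced with a small sketch, assuming top-class confidences and correctness flags from a held-out set are already available; the five data points are hypothetical. The abstained indices are the natural candidates to send for labeling.

```python
def risk_coverage(confidences, correct, threshold):
    """Accept predictions with confidence >= threshold and abstain on the rest.

    Returns (risk, coverage, abstained): error rate on accepted predictions,
    fraction accepted, and indices routed to a human (candidate labels to acquire).
    Risk is reported as 0.0 when nothing is accepted.
    """
    accepted = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    abstained = [i for i, conf in enumerate(confidences) if conf < threshold]
    coverage = len(accepted) / len(confidences)
    risk = 1.0 - sum(accepted) / len(accepted) if accepted else 0.0
    return risk, coverage, abstained

# Hypothetical validation results: top-class confidence and whether it was right.
confidences = [0.95, 0.90, 0.80, 0.60, 0.55]
correct = [True, True, False, True, False]
for t in (0.5, 0.7, 0.85):
    risk, coverage, abstained = risk_coverage(confidences, correct, t)
    print(t, round(risk, 3), coverage, abstained)
```

Raising the threshold from 0.5 to 0.85 drives risk from 0.4 to 0.0 on this toy data, at the cost of coverage falling from 1.0 to 0.4.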
Expected Calibration Error (ECE)
Expected Calibration Error (ECE) is a key metric for evaluating the quality of a model's confidence scores. It quantifies miscalibration: the discrepancy between predicted confidence and empirical accuracy.
- Calculation involves binning predictions based on their reported confidence (e.g., 0.9-1.0, 0.8-0.9).
- For each bin, compute the difference between the average confidence and the actual accuracy of predictions in that bin.
- ECE is the weighted average of these absolute differences. A well-calibrated model has an ECE near zero, meaning a confidence score of 0.9 corresponds to a 90% chance of being correct. Poor calibration undermines the effectiveness of uncertainty sampling.
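The three ECE steps above translate directly into a short sketch. It assumes ten equal-width confidence bins (matching the 0.1-wide examples) and takes confidences and correctness flags as plain lists; the eight predictions are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - average confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins are [0, 0.1), [0.1, 0.2), ...; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(conf for conf, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(accuracy - avg_conf)  # weight by bin size
    return ece

# Hypothetical predictions: the 0.95 bin is overconfident, the 0.75 bin is calibrated.
confidences = [0.95, 0.95, 0.95, 0.95, 0.75, 0.75, 0.75, 0.75]
correct = [True, True, True, False, True, True, True, False]
print(round(expected_calibration_error(confidences, correct), 4))  # 0.1
```

The 0.95 bin is only 75% accurate, a gap of 0.2 that covers half of the samples, giving an ECE of 0.1; the 0.75 bin contributes nothing.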
Conformal Prediction
Conformal Prediction is a model-agnostic, distribution-free framework that produces prediction sets (rather than single-point predictions) with guaranteed statistical coverage. It provides a rigorous, frequentist approach to uncertainty.
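- Provided the calibration data and new samples are exchangeable, for a user-specified confidence level (e.g., 90%) conformal prediction guarantees that the generated set contains the true label with probability at least 90%. This is a marginal guarantee: it holds on average over new samples, not for every individual input.
- In the common split-conformal variant, it works by computing a nonconformity score for each example in a held-out calibration set and using a quantile of those scores as the threshold that decides which labels enter the prediction set. This framework is complementary to probabilistic uncertainty sampling, offering a different, provably valid perspective on uncertainty for decision-making.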
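A minimal split-conformal sketch for classification follows. It assumes one common choice of nonconformity score, one minus the predicted probability of the true label, and a made-up calibration set of nine 3-class examples; neither comes from the text above. The quantile uses the usual finite-sample correction, taking the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest calibration score.

```python
import math

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with nonconformity score s = 1 - p(true label)."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample-corrected quantile
    if k > n:
        return math.inf  # too few calibration points: every label is included
    return scores[k - 1]

def prediction_set(probs, qhat):
    """All labels whose nonconformity score does not exceed the calibrated threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]

# Hypothetical calibration data: 9 examples whose true class is 0.
true_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.35, 0.3]
cal_probs = [[p, (1 - p) / 2, (1 - p) / 2] for p in true_probs]
qhat = conformal_quantile(cal_probs, [0] * len(cal_probs), alpha=0.1)
print(round(qhat, 4))                            # 0.7
print(prediction_set([0.90, 0.08, 0.02], qhat))  # [0]
print(prediction_set([0.50, 0.40, 0.10], qhat))  # [0, 1]
```

Confident inputs get a single label, while ambiguous ones get a larger set; the set size itself is a usable uncertainty signal.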
Out-of-Distribution (OOD) Detection
Out-of-Distribution (OOD) Detection is the task of identifying whether a given input sample is statistically different from the data distribution the model was trained on. It is a critical safety component for deployed models.
- Models often make overconfident, incorrect predictions on OOD data because the softmax output is not a reliable measure of epistemic uncertainty.
- Specialized scores (e.g., based on Mahalanobis distance, maximum softmax probability, or energy-based models) are used to flag OOD samples. Uncertainty sampling strategies can be adapted to prioritize acquiring labels for detected OOD examples, helping to expand the model's known operational domain.
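Two of the scores named above, maximum softmax probability and the energy score, can be computed directly from a classifier's logits. In this sketch, the logits and the flagging threshold are hypothetical; in practice the threshold would be tuned on in-distribution validation data.

```python
import math

def max_softmax_probability(logits):
    """Largest softmax probability; low values suggest the input may be OOD."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def energy(logits, temperature=1.0):
    """E(x) = -T * log(sum_i exp(f_i(x) / T)); higher (less negative) suggests OOD."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp

def flag_ood(logits_batch, threshold):
    # Hypothetical threshold; returns indices whose energy exceeds it.
    return [i for i, logits in enumerate(logits_batch) if energy(logits) > threshold]

batch = [[10.0, 0.0, 0.0],  # one strong logit: looks in-distribution
         [0.1, 0.0, 0.2]]   # flat, small logits: looks unfamiliar
print([round(max_softmax_probability(l), 3) for l in batch])
print(flag_ood(batch, threshold=-5.0))  # [1]
```

The flagged indices are the examples an OOD-aware sampling strategy would prioritize for labeling.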
Deep Ensemble
A Deep Ensemble is a powerful and practical method for uncertainty estimation. It involves training multiple neural network models (e.g., 5-10) with different random initializations on the same dataset.
- The mean of the ensemble's predictions is typically more accurate than any single model.
- The variance (disagreement) among the models' predictions is a robust measure of epistemic uncertainty. High variance indicates regions where the models lack knowledge due to limited data.
- This variance measure is a highly effective query strategy for uncertainty sampling, often outperforming single-model entropy-based methods.
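A minimal sketch of the mean-and-variance computation follows. It assumes each ensemble member's probability vector for a pool item is already available, and it summarizes disagreement as the variance summed over classes, which is one simple choice among several. The three-member ensemble and two-item pool are hypothetical.

```python
def ensemble_mean_and_variance(member_probs):
    """member_probs: M probability vectors (one per ensemble member), each of length C."""
    M, C = len(member_probs), len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / M for c in range(C)]
    var = [sum((m[c] - mean[c]) ** 2 for m in member_probs) / M for c in range(C)]
    return mean, var

def disagreement(member_probs):
    """Total variance across classes: 0 when all members output the same distribution."""
    _, var = ensemble_mean_and_variance(member_probs)
    return sum(var)

def select_by_disagreement(pool_member_probs):
    """pool_member_probs[i] holds the M member predictions for pool item i."""
    return max(range(len(pool_member_probs)),
               key=lambda i: disagreement(pool_member_probs[i]))

# Hypothetical 3-member ensemble, 2 classes, two unlabeled pool items.
pool = [
    [[0.7, 0.3], [0.7, 0.3], [0.7, 0.3]],  # members agree
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],  # members disagree
]
mean, var = ensemble_mean_and_variance(pool[1])
print([round(v, 3) for v in mean], [round(v, 3) for v in var])
print(select_by_disagreement(pool))  # 1
```

The mean vector serves as the ensemble's prediction, and the variance drives the query choice. Both items have the same mean-level ambiguity story only if you look at entropy alone, which is why variance adds information.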

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.