Inferensys

Glossary

Recall (Sensitivity)

Recall, also known as sensitivity or true positive rate, is a classification metric that measures the proportion of actual positive instances a model correctly identifies, quantifying its ability to find all relevant cases.
PERFORMANCE METRIC DESIGN

What is Recall (Sensitivity)?

Recall, also known as sensitivity or true positive rate, is a fundamental classification metric that measures a model's ability to identify all relevant instances of a positive class.

Formally, recall is defined as True Positives / (True Positives + False Negatives): the share of actual positive instances that the model correctly identifies. A high recall score means the model finds most of the relevant cases and produces few false negatives. This metric is critical in domains like medical diagnosis or fraud detection, where missing a positive case (a disease or a fraudulent transaction) is costlier than a false alarm.
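As a minimal sketch of the formula above, the following Python computes recall from binary labels. The function names `confusion_counts` and `recall` and the sample labels are illustrative assumptions, not part of any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives and false negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fn


def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN)."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        # No actual positives: the ratio is undefined, so fail loudly.
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positives


# Hypothetical labels: 5 actual positives, of which the model finds 3.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

tp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FN={fn} recall={recall(tp, fn):.2f}")  # TP=3 FN=2 recall=0.60
```

Note that the false positive at index 5 does not enter the calculation: recall only looks at what happened to the actual positives.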

Recall exists in a fundamental trade-off with precision, which measures the model's exactness when it makes a positive prediction. This trade-off is visualized using a Precision-Recall curve. To balance both concerns, practitioners often use the F1 Score, the harmonic mean of precision and recall. Evaluating recall is essential within Evaluation-Driven Development to ensure models meet specific business requirements for completeness, especially when dealing with imbalanced datasets where the positive class is rare.
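To make the harmonic-mean relationship concrete, here is a small sketch that derives precision, recall, and F1 from confusion-matrix counts. The helper name `precision_recall_f1` and the counts are assumptions for illustration, and returning 0.0 for an undefined ratio is one convention among several.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: a precise model that misses many positives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=12)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.400 f1=0.533
```

The arithmetic mean of 0.8 and 0.4 would be 0.6, but the harmonic mean is about 0.533: F1 is pulled toward the weaker of the two metrics, so a model cannot score well by excelling at only one.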

BINARY CLASSIFICATION METRICS

Recall vs. Precision: Key Differences

A direct comparison of two fundamental classification metrics, highlighting their distinct purposes, mathematical definitions, and trade-offs in model evaluation.

| Feature | Recall (Sensitivity) | Precision |
| --- | --- | --- |
| Core Definition | Proportion of actual positives correctly identified. | Proportion of positive predictions that are correct. |
| Primary Question | Of all the relevant items, how many did the model find? | Of all the items the model flagged, how many are actually relevant? |
| Mathematical Formula | TP / (TP + FN) | TP / (TP + FP) |
| Focus (Confusion Matrix) | False Negatives (FN) | False Positives (FP) |
| Ideal Use Case | When missing a positive case is costly (e.g., disease screening, fraud detection). | When a false alarm is costly (e.g., spam filtering, quality control). |
| Trade-off Relationship | Increasing recall typically decreases precision. | Increasing precision typically decreases recall. |
| Sensitivity to Class Imbalance | Low; computed only over actual positives, so it does not depend on class prevalence. | High; at fixed true and false positive rates, precision falls as the positive class becomes rarer. |
| Alternative Names | Sensitivity, True Positive Rate (TPR), Hit Rate | Positive Predictive Value (PPV) |

RECALL (SENSITIVITY)

Key Applications and Use Cases

Recall is a critical metric for evaluating a model's ability to identify all relevant instances of a target class. Its importance is paramount in domains where missing a positive case is more costly than a false alarm.

01. Medical Diagnostics & Disease Screening

Recall is often the primary optimization target in life-critical diagnostic systems. A high-recall model keeps false negatives low, which means fewer missed cases of disease.

  • Example: A model screening for malignant tumors in radiology scans must prioritize finding all potential cancers, even at the cost of some false positives that can be ruled out by further tests.
  • Trade-off: Optimizing for recall often involves lowering the classification threshold, accepting a higher rate of false positives to capture nearly all true positives.
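
The effect of lowering the classification threshold can be sketched with a handful of made-up scores. Everything here (the scores, the labels, and the helper `precision_recall_at`) is hypothetical and only meant to show the direction of the trade-off.

```python
# Illustrative model scores and ground-truth labels (hypothetical data).
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 0, 1]


def precision_recall_at(threshold, scores, labels):
    """Treat score >= threshold as a positive prediction; return (precision, recall)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


results = {t: precision_recall_at(t, scores, labels) for t in (0.8, 0.5, 0.25)}
for t, (p, r) in results.items():
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.80  precision=1.00  recall=0.40
# threshold=0.50  precision=0.75  recall=0.60
# threshold=0.25  precision=0.57  recall=0.80
```

On this toy data each step down in threshold recovers one more true positive while admitting more false positives, which is the pattern a screening system accepts on purpose.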

02. Information Retrieval & Search Systems

In search and retrieval-augmented generation (RAG) pipelines, recall measures the system's ability to retrieve all relevant documents from a corpus for a given query.

  • Core Function: A high-recall retrieval system ensures the foundational context provided to a language model is comprehensive, reducing the risk of answer omission or hallucination due to missing data.
  • Evaluation: Recall@k (e.g., Recall@10) is a standard metric, measuring the proportion of relevant documents found within the top k retrieved results.
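
Recall@k as described above can be sketched in a few lines. The function name `recall_at_k` and the document IDs are illustrative assumptions; this version divides by the total number of relevant documents, and some evaluation setups normalize differently.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall@k is undefined when there are no relevant documents")
    top_k = retrieved_ids[:k]
    hits = len(relevant.intersection(top_k))
    return hits / len(relevant)


# Hypothetical ranked retrieval output and ground-truth relevant set.
retrieved = ["d7", "d2", "d9", "d4", "d1", "d5"]
relevant = {"d2", "d4", "d8", "d5"}

for k in (3, 5, 6):
    print(f"Recall@{k} = {recall_at_k(retrieved, relevant, k):.2f}")
# Recall@3 = 0.25
# Recall@5 = 0.50
# Recall@6 = 0.75
```

Document `d8` is never retrieved, so recall stays below 1.0 no matter how large k gets within this result list.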

03. Fraud Detection & Cybersecurity

Security systems are tuned for high recall to flag all potential threats, as the cost of missing a single fraudulent transaction or intrusion can be catastrophic.

  • Application: Anomaly detection models in network security or financial transaction monitoring are designed to be highly sensitive to suspicious patterns.
  • Operational Reality: The high volume of alerts generated (false positives) is then triaged by secondary systems or human analysts, a workflow justified by the critical need for high recall.

04. Legal e-Discovery & Document Review

In legal proceedings, models used for electronic discovery must achieve near-perfect recall to ensure no exculpatory or inculpatory evidence is overlooked.

  • Process: Machine learning classifiers sift through millions of documents to identify those relevant to a case. A missed document (false negative) could constitute a failure to produce evidence.
  • Standard: Legal teams often require recall levels exceeding 95% before deeming an AI-assisted review process defensible in court.

05. Manufacturing Defect Inspection

Automated visual inspection systems on production lines are calibrated for high recall to prevent defective products from reaching customers.

  • Quality Control: A model inspecting circuit boards or pharmaceutical packaging must catch all units with critical flaws, even if it means occasionally rejecting a functional item.
  • Cost Analysis: The financial and reputational cost of a defective product in the field typically far outweighs the cost of scrapping or re-checking a small percentage of false positives.

06. The Precision-Recall Trade-off & F1 Score

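Recall cannot be evaluated in isolation: a model that labels every instance as positive reaches a recall of 1.0 while being useless in practice. Recall exists in a fundamental trade-off with precision, and improving recall usually reduces precision, as the model casts a wider net.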

  • Balancing Metric: The F1 Score is the harmonic mean of precision and recall, providing a single metric to optimize when both false negatives and false positives matter and neither should dominate.
  • Strategic Choice: The optimal point on the Precision-Recall curve is determined by the specific business cost function: is missing a positive case (low recall) or acting on a false alarm (low precision) more expensive?
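
One way to turn the cost question into a threshold choice is to total the cost of errors at each candidate threshold and pick the cheapest. The sketch below assumes per-error costs `cost_fn` and `cost_fp`, made-up scores, and hypothetical helper names; a real cost function would come from the business context.

```python
# Illustrative model scores and ground-truth labels (hypothetical data).
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 0, 1]


def total_cost(threshold, scores, labels, cost_fn, cost_fp):
    """Sum the cost of missed positives and false alarms at a given threshold."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fn * cost_fn + fp * cost_fp


def cheapest_threshold(candidates, scores, labels, cost_fn, cost_fp):
    """Pick the candidate threshold with the lowest total error cost."""
    return min(candidates, key=lambda t: total_cost(t, scores, labels, cost_fn, cost_fp))


candidates = [0.8, 0.5, 0.25, 0.1]

# Missing a positive is 10x worse than a false alarm: favors high recall.
recall_first = cheapest_threshold(candidates, scores, labels, cost_fn=10, cost_fp=1)
# A false alarm is 10x worse than a miss: favors high precision.
precision_first = cheapest_threshold(candidates, scores, labels, cost_fn=1, cost_fp=10)

print(recall_first, precision_first)  # 0.1 0.8
```

The same model and data yield opposite operating points once the relative costs flip, which is why the threshold is a business decision as much as a modeling one.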

Frequently Asked Questions

Recall, also known as sensitivity or true positive rate, is a fundamental classification metric for evaluating a model's ability to identify all relevant positive instances. These questions address its calculation, trade-offs, and application in real-world machine learning systems.

Recall, also known as sensitivity or the true positive rate (TPR), is a classification performance metric that measures the proportion of actual positive instances that a model correctly identifies. It is calculated as the number of true positives (TP) divided by the sum of true positives and false negatives (FN): Recall = TP / (TP + FN). This formula quantifies a model's ability to find all relevant cases, making it critical in domains where missing a positive instance is costly, such as medical diagnosis or fraud detection. A recall of 1.0 (or 100%) indicates the model found every single positive case in the dataset, while a recall of 0.0 means it found none.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.