Glossary

Recall (Sensitivity)

Recall, also known as sensitivity or true positive rate, is a classification metric that measures the proportion of actual positive instances a model correctly identifies, quantifying its ability to find all relevant cases.

PERFORMANCE METRIC DESIGN

What is Recall (Sensitivity)?

Recall, also known as sensitivity or true positive rate, is a fundamental classification metric that measures a model's ability to identify all relevant instances of a positive class.

Recall (Sensitivity) is a classification performance metric that calculates the proportion of actual positive instances correctly identified by a model. Formally, it is defined as True Positives / (True Positives + False Negatives). A high recall score indicates the model is effective at finding most of the relevant cases, minimizing false negatives. This metric is critical in domains like medical diagnosis or fraud detection, where missing a positive case (a disease or fraudulent transaction) is costlier than a false alarm.
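
As a minimal sketch of the formula above, the following Python function computes recall from raw counts. The function name and the example counts are illustrative assumptions, not taken from any particular library. Returning 0.0 when there are no actual positives is a convention chosen for this sketch, since recall is undefined in that case.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN)."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        # Assumption for this sketch: report 0.0 when recall is undefined.
        return 0.0
    return true_positives / actual_positives


# Hypothetical screening result: 100 patients actually have the disease;
# the model flags 90 of them and misses 10.
screening_recall = recall(true_positives=90, false_negatives=10)
print(screening_recall)  # 0.9
```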

Recall exists in a fundamental trade-off with precision, which measures the model's exactness when it makes a positive prediction. This trade-off is visualized using a Precision-Recall curve. To balance both concerns, practitioners often use the F1 Score, the harmonic mean of precision and recall. Evaluating recall is essential within Evaluation-Driven Development to ensure models meet specific business requirements for completeness, especially when dealing with imbalanced datasets where the positive class is rare.
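
To illustrate why recall matters on imbalanced data, the sketch below scores a hypothetical model that always predicts the negative class. The dataset sizes and function names are assumptions made for the example. Accuracy looks excellent while recall shows that no positive case was found.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were found (0.0 if there are none)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced dataset: 10 positives, 990 negatives.
# A model that always predicts "negative" gets every negative right
# and misses every positive.
tp, fp, fn, tn = 0, 0, 10, 990

print(accuracy(tp, fp, fn, tn))  # 0.99
print(recall(tp, fn))            # 0.0
```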

BINARY CLASSIFICATION METRICS

Recall vs. Precision: Key Differences

A direct comparison of two fundamental classification metrics, highlighting their distinct purposes, mathematical definitions, and trade-offs in model evaluation.

Feature	Recall (Sensitivity)	Precision
Core Definition	Proportion of actual positives correctly identified.	Proportion of positive predictions that are correct.
Primary Question	Of all the relevant items, how many did the model find?	Of all the items the model flagged, how many are actually relevant?
Mathematical Formula	TP / (TP + FN)	TP / (TP + FP)
Focus (Confusion Matrix)	False Negatives (FN)	False Positives (FP)
Ideal Use Case	When missing a positive case is costly (e.g., disease screening, fraud detection).	When a false alarm is costly (e.g., spam filtering).
Trade-off Relationship	Increasing recall typically decreases precision.	Increasing precision typically decreases recall.
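Sensitivity to Class Imbalance	Low; computed only over actual positives, so class prevalence does not change it directly.	High; false positives come from the (often much larger) negative class, so precision tends to drop as positives become rarer.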
Alternative Names	Sensitivity, True Positive Rate (TPR), Hit Rate	Positive Predictive Value (PPV)
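
The two formulas in the table can be sketched side by side. The example below counts the four confusion-matrix cells from a pair of hypothetical label lists (1 = positive, 0 = negative) and derives both metrics from them. The helper name and the data are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 = positive, 0 = negative."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)     # of the 4 actual positives, 3 were found
precision = tp / (tp + fp)  # of the 5 flagged items, 3 were relevant

print(tp, fp, fn, tn)  # 3 2 1 4
print(recall)          # 0.75
print(precision)       # 0.6
```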

RECALL (SENSITIVITY)

Key Applications and Use Cases

Recall is a critical metric for evaluating a model's ability to identify all relevant instances of a target class. Its importance is paramount in domains where missing a positive case is more costly than a false alarm.

Medical Diagnostics & Disease Screening

Recall is often the primary optimization target in life-critical diagnostic systems. A high-recall model minimizes false negatives, meaning fewer missed cases of disease.

Example: A model screening for malignant tumors in radiology scans must prioritize finding all potential cancers, even at the cost of some false positives that can be ruled out by further tests.
Trade-off: Optimizing for recall often involves lowering the classification threshold, accepting a higher rate of false positives to capture nearly all true positives.
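
The effect of lowering the classification threshold can be sketched with a handful of made-up model scores. The scores, labels, and function name below are hypothetical, and the sketch assumes the data contains at least one positive case.

```python
def recall_and_false_positives(scores, labels, threshold):
    """Return (recall, false positive count) when scores >= threshold are flagged.

    Assumes at least one label equals 1.
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    return tp / (tp + fn), fp


# Hypothetical scores: the first five cases are truly positive.
scores = [0.95, 0.80, 0.60, 0.45, 0.35, 0.55, 0.40, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(recall_and_false_positives(scores, labels, threshold=0.5))  # (0.6, 1)
print(recall_and_false_positives(scores, labels, threshold=0.3))  # (1.0, 2)
```

In this toy data, dropping the threshold from 0.5 to 0.3 recovers the two missed positives at the price of one extra false alarm.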

Information Retrieval & Search Systems

In search and retrieval-augmented generation (RAG) pipelines, recall measures the system's ability to retrieve all relevant documents from a corpus for a given query.

Core Function: A high-recall retrieval system ensures the foundational context provided to a language model is comprehensive, reducing the risk of answer omission or hallucination due to missing data.
Evaluation: Recall@k (e.g., Recall@10) is a standard metric, measuring the proportion of all relevant documents for a query that appear within the top k retrieved results.
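
Recall@k can be sketched in a few lines. The document IDs and function name are hypothetical. The sketch assumes the retrieved list is already ranked best-first and returns 0.0 when a query has no relevant documents, which is a convention chosen here.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Share of all relevant documents that appear in the top-k retrieved results."""
    relevant = set(relevant_ids)
    if not relevant:
        # Assumption for this sketch: undefined case reported as 0.0.
        return 0.0
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


# Hypothetical ranked retrieval output and ground-truth relevant set.
retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d2", "d3", "d4"}

print(recall_at_k(retrieved, relevant, k=3))  # 0.5
print(recall_at_k(retrieved, relevant, k=5))  # 0.75
```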

Fraud Detection & Cybersecurity

Security systems are tuned for high recall to flag all potential threats, as the cost of missing a single fraudulent transaction or intrusion can be catastrophic.

Application: Anomaly detection models in network security or financial transaction monitoring are designed to be highly sensitive to suspicious patterns.
Operational Reality: The high volume of alerts generated (false positives) is then triaged by secondary systems or human analysts, a workflow justified by the critical need for high recall.

Legal e-Discovery & Document Review

In legal proceedings, models used for electronic discovery must achieve near-perfect recall to ensure no exculpatory or inculpatory evidence is overlooked.

Process: Machine learning classifiers sift through millions of documents to identify those relevant to a case. A missed document (false negative) could constitute a failure to produce evidence.
Standard: Legal teams typically agree on a target recall level, and validate it by sampling the reviewed documents, before deeming an AI-assisted review process defensible in court.

Manufacturing Defect Inspection

Automated visual inspection systems on production lines are calibrated for high recall to prevent defective products from reaching customers.

Quality Control: A model inspecting circuit boards or pharmaceutical packaging must catch all units with critical flaws, even if it means occasionally rejecting a functional item.
Cost Analysis: The financial and reputational cost of a defective product in the field typically far outweighs the cost of scrapping or re-checking a small percentage of false positives.

The Precision-Recall Trade-off & F1 Score

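Recall cannot be evaluated in isolation; it exists in a fundamental trade-off with precision. A model that labels every instance as positive reaches a recall of 1.0 while being useless in practice. Improving recall usually reduces precision, as the model casts a wider net.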

Balancing Metric: The F1 Score is the harmonic mean of precision and recall, providing a single metric to optimize when both false negatives and false positives matter and neither should dominate the other.
Strategic Choice: The optimal point on the Precision-Recall curve is determined by the specific business cost function: is missing a positive case (low recall) or acting on a false alarm (low precision) more expensive?
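
One way to make the cost question concrete is to assign a price to each error type and pick the threshold with the lowest total. The sketch below does this for hypothetical scores and made-up costs. The function names, candidate thresholds, and cost values are all assumptions, not a prescribed method.

```python
def total_error_cost(scores, labels, threshold, fn_cost, fp_cost):
    """Total cost of errors when scores >= threshold are predicted positive."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fn * fn_cost + fp * fp_cost


def cheapest_threshold(scores, labels, thresholds, fn_cost, fp_cost):
    """Pick the candidate threshold with the lowest total error cost."""
    return min(
        thresholds,
        key=lambda t: total_error_cost(scores, labels, t, fn_cost, fp_cost),
    )


# Hypothetical scores: the first five cases are truly positive.
scores = [0.95, 0.80, 0.60, 0.45, 0.35, 0.55, 0.40, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
candidates = [0.3, 0.5, 0.7]

# Missed positives are 10x as expensive as false alarms: favour recall.
print(cheapest_threshold(scores, labels, candidates, fn_cost=10, fp_cost=1))  # 0.3

# False alarms are 10x as expensive as missed positives: favour precision.
print(cheapest_threshold(scores, labels, candidates, fn_cost=1, fp_cost=10))  # 0.7
```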

Frequently Asked Questions

Recall, also known as sensitivity or true positive rate, is a fundamental classification metric for evaluating a model's ability to identify all relevant positive instances. These questions address its calculation, trade-offs, and application in real-world machine learning systems.

Recall, also known as sensitivity or the true positive rate (TPR), is a classification performance metric that measures the proportion of actual positive instances that a model correctly identifies. It is calculated as the number of true positives (TP) divided by the sum of true positives and false negatives (FN): Recall = TP / (TP + FN). This formula quantifies a model's ability to find all relevant cases, making it critical in domains where missing a positive instance is costly, such as medical diagnosis or fraud detection. A recall of 1.0 (or 100%) indicates the model found every single positive case in the dataset, while a recall of 0.0 means it found none.

Related Terms

Recall is a core component of a broader ecosystem of classification and evaluation metrics. Understanding its relationship to these complementary measures is essential for comprehensive model assessment.

Precision

Precision measures the exactness of a model's positive predictions. It is calculated as the ratio of true positives to all predicted positives (true positives + false positives). While recall asks "Of all the actual positives, how many did we find?", precision asks "Of all the items we labeled positive, how many are actually correct?"

Key Trade-off: In many real-world scenarios (e.g., spam detection, medical screening), there is a direct trade-off between precision and recall. Raising the classification threshold typically raises precision but lowers recall, and vice versa.
Use Case: High precision is critical when the cost of a false positive is high, for example when incorrectly flagging a legitimate transaction as fraud damages customer trust.

F1 Score

The F1 Score is the harmonic mean of precision and recall, providing a single metric that balances both concerns. It is especially useful for evaluating performance on imbalanced datasets where one class significantly outnumbers the other.

Calculation: F1 = 2 * (Precision * Recall) / (Precision + Recall).
Interpretation: A high F1 score indicates that the model has both good precision and good recall. It is more informative than accuracy when the class distribution is skewed.
Application: Commonly used in information retrieval, document classification, and medical diagnostics where both false positives and false negatives carry significant cost.
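
The F1 calculation above can be sketched as follows, with a comparison to the plain arithmetic mean to show how the harmonic mean penalizes a large gap between the two metrics. The example values are hypothetical, and returning 0.0 when both inputs are zero is a convention chosen for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumption for this sketch: report 0.0 when F1 is undefined.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier that flags almost everything:
# perfect recall, poor precision.
precision, recall = 0.1, 1.0

print(round(f1_score(precision, recall), 3))  # 0.182
print(round((precision + recall) / 2, 3))     # 0.55 (arithmetic mean, for contrast)
```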

Specificity (True Negative Rate)

Specificity, or the True Negative Rate, is the negative-class counterpart to recall. It measures a model's ability to correctly identify negative instances. It is calculated as the ratio of true negatives to all actual negatives (true negatives + false positives).

Formula: Specificity = TN / (TN + FP).
Relationship to Recall: Recall (Sensitivity) focuses on the positive class; Specificity focuses on the negative class. Together, they provide a complete picture of a binary classifier's performance for each class.
Critical Use: High specificity is paramount in tests where incorrectly labeling a healthy person as sick (a false positive) causes undue stress and further costly testing.
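
The relationship between the two rates can be sketched with hypothetical test results. The counts and function names below are assumptions. The second example shows why neither metric is sufficient alone: a test that calls everyone sick has perfect sensitivity and zero specificity.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Recall / true positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical test applied to 100 sick and 1000 healthy people.
print(sensitivity(tp=90, fn=10))   # 0.9
print(specificity(tn=950, fp=50))  # 0.95

# Degenerate test that labels everyone as sick.
print(sensitivity(tp=100, fn=0))   # 1.0
print(specificity(tn=0, fp=1000))  # 0.0
```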

Confusion Matrix

A Confusion Matrix is the foundational table from which recall, precision, and other classification metrics are derived. It is a 2x2 (for binary classification) grid that summarizes the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

Visual Foundation: All primary classification metrics are calculated directly from these four values. Recall = TP / (TP + FN).
Diagnostic Tool: It allows for immediate visual diagnosis of a model's error patterns. A model with low recall will have a high count in the False Negative cell.
Extension: For multi-class problems, the confusion matrix expands to an N x N table, enabling per-class calculation of recall (often called sensitivity for each class).
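
Per-class recall from an N x N confusion matrix can be sketched as below. The sketch assumes rows hold the actual class and columns hold the predicted class; some tools use the transposed layout. The class names and counts are made up for illustration.

```python
def per_class_recall(matrix, class_names):
    """Recall for each class: diagonal cell divided by its row total.

    Assumes matrix[i][j] counts items of actual class i predicted as class j.
    """
    recalls = {}
    for i, name in enumerate(class_names):
        actual_total = sum(matrix[i])
        recalls[name] = matrix[i][i] / actual_total if actual_total else 0.0
    return recalls


# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
class_names = ["cat", "dog", "bird"]
matrix = [
    [8, 1, 1],  # 10 actual cats, 8 predicted as cat
    [2, 6, 2],  # 10 actual dogs, 6 predicted as dog
    [0, 5, 5],  # 10 actual birds, 5 predicted as bird
]

print(per_class_recall(matrix, class_names))
# {'cat': 0.8, 'dog': 0.6, 'bird': 0.5}
```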

Precision-Recall Curve

A Precision-Recall (PR) Curve is a graphical plot that illustrates the trade-off between precision and recall for a binary classifier at different probability thresholds. It is particularly informative for imbalanced datasets where the positive class is rare.

Interpretation: The curve shows how precision typically drops as recall is increased. The area under the PR curve (AUC-PR) is a single-number summary; a higher area indicates better overall performance across thresholds.
Comparison to ROC: While the ROC curve plots sensitivity vs. (1 - specificity), the PR curve is often more telling when the class distribution is highly skewed, as it focuses directly on the performance concerning the positive (minority) class of interest.
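
The points on a PR curve can be sketched by sweeping the threshold over every distinct score. The data and function names below are hypothetical, and the sketch assumes at least one positive label. The area summary shown is a step-wise sum (precision times the increase in recall), which is one common way to compute it and not the only one.

```python
def precision_recall_points(scores, labels):
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    total_positives = sum(labels)  # assumed to be greater than zero
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, tp / (tp + fp), tp / total_positives))
    return points


def average_precision(points):
    """Step-wise area under the PR curve: sum of precision * increase in recall."""
    area, previous_recall = 0.0, 0.0
    for _, precision, recall in points:
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area


# Hypothetical model scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = precision_recall_points(scores, labels)
for threshold, precision, recall in points:
    print(f"{threshold:.1f} {precision:.2f} {recall:.2f}")
# 0.9 1.00 0.33
# 0.8 1.00 0.67
# 0.7 0.67 0.67
# 0.6 0.75 1.00
# 0.5 0.60 1.00
# 0.4 0.50 1.00

print(round(average_precision(points), 3))  # 0.917
```

Precision does not fall at every step in this toy data (it rises from 0.67 to 0.75 at threshold 0.6), which is why the decline is described as typical and not guaranteed.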

Sensitivity Analysis

Sensitivity Analysis in the context of model evaluation refers to systematically testing how a model's performance metrics, like recall, change in response to variations in input data, model parameters, or classification thresholds.

Purpose: It assesses the robustness and stability of a model. For instance, analysts may measure how recall degrades when input data contains slight noise or when the model is applied to a slightly different population.
Engineering Practice: This goes beyond calculating a static metric. It involves probing the model's behavior under different conditions to understand its operational boundaries and failure modes, which is a cornerstone of rigorous Evaluation-Driven Development.
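
A very small version of such a probe is sketched below: it adds Gaussian noise of increasing strength to a set of hypothetical model scores and reports the average recall at a fixed threshold. Perturbing the scores is a simplifying stand-in for perturbing the raw inputs; a real analysis would re-run the model on noisy data. All names, values, and the seed are assumptions.

```python
import random


def recall_at_threshold(scores, labels, threshold):
    """Recall when scores >= threshold are predicted positive."""
    positives = sum(labels)
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return tp / positives if positives else 0.0


def recall_under_noise(scores, labels, threshold, noise_levels, trials=200, seed=0):
    """Average recall after adding Gaussian noise of each standard deviation."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    results = {}
    for sigma in noise_levels:
        total = 0.0
        for _ in range(trials):
            noisy = [s + rng.gauss(0.0, sigma) for s in scores]
            total += recall_at_threshold(noisy, labels, threshold)
        results[sigma] = total / trials
    return results


# Hypothetical scores: the first five cases are truly positive.
scores = [0.95, 0.80, 0.60, 0.45, 0.35, 0.55, 0.40, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

results = recall_under_noise(scores, labels, threshold=0.5, noise_levels=[0.0, 0.05, 0.2])
for sigma, mean_recall in results.items():
    print(f"noise std {sigma}: mean recall {mean_recall:.3f}")
```

With zero noise the result equals the baseline recall; how it moves at higher noise levels indicates how many positives sit close to the decision threshold.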

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
