Intersection over Union (IoU), also known as the Jaccard Index, is an evaluation metric that quantifies the spatial overlap between a predicted region and a ground truth region in tasks like object detection and image segmentation. It is calculated as the area of overlap between the two regions divided by the area of their union, yielding a score between 0 (no overlap) and 1 (perfect alignment). This metric provides a single, normalized value that directly measures localization accuracy, making it a fundamental performance metric for benchmarking model outputs against human-annotated data.
Intersection over Union (IoU)

What is Intersection over Union (IoU)?
A core metric for evaluating spatial predictions in computer vision tasks.
IoU is critical for model benchmarking suites and for setting evaluation thresholds. An IoU of 0.5 is a common cutoff for a 'correct' detection: PASCAL VOC uses it, and COCO reports it as AP50. In object detection, mean Average Precision (mAP) averages the per-class Average Precision, and COCO additionally averages it over IoU thresholds from 0.50 to 0.95. For segmentation tasks, IoU is computed pixel-wise on predicted masks. Its simplicity and interpretability make it indispensable for experiment tracking and for comparing the geometric fidelity of different model architectures or training runs.
Core Properties of IoU
Intersection over Union is a fundamental metric for evaluating spatial overlap in computer vision tasks. Its properties define its applicability, strengths, and limitations in benchmarking object detection and segmentation models.
Definition & Formula
Intersection over Union (IoU), also known as the Jaccard Index, is a ratio that quantifies the spatial overlap between a predicted region (bounding box or segmentation mask) and a ground truth region. It is calculated as:
IoU = Area of Overlap / Area of Union
- Area of Overlap: The number of pixels (for masks) or the spatial area (for boxes) where the prediction and ground truth coincide.
- Area of Union: The total area covered by both the prediction and the ground truth, calculated as Area(Prediction) + Area(Ground Truth) - Area(Overlap).
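This yields a score between 0.0 (no overlap) and 1.0 (perfect overlap). A common threshold for a "correct" detection is IoU ≥ 0.50. This is the PASCAL VOC criterion. COCO also reports results at 0.50, but its primary metric averages over thresholds from 0.50 to 0.95.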
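The formula translates directly into code for axis-aligned boxes. The sketch below assumes boxes in (x1, y1, x2, y2) corner format with continuous coordinates. `box_iou` is a hypothetical helper name. Some benchmarks (PASCAL VOC, for example) historically added 1 to widths and heights so that pixel coordinates were treated as inclusive.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format, with x2 > x1 and y2 > y1."""
    # Intersection rectangle: the overlap of the two boxes along each axis.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Area of Union = Area(Prediction) + Area(Ground Truth) - Area(Overlap)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


prediction = (50, 50, 150, 150)
ground_truth = (60, 60, 160, 160)
print(round(box_iou(prediction, ground_truth), 3))  # 8100 / 11900 -> 0.681
```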
Scale Invariance
IoU is scale-invariant, meaning its value depends only on the relative proportion of overlap, not the absolute size of the objects. A small object and a large object with the same proportional overlap will yield an identical IoU score.
- Implication for Evaluation: This property allows IoU to fairly compare model performance on objects of vastly different sizes within the same dataset, from small distant cars to large foreground pedestrians in an autonomous driving scene.
- Limitation: While the score is invariant, the practical difficulty of achieving high overlap is often greater for smaller objects due to pixel-level precision requirements in segmentation.
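One way to check scale invariance is to enlarge both boxes by the same factor and recompute IoU. This sketch uses a minimal, hypothetical `box_iou` helper for corner-format boxes and a hypothetical `scale_box` function. Every scale factor should give the same value: 2/3 for a 10-pixel box offset by 2 pixels.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, k):
    """Multiply every coordinate by k, i.e. enlarge the whole scene about the origin."""
    return tuple(k * c for c in box)


pred, gt = (0, 0, 10, 10), (2, 0, 12, 10)  # 10 x 10 box, prediction offset by 2 px
for k in (1, 10, 100):
    # The offset grows with the boxes, so the overlap proportion is unchanged.
    print(k, round(box_iou(scale_box(pred, k), scale_box(gt, k)), 4))
```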
Threshold-Dependent Binary Classification
IoU transforms a continuous regression problem (predicting coordinates or a mask) into a binary classification task for evaluation. A prediction is deemed a True Positive (TP), False Positive (FP), or False Negative (FN) based on a chosen IoU threshold.
- Common Thresholds: 0.50 (PASCAL VOC standard), 0.75 (strict overlap), or a range like 0.50:0.95 (COCO standard, averaged across multiple thresholds).
- Trade-off: A higher threshold (e.g., 0.75) demands more precise localization. Predictions that overlap well enough at 0.50 but not at 0.75 become False Positives, and the objects they covered become False Negatives. The result is a more stringent performance assessment.
- Metric Calculation: Precision, Recall, and the primary object detection metric mAP (mean Average Precision) are all computed from these threshold-based classifications.
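The thresholding step can be sketched as a greedy matcher. Predictions are visited in descending confidence order. Each one claims the best still-unmatched ground truth box, and it counts as a True Positive only if that IoU meets the threshold. This is a simplified illustration for one image and one class, and the function names are hypothetical. The official PASCAL VOC and COCO evaluators differ in details such as "difficult" and crowd annotations.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classify_detections(preds, gts, iou_thr=0.5):
    """Label predictions TP/FP and count FNs for one image and one class.

    preds: list of (score, box); gts: list of boxes.
    Returns (labels in descending-score order, number of false negatives).
    """
    matched = set()
    labels = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Find the best ground truth box that no higher-scoring prediction has claimed.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j not in matched:
                iou = box_iou(box, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            labels.append("TP")
        else:
            labels.append("FP")  # too little overlap, or a duplicate detection
    # Ground truth boxes that were never claimed are False Negatives.
    return labels, len(gts) - len(matched)


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 0, 11, 10)), (0.8, (0, 0, 10, 10)), (0.6, (21, 23, 31, 33))]
print(classify_detections(preds, gts, iou_thr=0.5))  # (['TP', 'FP', 'FP'], 1)
print(classify_detections(preds, gts, iou_thr=0.4))  # (['TP', 'FP', 'TP'], 0)
```

In this toy case, the third prediction overlaps its object with IoU ≈ 0.46. It is therefore a False Positive at 0.5 but a True Positive at 0.4. The second prediction is a False Positive at both thresholds because it duplicates an object that was already matched.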
Sensitivity to Localization Error
IoU is highly sensitive to localization errors, especially for small objects. A minor shift in a bounding box corner can cause a dramatic drop in the score.
- Example: For a 10x10 pixel object, a prediction shifted by just 2 pixels along one axis reduces IoU from 1.0 to about 0.67 (80 / 120). The same absolute shift for a 100x100 pixel object only reduces IoU to about 0.96 (9,800 / 10,200).
- Comparison to L2 Loss: An L2 (mean squared error) loss on box coordinates treats each coordinate independently and penalizes a given pixel error the same way for small and large boxes. IoU instead measures the overlap of the box as a whole, and this correlates better with perceived detection quality.
- Consequence: This sensitivity makes IoU a demanding but appropriate loss function for training (e.g., IoU Loss, GIoU Loss), directly optimizing for the evaluation metric.
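The 10x10 versus 100x100 example can be checked by shifting a square box 2 pixels to the right and recomputing IoU. The helper names are hypothetical, and boxes use (x1, y1, x2, y2) corners. The last line shows the simple IoU loss, 1 - IoU, which reaches 0 only for a perfect box.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a size x size box and a copy moved `shift` pixels to the right."""
    gt = (0, 0, size, size)
    pred = (shift, 0, size + shift, size)
    return box_iou(pred, gt)


for size in (10, 100):
    print(size, round(shifted_iou(size, 2), 3))  # 10 -> 0.667, 100 -> 0.961

# IoU loss for the small box: the same 2-pixel error costs a third of the maximum score.
print(round(1 - shifted_iou(10, 2), 3))
```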
Handling of Non-Overlap (Zero Gradient)
A critical limitation of basic IoU as a training loss is its vanishing gradient problem when there is no overlap between prediction and ground truth (IoU = 0).
- Problem: If two boxes do not intersect, the IoU is zero regardless of distance. This provides no directional gradient to guide the optimizer on how to move the prediction to achieve overlap.
- Solution - Extended IoU Variants: This flaw led to the development of improved loss functions that provide gradients even for non-overlapping boxes:
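- GIoU (Generalized IoU): GIoU = IoU - (|C| - |A ∪ B|) / |C|, where C is the smallest box enclosing both the prediction A and the ground truth B. The penalty shrinks as the prediction moves toward the ground truth, so the loss still provides a gradient when IoU = 0.
- DIoU (Distance IoU): DIoU = IoU - ρ² / c², where ρ is the distance between the two box centers and c is the diagonal length of C.
- CIoU (Complete IoU): CIoU = DIoU - αv, where v measures the aspect-ratio mismatch between the boxes and α is a positive trade-off weight.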
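The variants differ only in the penalty subtracted from IoU, so they are easiest to compare side by side. The sketch below computes all four quantities in plain Python for (x1, y1, x2, y2) boxes with positive width and height. `iou_variants` is a hypothetical name, and each loss is taken as 1 minus the corresponding value. A real training loop would use a tensor library with automatic differentiation instead of Python floats.

```python
import math


def iou_variants(pred, gt):
    """Return (IoU, GIoU, DIoU, CIoU) for two (x1, y1, x2, y2) boxes with positive size."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph, gw, gh = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1

    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union

    # C: the smallest box enclosing both prediction and ground truth.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    giou = iou - (cw * ch - union) / (cw * ch)

    # DIoU: squared center distance divided by the squared diagonal of C.
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    diou = iou - rho2 / (cw ** 2 + ch ** 2)

    # CIoU: DIoU minus an aspect-ratio consistency penalty alpha * v.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    ciou = diou - alpha * v
    return iou, giou, diou, ciou


gt = (0, 0, 10, 10)
for name, pred in [("near", (12, 0, 22, 10)), ("far", (40, 0, 50, 10))]:
    iou, giou, diou, ciou = iou_variants(pred, gt)
    # Losses are 1 - metric; the plain IoU loss is stuck at 1.0 for both boxes.
    print(name, 1 - iou, round(1 - giou, 3), round(1 - diou, 3), round(1 - ciou, 3))
```

Both predictions have IoU = 0, so both have an IoU loss of 1.0. The GIoU and DIoU losses, however, are larger for the distant box. That distance signal is exactly what plain IoU cannot provide.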
Application Spectrum
IoU is the cornerstone metric for several distinct but related computer vision tasks, with application-specific nuances.
- Object Detection: Compares predicted bounding boxes to ground truth boxes. The primary metric is mAP, the mean of per-class Average Precision. COCO also averages this over multiple IoU thresholds.
- Instance Segmentation: Compares predicted pixel-wise masks to ground truth masks. More computationally intensive as it requires pixel-level intersection/union calculations.
- Semantic Segmentation: Often uses mean IoU (mIoU), which is the average IoU across all semantic classes in the dataset, calculated per-class and then averaged.
- Image Generation/Registration: Used to evaluate the alignment of generated or transformed images with a target, by comparing corresponding segmented regions.
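For masks, the same ratio is computed by counting pixels. The following is a minimal NumPy sketch, assuming boolean masks of identical shape. `mask_iou` is a hypothetical helper. Returning 1.0 when both masks are empty is only one possible convention; some evaluators return 0 or skip the pair instead.

```python
import numpy as np


def mask_iou(pred_mask, gt_mask):
    """Pixel-wise IoU between two boolean masks of the same shape."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    inter = np.logical_and(pred_mask, gt_mask).sum()  # pixels in both masks
    union = np.logical_or(pred_mask, gt_mask).sum()   # pixels in either mask
    # Assumed convention: two empty masks count as a perfect match.
    return float(inter) / float(union) if union > 0 else 1.0


gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True      # 16-pixel square
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 2:6] = True    # same square shifted down one row
print(mask_iou(pred, gt))  # 12 / 20 = 0.6
```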
How is IoU Calculated and Used?
Intersection over Union (IoU) quantifies the spatial accuracy of predictions in computer vision tasks such as object detection and image segmentation. It divides the area of intersection between a predicted region and a ground truth region by the area of their union. The result is a score between 0 (no overlap) and 1 (perfect alignment). IoU is the Jaccard index applied to image regions, and it is the standard measure of localization precision for bounding boxes in object detection and for masks in instance and semantic segmentation.
IoU decides whether a prediction matches a ground truth object, typically using a threshold such as 0.5. A prediction whose IoU with a not-yet-matched ground truth object of the same class meets the threshold is a true positive. All other predictions are false positives, and any ground truth objects left unmatched are false negatives. These counts form the basis for precision and recall. Precision and recall are then aggregated across the dataset into summary statistics such as Average Precision (AP) and mean Average Precision (mAP), which give a comprehensive view of a model's detection performance across all classes.
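Once predictions are labelled as True or False Positives, AP summarizes the resulting precision-recall curve. The sketch below uses all-point interpolation, the variant PASCAL VOC adopted from 2010 onward. VOC2007 used 11 recall points and COCO uses 101, so exact values differ between evaluators. `average_precision` is a hypothetical name. mAP would then be the mean of this value over classes (and, for COCO, over IoU thresholds).

```python
def average_precision(is_tp, num_gt):
    """All-point interpolated AP from detections sorted by descending confidence.

    is_tp: list of bools (True = matched a ground truth at the chosen IoU threshold).
    num_gt: number of ground truth objects for this class.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:
        tp += hit
        fp += not hit
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Make precision non-increasing from right to left (the interpolated envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Add up precision times the recall gained at each step.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(average_precision([True, False, True, True, False], num_gt=4))  # 0.625
```

Consider five detections ranked TP, FP, TP, TP, FP against four ground truth objects. The interpolated precision is 1.0 up to recall 0.25 and 0.75 up to recall 0.75. AP is therefore 0.25 × 1.0 + 0.5 × 0.75 = 0.625.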
IoU in Practice: Use Cases & Examples
Intersection over Union (IoU) is a fundamental metric for quantifying spatial overlap, primarily used to evaluate the accuracy of object localization in computer vision tasks.
Object Detection Benchmarking
IoU is the standard metric for evaluating object detectors like YOLO, Faster R-CNN, and SSD. A predicted bounding box is considered a true positive only if its IoU with a ground truth box exceeds a predefined threshold (commonly 0.5 or 0.75).
- mAP Calculation: IoU is integral to computing Mean Average Precision (mAP). Precision-Recall curves are generated at specific IoU thresholds (e.g., mAP@0.5, mAP@0.5:0.95).
- Threshold Selection: A higher IoU threshold (e.g., 0.75) demands more precise localization, leading to stricter evaluation and typically lower reported mAP scores.
Image Segmentation Evaluation
For segmentation tasks (semantic, instance, panoptic), IoU is adapted to measure pixel-wise overlap between predicted and ground truth masks.
- Semantic Segmentation: Evaluated via mean IoU (mIoU), calculated by averaging the IoU scores across all object classes in the dataset.
- Instance Segmentation: Each object instance is matched using IoU, and metrics like Average Precision for segmentation are derived, similar to object detection.
- Panoptic Segmentation: Combines semantic and instance evaluation. Panoptic Quality (PQ) decomposes into a segmentation quality term (SQ), which is the average IoU of matched segments, and a recognition quality term (RQ).
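For semantic segmentation, mIoU is usually accumulated over the whole dataset with a confusion matrix rather than averaged image by image. The following is a NumPy sketch under that assumption. `mean_iou` is a hypothetical name, and labels are assumed to be integers in [0, num_classes). Pixels carrying an "ignore" label (often 255) would need to be filtered out first.

```python
import numpy as np


def mean_iou(pred_labels, gt_labels, num_classes):
    """Per-class IoU and mIoU for semantic segmentation label maps.

    Builds one confusion matrix over all pixels, then IoU_c = TP / (TP + FP + FN).
    Classes absent from both prediction and ground truth are left out of the mean.
    """
    pred = np.asarray(pred_labels).ravel()
    gt = np.asarray(gt_labels).ravel()
    # conf[i, j] = number of pixels with ground truth class i predicted as class j.
    conf = np.bincount(gt * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as class c, but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled class c, but predicted otherwise
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return iou, float(np.nanmean(iou))


gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1]])
pred = np.array([[0, 1, 1, 1],
                 [0, 0, 1, 1]])
per_class, miou = mean_iou(pred, gt, num_classes=2)
print(per_class, miou)
```

In this toy example, class 0 has an IoU of 3/4 and class 1 has an IoU of 4/5, so mIoU = 0.775.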
Non-Maximum Suppression (NMS)
IoU is a core algorithmic component in the post-processing stage of object detection pipelines. Non-Maximum Suppression uses IoU to eliminate redundant, overlapping bounding boxes that detect the same object.
- Process: The detector proposes many candidate boxes. NMS selects the box with the highest confidence score and suppresses every remaining box whose IoU with it exceeds a set threshold (e.g., 0.45). It then repeats this step with the highest-scoring box that is left.
- Variants: Soft-NMS decays the scores of neighboring boxes as a continuous function of IoU instead of hard suppression, improving recall in crowded scenes.
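Both suppression strategies fit in a few lines. In this sketch, `nms` performs hard suppression, and `soft_nms` applies the Gaussian decay from the Soft-NMS paper, score × exp(-IoU² / σ). The value σ = 0.5 and the final score cutoff are assumed defaults, and the function names are hypothetical. Production detectors usually run NMS separately for each class and operate on tensors.

```python
import math


def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.45):
    """Greedy hard NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if box_iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS: decay neighbours' scores instead of removing them."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thr:
            break  # every remaining score is lower still
        keep.append((best, scores[best]))
        for i in remaining:
            iou = box_iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(iou ** 2) / sigma)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))       # [0, 2]
print(soft_nms(boxes, scores))
```

Hard NMS keeps boxes 0 and 2 and discards box 1, which has IoU ≈ 0.68 with box 0. Soft-NMS keeps all three boxes but lowers box 1's score from 0.8 to roughly 0.32.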
Anchor Box Matching in Training
During the training of anchor-based detectors, IoU determines which anchor boxes are assigned to which ground truth objects, defining positive and negative training examples.
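- Assignment Rule: An anchor is assigned as positive if its IoU with any ground truth box is above a high threshold (e.g., 0.7). It is assigned as negative if its IoU with every ground truth box is below a low threshold (e.g., 0.3). Anchors in between are ignored. In the Faster R-CNN region proposal network, the anchor with the highest IoU for each ground truth box is also marked positive, so that no object is left without a match.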
- Bounding Box Regression Loss: The model is trained to adjust the coordinates of positive anchors to maximize their IoU with the matched ground truth, often using losses like GIoU Loss or CIoU Loss that are directly derived from IoU.
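The assignment can be sketched as follows. The sketch uses the 0.7 and 0.3 thresholds quoted above. It also applies the Faster R-CNN rule that the highest-IoU anchor for each ground truth box is positive. Labels 1, 0 and -1 mean positive, negative and ignored. `assign_anchors` is a hypothetical helper, and real implementations vectorize the IoU matrix.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_anchors(anchors, gts, pos_thr=0.7, neg_thr=0.3):
    """Label anchors 1 (positive), 0 (negative) or -1 (ignored) from their IoU with ground truth.

    Returns (labels, matched_gt), where matched_gt[i] is the ground truth index
    for positive anchors and -1 otherwise.
    """
    labels = [-1] * len(anchors)
    matched_gt = [-1] * len(anchors)
    if not anchors:
        return labels, matched_gt
    if not gts:
        return [0] * len(anchors), matched_gt  # nothing to match: all background
    ious = [[box_iou(a, g) for g in gts] for a in anchors]

    # Rule 1: threshold each anchor's best IoU over all ground truth boxes.
    for i, row in enumerate(ious):
        best_j = max(range(len(gts)), key=lambda j: row[j])
        if row[best_j] >= pos_thr:
            labels[i], matched_gt[i] = 1, best_j
        elif row[best_j] < neg_thr:
            labels[i] = 0

    # Rule 2: the highest-IoU anchor for each ground truth box is positive too.
    for j in range(len(gts)):
        best_i = max(range(len(anchors)), key=lambda i: ious[i][j])
        if ious[best_i][j] > 0:
            labels[best_i], matched_gt[best_i] = 1, j
    return labels, matched_gt


gts = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (2, 0, 12, 10), (30, 30, 40, 40), (4, 4, 14, 14)]
print(assign_anchors(anchors, gts))  # ([1, -1, 0, 0], [0, -1, -1, -1])
```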
3D Object Detection & Robotics
IoU extends naturally to 3D spaces for evaluating LiDAR-based or RGB-D object detectors in autonomous driving and robotics.
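- 3D IoU: Calculates the overlap of three-dimensional bounding cuboids in world coordinates. It is more expensive to compute, especially for boxes rotated by a heading (yaw) angle. However, it captures errors in height and elevation that matter for safety.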
- BEV IoU: A common simplification is to use Bird's Eye View IoU, projecting 3D boxes onto the 2D ground plane, which focuses on localization accuracy for navigation.
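For the special case of axis-aligned cuboids, both quantities reduce to the same clamp-and-multiply logic used for 2D boxes. This sketch assumes boxes given as (x1, y1, z1, x2, y2, z2), with z as the vertical axis, and it ignores yaw. Real driving datasets such as KITTI and nuScenes use rotated boxes, which require polygon intersection in the ground plane. The function names are hypothetical.

```python
def aligned_iou_3d(a, b):
    """IoU of two axis-aligned cuboids (x1, y1, z1, x2, y2, z2); yaw rotation is ignored."""
    inter = 1.0
    for k in range(3):
        # Overlap along axis k (x, y, then z); no overlap on any axis means IoU = 0.
        overlap = min(a[k + 3], b[k + 3]) - max(a[k], b[k])
        if overlap <= 0:
            return 0.0
        inter *= overlap
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def aligned_iou_bev(a, b):
    """Bird's Eye View IoU: discard height and compare ground-plane footprints."""
    # Give both boxes the same unit height so that only the x-y overlap matters.
    return aligned_iou_3d((a[0], a[1], 0, a[3], a[4], 1), (b[0], b[1], 0, b[3], b[4], 1))


gt_box = (0, 0, 0, 4, 2, 1.5)      # a 4 m x 2 m x 1.5 m car
pred_box = (0, 0, 0.5, 4, 2, 2.0)  # correct footprint, but raised 0.5 m
print(aligned_iou_3d(pred_box, gt_box), aligned_iou_bev(pred_box, gt_box))  # 0.5 1.0
```

The prediction has the correct footprint but is raised by 0.5 m. It gets a 3D IoU of 0.5 but a BEV IoU of 1.0, which illustrates what the BEV simplification gives up.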
Limitations & Advanced Variants
Standard IoU has known limitations, leading to the development of more sophisticated overlap metrics for training and evaluation.
- Vanishing Gradient: If two boxes don't overlap, IoU=0 and provides no gradient for optimization. GIoU (Generalized IoU) addresses this by incorporating the enclosing shape.
- Shape & Aspect Ratio: DIoU (Distance IoU) adds a penalty for the distance between box centers, and CIoU (Complete IoU) additionally penalizes aspect-ratio mismatch. Both were reported to converge faster and more stably during training than IoU or GIoU losses.
- Evaluation Nuance: For tasks requiring extreme precision (e.g., medical imaging), very high IoU thresholds (0.9+) are used, while for coarse localization, lower thresholds may suffice.
IoU vs. Related Localization Metrics
A technical comparison of Intersection over Union (IoU) against other key metrics used to evaluate the spatial accuracy of bounding boxes or segmentation masks in computer vision tasks.
| Metric / Feature | Intersection over Union (IoU) | Dice Coefficient (F1 Score) | Average Precision (AP) / mAP | Pixel Accuracy |
|---|---|---|---|---|
| Primary Use Case | Object detection, instance segmentation | Medical image segmentation, binary mask evaluation | Ranked object detection evaluation across thresholds | Semantic segmentation |
| Mathematical Definition | Area of Overlap / Area of Union | 2 * \|A ∩ B\| / (\|A\| + \|B\|) | Area under the Precision-Recall curve | Correct Pixels / Total Pixels |
| Output Range | 0 to 1 | 0 to 1 | 0 to 1 | 0 to 1 |
| Handles Class Imbalance | | | | |
| Sensitive to Object Size | | | | |
| Standard Detection Threshold | ≥ 0.5 | ≥ 0.5 (common) | Averaged over IoU thresholds 0.50:0.95 (COCO) | N/A |
| Incorporates Precision/Recall Trade-off | | | | |
| Directly Measures Spatial Overlap | | | | |
| Common in Benchmark Datasets | COCO, PASCAL VOC | Medical Segmentation Decathlon, BraTS | COCO, PASCAL VOC | Cityscapes, PASCAL Context |
Frequently Asked Questions
Intersection over Union is a fundamental metric for evaluating object detection and image segmentation models. These questions address its calculation, interpretation, and role in machine learning workflows.
Intersection over Union is an evaluation metric that quantifies the spatial overlap between a predicted region and a ground truth region, expressed as the area of their intersection divided by the area of their union. The formula is IoU = (Area of Intersection) / (Area of Union). For a perfect match, where the prediction and ground truth are identical, the IoU score is 1.0. For predictions with no overlap, the score is 0.0. In object detection and instance segmentation, the calculation is performed for each matched object instance, so IoU is a per-instance score rather than a per-pixel or image-wide average. Semantic segmentation instead reports per-class IoU aggregated over the dataset (mIoU).
In practice, for bounding boxes, the intersection and union are calculated using the coordinates of the rectangles. For segmentation masks, the calculation is performed at the pixel level, where the intersection is the count of overlapping foreground pixels and the union is the count of all pixels belonging to either the predicted or ground truth mask.
Related Terms
Intersection over Union (IoU) is a fundamental metric for spatial overlap, but evaluating a complete AI system requires a suite of complementary measures. These related terms define the broader landscape of quantitative model assessment.
Mean Average Precision (mAP)
Mean Average Precision (mAP) is the standard evaluation metric for object detection models, extending the concept of IoU to multi-object scenarios. It is calculated by:
- Computing the Average Precision (AP) for each object class, which integrates precision across all recall levels at a specific IoU threshold (e.g., 0.5).
- Taking the mean of these AP values across all classes to produce the final mAP score.
A higher mAP indicates a model that is both precise (few false positives) and has high recall (finds most objects). For the COCO benchmark, mAP is often reported averaged over multiple IoU thresholds from 0.5 to 0.95.
Precision & Recall
Precision and Recall are core classification metrics that underpin IoU evaluation in object detection. When a predicted bounding box is matched to a ground truth box using an IoU threshold:
- Precision measures exactness: True Positives / (True Positives + False Positives). A high precision means most detected objects are correct.
- Recall measures completeness: True Positives / (True Positives + False Negatives). A high recall means the model found most of the actual objects.
IoU provides the rule for determining a True Positive match. The Precision-Recall curve, generated by varying the detection confidence threshold, visualizes the trade-off between the two, and the area under this curve is used to calculate Average Precision.
Panoptic Quality (PQ)
Panoptic Quality (PQ) is a unified metric for panoptic segmentation, a task that requires classifying every pixel in an image as either a thing (countable object) or stuff (amorphous region). PQ combines segmentation quality with recognition quality. It is defined as the product of Segmentation Quality (SQ) and Recognition Quality (RQ): PQ = SQ * RQ.
- RQ is the F1 Score over segment matches: RQ = TP / (TP + 0.5 * FP + 0.5 * FN), where a predicted segment and a ground truth segment match when their IoU exceeds 0.5.
- SQ is the average IoU of only those matched segments.
This metric addresses the limitations of using IoU alone for stuff classes or metrics like mAP alone for thing classes, providing a holistic score for scene understanding.
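Given the per-class match results, PQ, SQ and RQ are simple arithmetic. This sketch assumes the matching (IoU > 0.5) has already been done. It takes the IoUs of the matched pairs and the counts of unmatched predicted and ground truth segments. `panoptic_quality` is a hypothetical name. Returning None for a class with no segments at all reflects an assumption that such classes are left out of the class average.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for one class, or None if the class has no segments at all.

    matched_ious: IoU of each matched (prediction, ground truth) pair, all > 0.5.
    num_fp: unmatched predicted segments; num_fn: unmatched ground truth segments.
    """
    tp = len(matched_ious)
    if tp + num_fp + num_fn == 0:
        return None  # assumed: such classes are excluded from the class average
    sq = sum(matched_ious) / tp if tp else 0.0       # mean IoU of matched segments
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)     # F1-style recognition quality
    return sq * rq, sq, rq


pq, sq, rq = panoptic_quality([0.9, 0.7], num_fp=1, num_fn=1)
print(round(pq, 3), round(sq, 3), round(rq, 3))  # SQ 0.8, RQ 2/3, PQ about 0.533
```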
Dice Coefficient (F1 Score)
The Dice Coefficient, also known as the Sørensen–Dice Index or the F1 Score for image segmentation, is a metric closely related to IoU. It measures the overlap between two samples and is defined as: Dice = (2 * |A ∩ B|) / (|A| + |B|) where A and B are the predicted and ground truth masks.
- Relationship to IoU: The two metrics are monotonically related: Dice = (2 * IoU) / (1 + IoU), and conversely IoU = Dice / (2 - Dice). For the same segmentation, the Dice score is always greater than or equal to the IoU, and the two are equal only at 0 and 1.
Dice is commonly used in medical image segmentation for two reasons. Like IoU, it ignores true-negative background pixels, which matters when the structure of interest is small. In addition, its soft (probabilistic) form is differentiable, which makes it convenient as a loss function.
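The identity can be checked numerically on a toy pair of masks. Here the masks are represented as sets of pixel coordinates, which is an illustrative choice rather than a standard format. The helper names are hypothetical.

```python
def dice_from_iou(iou):
    """Dice = 2 * IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


def iou_from_dice(dice):
    """Inverse relation: IoU = Dice / (2 - Dice)."""
    return dice / (2 - dice)


# Toy masks as sets of (row, col) pixels: a 4 x 4 square and the same square shifted down one row.
pred = {(r, c) for r in range(0, 4) for c in range(4)}
gt = {(r, c) for r in range(1, 5) for c in range(4)}

inter = len(pred & gt)                     # 12 shared pixels
iou = inter / len(pred | gt)               # 12 / 20 = 0.6
dice = 2 * inter / (len(pred) + len(gt))   # 24 / 32 = 0.75
print(iou, dice, dice_from_iou(iou), iou_from_dice(dice))
```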
Hausdorff Distance
Hausdorff Distance is a shape-aware metric used in medical and geometric image segmentation to complement area-based metrics like IoU. It measures the greatest distance from any point on the predicted boundary to the nearest point on the ground truth boundary (and vice versa).
- Purpose: While IoU penalizes overall area mismatch, Hausdorff Distance specifically penalizes large, localized segmentation errors, such as a "spur" or a "hole" that is far from the true boundary.
- Use Case: It is critical in applications where the precise contour of a structure is vital, such as tumor segmentation in radiotherapy planning. It is often reported as the 95th percentile Hausdorff Distance to be robust to small outliers.
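Below is a brute-force NumPy sketch of both quantities for two sets of boundary points. It assumes the contours have already been extracted as (row, col) coordinates, and it uses O(N × M) memory. HD95 is computed here as the 95th percentile of the pooled nearest-point distances in both directions, which is one of several conventions in use. `hausdorff_distances` is a hypothetical name.

```python
import numpy as np


def hausdorff_distances(pred_pts, gt_pts):
    """Symmetric Hausdorff distance and a 95th-percentile variant between two point sets.

    pred_pts, gt_pts: arrays of shape (N, 2) and (M, 2) holding boundary coordinates.
    """
    pred_pts = np.asarray(pred_pts, dtype=float)
    gt_pts = np.asarray(gt_pts, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    pred_to_gt = d.min(axis=1)  # each predicted point to its nearest ground truth point
    gt_to_pred = d.min(axis=0)  # each ground truth point to its nearest predicted point
    hd = max(pred_to_gt.max(), gt_to_pred.max())
    # Assumed HD95 convention: percentile over the pooled distances of both directions.
    hd95 = np.percentile(np.concatenate([pred_to_gt, gt_to_pred]), 95)
    return float(hd), float(hd95)


gt_pts = [(0, x) for x in range(20)]   # a straight 20-point contour
pred_pts = gt_pts + [(10, 5)]          # same contour plus one spur 10 px away
print(hausdorff_distances(pred_pts, gt_pts))  # (10.0, 0.0)
```

A single spur 10 pixels away from an otherwise perfect contour raises the Hausdorff distance to 10, while HD95 stays at 0. This shows both the spur sensitivity and the outlier robustness described above.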
Confidence Thresholding
Confidence Thresholding is the critical post-processing step that links a model's raw output to metrics like IoU and precision/recall. Object detectors output bounding boxes with an associated confidence score (the model's estimated probability that the box contains an object).
- Process: A threshold (e.g., 0.5) is applied to these scores. Boxes above the threshold are considered detections and evaluated against ground truth using an IoU threshold.
- Impact on Metrics: Varying this confidence threshold generates the Precision-Recall curve. A high threshold yields high precision but low recall; a low threshold yields high recall but low precision. The optimal operating point depends on the application's tolerance for false positives versus false negatives.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.