Inferensys

Fairness Metric (Disparate Impact)

Disparate Impact is a quantitative fairness metric that compares selection rates across demographic groups to detect unintentional discrimination in AI systems.
MODEL BENCHMARKING SUITES

What is Fairness Metric (Disparate Impact)?

A quantitative measure for auditing AI systems for discriminatory outcomes, with disparate impact being a core legal test.

A fairness metric is a quantitative measure used to audit an artificial intelligence system for discriminatory outcomes across demographic groups. Disparate impact is a specific legal and statistical fairness test that compares the selection or benefit rate of a protected group (e.g., defined by race or gender) with that of a reference group, usually the group with the highest rate. A system is flagged if the ratio falls below a recognized threshold, commonly 80% under the four-fifths rule used in U.S. employment law. This metric is a cornerstone of algorithmic auditing and ethical bias auditing.

Unlike disparate treatment, which examines intent, disparate impact looks only at outcomes, which makes it well suited to model benchmarking suites. It is calculated by dividing the selection rate of the protected group by the selection rate of the advantaged group. A result below 0.8 (the "80% rule") signals potential adverse impact that warrants investigation and, where no justification exists, remediation such as adjusting the model or applying bias mitigation techniques. Outcome-based metrics of this kind can also support compliance work under regulations like the EU AI Act.
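The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the toy decision lists are hypothetical, and decisions are assumed to be encoded as 1 (selected) or 0 (not selected).

```python
from typing import Sequence


def selection_rate(decisions: Sequence[int]) -> float:
    """Fraction of positive (1) decisions within one group."""
    if not decisions:
        raise ValueError("group has no decisions")
    return sum(decisions) / len(decisions)


def disparate_impact_ratio(protected: Sequence[int], reference: Sequence[int]) -> float:
    """Selection rate of the protected group divided by that of the reference group."""
    ref_rate = selection_rate(reference)
    if ref_rate == 0:
        raise ValueError("reference selection rate is zero; ratio is undefined")
    return selection_rate(protected) / ref_rate


# Hypothetical toy data: 1 = selected, 0 = not selected.
protected_decisions = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 of 10 selected
reference_decisions = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]  # 5 of 10 selected

ratio = disparate_impact_ratio(protected_decisions, reference_decisions)
flagged = ratio < 0.8  # the "80% rule" threshold from the text
print(f"ratio={ratio:.2f}, flagged={flagged}")
```

With these toy numbers the ratio is 0.30 / 0.50 = 0.6, so the system would be flagged for review.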

FAIRNESS METRIC COMPARISON

Disparate Impact vs. Other Fairness Metrics

A comparison of disparate impact with other common fairness metrics, highlighting their legal basis, statistical formulation, and primary use cases in algorithmic auditing.

Metric / Feature | Disparate Impact (80% Rule) | Equalized Odds | Demographic Parity | Individual Fairness

Legal & Regulatory Basis

U.S. Civil Rights Law (Title VII), EU AI Act (High-Risk)

Primarily a research concept

Core Definition

Compares selection rates between protected and non-protected groups.

Requires equal true positive and false positive rates across groups.

Requires equal positive prediction rates across groups.

Requires similar individuals to receive similar predictions.

Primary Mathematical Test

Selection Rate_Protected / Selection Rate_Non-Protected ≥ 0.8

P(Ŷ=1 | Y=1, A=a) = P(Ŷ=1 | Y=1, A=b) and P(Ŷ=1 | Y=0, A=a) = P(Ŷ=1 | Y=0, A=b)

P(Ŷ=1 | A=a) = P(Ŷ=1 | A=b)

D(Ŷ_i, Ŷ_j) ≤ ε · D(x_i, x_j) for a distance metric D

Requires Ground Truth Labels (Y)

Considers Model Accuracy

Use Case Example

Hiring algorithm screening resumes.

Criminal recidivism prediction tool.

College admissions pre-screening.

Credit scoring for individuals with similar financial profiles.

Key Trade-off / Limitation

Can conflict with model accuracy; ignores legitimate qualifications.

Very strict; rarely satisfied exactly in practice, and incompatible with calibration when base rates differ unless the predictor is perfect.

Ignores underlying differences in qualification rates between groups.

Defining a similarity metric for individuals is highly non-trivial.

Common in Production Audits
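The three group-level tests in the comparison above can be computed from the same set of predictions. The sketch below is illustrative only: `group_rates` is a hypothetical helper, the data is a made-up toy sample, and labels and predictions are assumed to be binary (0 or 1).

```python
from collections import defaultdict


def group_rates(y_true, y_pred, group):
    """Per-group selection rate, true positive rate, and false positive rate."""
    counts = defaultdict(lambda: {"n": 0, "sel": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for y, y_hat, a in zip(y_true, y_pred, group):
        c = counts[a]
        c["n"] += 1
        c["sel"] += y_hat
        if y == 1:
            c["pos"] += 1
            c["tp"] += y_hat
        else:
            c["neg"] += 1
            c["fp"] += y_hat
    return {
        a: {
            "selection_rate": c["sel"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
        for a, c in counts.items()
    }


# Hypothetical toy data: four individuals in group "a", four in group "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a"] * 4 + ["b"] * 4

rates = group_rates(y_true, y_pred, group)

# Disparate impact: ratio of selection rates (group "a" treated as the reference).
di_ratio = rates["b"]["selection_rate"] / rates["a"]["selection_rate"]
# Demographic parity: difference in selection rates.
parity_gap = abs(rates["a"]["selection_rate"] - rates["b"]["selection_rate"])
# Equalized odds: both the TPR gap and the FPR gap should be near zero.
tpr_gap = abs(rates["a"]["tpr"] - rates["b"]["tpr"])
fpr_gap = abs(rates["a"]["fpr"] - rates["b"]["fpr"])
```

Disparate impact and demographic parity use only the predictions and group membership, while the equalized odds gaps also need the ground truth labels. Individual fairness is omitted because it depends on a task-specific similarity metric.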

FAIRNESS METRIC

Common Use Cases for Disparate Impact Analysis

Disparate impact analysis is a quantitative fairness audit used to detect unintentional discrimination in automated systems. These are the primary domains where it is applied to ensure equitable outcomes.

01

Hiring & Recruitment Algorithms

This is the canonical use case for disparate impact analysis, often scrutinized under the Uniform Guidelines on Employee Selection Procedures. Auditors apply the four-fifths rule (80% rule) to automated resume screening, video interview analysis, and skills assessment tools. For example, if a model recommends 50% of male applicants for interviews but only 30% of female applicants, the ratio is 0.30 / 0.50 = 0.6, which falls below the 0.8 threshold and indicates potential disparate impact. Analysis focuses on protected attributes like gender, race, and age to check that selection rates do not differ by more than the rule allows.
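The four-fifths rule is a rule of thumb, so audits often pair it with a significance test to check that a gap is not just sampling noise. The sketch below applies a pooled two-proportion z-test to the 50% versus 30% example above. The applicant counts (200 per group) are an assumption made for illustration, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(selected_a, total_a, selected_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    pooled = (selected_a + selected_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts: 200 applicants per group, matching the 30% and 50% rates.
female_selected, female_total = 60, 200
male_selected, male_total = 100, 200

ratio = (female_selected / female_total) / (male_selected / male_total)
z, p_value = two_proportion_z_test(female_selected, female_total, male_selected, male_total)
print(f"ratio={ratio:.2f}, z={z:.2f}, p={p_value:.5f}")
```

With these assumed counts the ratio is 0.6 and the difference is statistically significant at the conventional 0.05 level. With much smaller samples the same rates might not be, which is why the two checks are usually reported together.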

02

Credit Scoring & Loan Underwriting

Financial institutions and regulators (e.g., the Consumer Financial Protection Bureau) use disparate impact analysis to audit algorithmic credit models for compliance with the Equal Credit Opportunity Act (ECOA). The test compares approval rates, interest rates, and credit limits across demographic groups defined by race, national origin, or sex. A key challenge is distinguishing between legitimate risk-based pricing and discriminatory proxies, requiring analysis of feature attribution to see if zip code or shopping history acts as a surrogate for a protected class.
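A simple first-pass proxy check, cruder than the feature attribution analysis mentioned above, is to ask how well a single feature predicts protected-class membership on its own. The sketch below is a hypothetical illustration: `proxy_strength`, the zip code labels, and the data are all invented for the example.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_flags):
    """Return (accuracy, baseline).

    accuracy: how often the protected flag is guessed correctly by taking the
              majority flag within each feature value.
    baseline: accuracy of always guessing the overall majority flag.
    """
    by_value = defaultdict(Counter)
    for value, flag in zip(feature_values, protected_flags):
        by_value[value][flag] += 1
    total = len(protected_flags)
    correct = sum(max(counter.values()) for counter in by_value.values())
    baseline = max(Counter(protected_flags).values()) / total
    return correct / total, baseline


# Hypothetical toy data: two zip codes with very different group composition.
zip_codes = ["A"] * 5 + ["B"] * 5
protected = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

accuracy, baseline = proxy_strength(zip_codes, protected)
print(f"accuracy from zip code alone={accuracy:.2f}, baseline={baseline:.2f}")
```

An accuracy well above the baseline (here 0.8 versus 0.5) suggests the feature carries information about the protected class and deserves closer review. It does not by itself show that the model relies on that information.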

03

Policing & Criminal Justice Risk Assessments

Tools like COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), used to predict recidivism, have been extensively audited for disparate impact. Analysis examines whether the models assign higher risk scores to defendants of one race than to defendants of another race with similar criminal histories, potentially leading to harsher bail or sentencing recommendations. This domain highlights the tension between predictive parity (equal precision across groups) and demographic parity (equal selection rates), as optimizing for one can violate the other.
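The tension between the two criteria can be seen in a small made-up example. In the sketch below, both groups are flagged at the same rate, so demographic parity holds, yet precision differs because the groups have different base rates. The data and the function name are hypothetical and are not drawn from COMPAS.

```python
def precision_and_selection_rate(y_true, y_pred):
    """Precision (share of flagged cases that are true positives) and selection rate."""
    flagged = sum(y_pred)
    true_positives = sum(1 for y, y_hat in zip(y_true, y_pred) if y == 1 and y_hat == 1)
    precision = true_positives / flagged if flagged else float("nan")
    return precision, flagged / len(y_pred)


# Hypothetical toy data: 1 = reoffended (y_true) or flagged high risk (y_pred).
y_true_a = [1, 1, 1, 0, 0]  # base rate 0.6
y_pred_a = [1, 1, 0, 0, 0]
y_true_b = [1, 0, 0, 0, 0]  # base rate 0.2
y_pred_b = [1, 1, 0, 0, 0]

precision_a, rate_a = precision_and_selection_rate(y_true_a, y_pred_a)
precision_b, rate_b = precision_and_selection_rate(y_true_b, y_pred_b)

print(f"group a: precision={precision_a:.2f}, selection rate={rate_a:.2f}")
print(f"group b: precision={precision_b:.2f}, selection rate={rate_b:.2f}")
```

Both groups have a selection rate of 0.4, but precision is 1.0 for group a and 0.5 for group b, so predictive parity is violated even though the disparate impact ratio is exactly 1.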

04

Healthcare Resource Allocation & Diagnostics

Disparate impact analysis is applied to algorithms that predict healthcare needs, prioritize patients for interventions, or aid in clinical diagnosis. For instance, a model used to enroll high-risk patients into care management programs must be audited to ensure it does not systematically exclude groups based on race or socioeconomic status, an outcome that can arise when those groups are underrepresented in the training data. Such audits support compliance with Section 1557 of the Affordable Care Act, which prohibits discrimination in health programs.

05

Advertising Delivery & Targeted Marketing

Platforms use disparate impact analysis to audit ad delivery algorithms for digital redlining, where housing, employment, or credit ads are disproportionately shown or withheld from users based on inferred demographic characteristics. This analysis often involves running controlled experiments (A/A tests) to measure delivery rates across groups when ad content and targeting parameters are held constant, ensuring compliance with laws like the Fair Housing Act.

06

Academic Admissions & Scholarship Screening

Educational institutions apply disparate impact analysis to automated systems that pre-screen applications or award scholarships. The four-fifths rule is used to compare recommendation rates across groups based on race, gender, or disability status. The analysis must account for legitimate educational qualifications while ensuring the model does not amplify historical biases present in training data, such as correlations between extracurricular activities and socioeconomic status.

FAIRNESS METRIC

Frequently Asked Questions

A fairness metric is a quantitative measure used to audit an AI system for discriminatory outcomes across different demographic groups, with disparate impact being a common legal test comparing selection rates. This FAQ addresses key technical and legal questions surrounding this critical evaluation concept.

Disparate impact is a quantitative fairness metric that measures whether an AI system's selection rate (e.g., for loans, hiring, or services) differs significantly across protected demographic groups, such as those defined by race or gender. It is calculated as the ratio of the selection rate for a disadvantaged group to the selection rate for the most advantaged group. A value of 1.0 indicates perfect parity, while a value below a legal threshold (often 0.8, known as the "80% rule") suggests a potentially discriminatory adverse impact. This metric originates from U.S. employment law and is a cornerstone of algorithmic auditing for group fairness, focusing on outcomes rather than intent.
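Written out as a formula, and using the illustrative symbols SR_g for the selection rate of group g and A for the protected attribute, the definition above reads:

```latex
\mathrm{SR}_g = P(\hat{Y} = 1 \mid A = g),
\qquad
\mathrm{DI}_g = \frac{\mathrm{SR}_g}{\max_{h} \mathrm{SR}_h}
```

A group g is flagged when DI_g < 0.8. The most advantaged group always has DI = 1 by construction.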

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.