Causal Fairness: Definition & Framework for AI Fairness

CAUSAL REASONING MODELS

What is Causal Fairness?

Causal fairness is a formal framework for assessing and ensuring algorithmic fairness by using causal models to define and measure discrimination along specific causal pathways.

The framework distinguishes between direct, indirect, and spurious effects of sensitive attributes like race or gender. Unlike statistical fairness metrics that rely on correlations, it uses tools like Structural Causal Models (SCMs) and causal graphs to answer counterfactual questions (e.g., 'Would the decision have been different if the individual's protected attribute were changed?'). This allows for precise, legally grounded definitions of fairness, such as counterfactual fairness, which holds if an individual's prediction is the same in the actual world and in a counterfactual world where the protected attribute differs.

The framework requires specifying a causal model of the data-generating process, which includes confounders, mediators, and outcomes. Key tasks include causal identifiability (determining whether a fairness quantity can be estimated from the available data) and applying criteria like the backdoor criterion to choose adjustment variables that block non-causal paths. It is critical for algorithmic explainability and auditing in high-stakes domains like lending or hiring, as it separates discriminatory mechanisms from legally permissible ones. For example, a mediator like education level may be influenced by a protected attribute yet still be a legitimate basis for decision-making.

Core Concepts in Causal Fairness

Causal fairness is a framework for assessing and ensuring algorithmic fairness using causal models to define and measure discrimination along specific causal pathways, distinguishing between direct, indirect, and spurious effects of sensitive attributes.

Causal Fairness Definition

Causal fairness is a framework for assessing and ensuring algorithmic fairness using causal models to define and measure discrimination along specific causal pathways. Unlike statistical fairness metrics, it distinguishes between direct, indirect, and spurious effects of sensitive attributes (e.g., race, gender). It answers counterfactual questions like, 'Would this individual have received a different outcome if their protected attribute were different, all else being equal?' This approach provides a principled method to audit and remove unfair causal influences from automated decision systems.

Direct vs. Indirect Discrimination

Causal models explicitly separate different pathways of influence from a sensitive attribute to an outcome.

Direct Discrimination: The causal effect of the sensitive attribute on the outcome that does not pass through a mediator variable. This is often considered legally and ethically impermissible.
Indirect Discrimination: The effect that flows from the sensitive attribute through a mediator (e.g., zip code influencing credit score). Assessing whether this is fair depends on the justifiability of the mediator.
Spurious Association: A non-causal, statistical correlation induced by a confounder (a common cause of both the attribute and the outcome).

Causal fairness aims to isolate and remove direct and unjustified indirect effects, while recognizing that spurious associations do not reflect a causal effect of the attribute on the outcome.

Counterfactual Fairness

A leading formal definition of causal fairness. A predictor is counterfactually fair if, for any individual, the prediction is the same in the actual world and in a counterfactual world where the individual's protected attribute (e.g., race) was changed. Formally, for all y and every attainable value a': P(Ŷ_{A←a}(U) = y | X=x, A=a) = P(Ŷ_{A←a'}(U) = y | X=x, A=a), where A is the protected attribute, X denotes the remaining observed features, U represents latent background variables, and the subscript A←a' denotes an intervention that sets the attribute to a'. This ensures that the protected attribute has no causal influence on an individual's prediction, providing a strong individual-level guarantee.
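The definition can be illustrated with a minimal sketch, assuming a toy linear SCM in which a single feature is generated as X = BETA * A + U. The coefficient `BETA`, the two predictors, and the sample individuals are hypothetical and chosen only for illustration. The check follows the usual abduction, action, prediction steps for computing a counterfactual.

```python
# Hypothetical linear SCM (an assumption for illustration, not a real dataset):
#   X = BETA * A + U, where U is a latent background variable.
BETA = 2.0

def abduct_u(x, a):
    """Abduction: recover the latent U consistent with the observed (x, a)."""
    return x - BETA * a

def counterfactual_x(x, a, a_new):
    """Action + prediction: recompute X with A set to a_new and U held fixed."""
    return BETA * a_new + abduct_u(x, a)

def predictor_on_x(x, a):
    """Scores the raw feature X, which is a causal descendant of A."""
    return 0.5 * x

def predictor_on_u(x, a):
    """Scores only the inferred latent U, which A does not influence."""
    return 0.5 * abduct_u(x, a)

def is_counterfactually_fair(predictor, individuals, tol=1e-9):
    """Compare each prediction with that of the counterfactual twin (A flipped)."""
    for x, a in individuals:
        a_cf = 1 - a
        x_cf = counterfactual_x(x, a, a_cf)
        if abs(predictor(x, a) - predictor(x_cf, a_cf)) > tol:
            return False
    return True

individuals = [(3.0, 1), (0.5, 0), (2.5, 1)]  # (x, a) pairs, hypothetical
print(is_counterfactually_fair(predictor_on_x, individuals))  # False
print(is_counterfactually_fair(predictor_on_u, individuals))  # True
```

In this toy model the predictor built on U passes because U is unaffected by A. In practice U is not observed and must be inferred under the assumed model, so the guarantee is only as good as that model.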

The Causal Graph & Fairness

The analysis is grounded in a Structural Causal Model (SCM) represented by a causal graph (a Directed Acyclic Graph).

Nodes represent variables (sensitive attribute A, outcome Y, features X, mediators M, confounders C).
Edges represent direct causal relationships.
Paths from A to Y are analyzed to determine fairness.
- Backdoor Paths: Non-causal paths opened by confounders. These are blocked by conditioning to isolate the true effect.
- Front-door Paths: Directed paths from A to Y, including those that pass through mediators. These carry the causal effect of A, and the do-calculus is used to determine whether effects along them can be computed from data.

The graph makes assumptions explicit and enables the use of tools like the backdoor criterion and front-door criterion to identify which variables to adjust for when measuring specific types of discrimination.
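As a rough sketch of how such a graph can be represented and queried, the snippet below stores a small DAG as an adjacency mapping and enumerates the directed paths from A to Y. The graph itself (C confounding A and Y, M mediating A's effect) and the helper names are assumptions made for illustration.

```python
# Hypothetical causal graph: C confounds A and Y; M mediates the effect of A on Y.
# Each key maps a node to the list of its children.
GRAPH = {
    "C": ["A", "Y"],
    "A": ["M", "Y"],
    "M": ["Y"],
    "Y": [],
}

def directed_paths(graph, source, target, prefix=None):
    """Return every directed path from source to target as a list of nodes."""
    prefix = (prefix or []) + [source]
    if source == target:
        return [prefix]
    paths = []
    for child in graph[source]:
        paths.extend(directed_paths(graph, child, target, prefix))
    return paths

def parents(graph, node):
    """Return the direct causes of a node, sorted by name."""
    return sorted(p for p, children in graph.items() if node in children)

# Directed paths carry the causal effect of A on Y.
print(directed_paths(GRAPH, "A", "Y"))  # [['A', 'M', 'Y'], ['A', 'Y']]

# Every backdoor path starts with an arrow into A, so it passes through a parent of A.
print(parents(GRAPH, "A"))  # ['C']
```

When all parents of A are observed, adjusting for them is one valid way to satisfy the backdoor criterion; whether a smaller or different adjustment set works depends on the rest of the graph.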

Interventional Fairness

This family of metrics evaluates fairness from an interventional perspective (the 'do' level of the causal hierarchy). It measures the effect of intervening on the protected attribute.

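Effect of Treatment on the Treated (ETT): The average effect of changing the attribute among individuals who actually have a given attribute value.
No Unresolved Discrimination: Requires that every directed path from the protected attribute to the outcome pass through a resolving variable, meaning a variable whose dependence on the attribute is considered acceptable. In particular, it rules out a direct causal effect of the attribute on the outcome. One interventional diagnostic is to check whether P(Y | do(A=a), X=x) is constant across a, although the conclusion depends on the causal graph being correctly specified.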
Path-Specific Effects: Allows finer-grained analysis by quantifying the effect flowing through specific causal pathways (e.g., only through an admissible mediator like qualifications, but not through an inadmissible one like neighborhood). This enables nuanced policies that remove unfair influences while preserving legitimate ones.
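The path-specific idea can be sketched under a strong simplifying assumption: in a linear SCM, the effect transmitted along a path is the product of the edge coefficients on that path. The variables (Q for qualifications, N for neighborhood), the coefficients, and the choice of which paths count as admissible are all hypothetical.

```python
# Hypothetical linear SCM coefficients, keyed by edge (parent, child).
COEF = {
    ("A", "Q"): 0.5, ("Q", "Y"): 2.0,   # A -> qualifications -> Y (treated as admissible)
    ("A", "N"): 1.0, ("N", "Y"): 0.25,  # A -> neighborhood -> Y (treated as inadmissible)
    ("A", "Y"): 0.5,                    # direct edge A -> Y
}

PATHS = {
    "direct": ["A", "Y"],
    "via_qualifications": ["A", "Q", "Y"],
    "via_neighborhood": ["A", "N", "Y"],
}

def path_effect(path):
    """In a linear SCM, a path's effect is the product of its edge coefficients."""
    effect = 1.0
    for parent, child in zip(path, path[1:]):
        effect *= COEF[(parent, child)]
    return effect

effects = {name: path_effect(path) for name, path in PATHS.items()}
total_effect = sum(effects.values())
unfair_effect = effects["direct"] + effects["via_neighborhood"]

print(effects)        # {'direct': 0.5, 'via_qualifications': 1.0, 'via_neighborhood': 0.25}
print(total_effect)   # 1.75
print(unfair_effect)  # 0.75
```

Outside the linear case, path-specific effects are defined through nested counterfactuals and are not always identifiable, so this product rule should be read as a special case rather than a general recipe.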

Challenges & Tools

Implementing causal fairness presents significant engineering and statistical challenges.

Graph Specification: The correctness of the analysis depends on an accurately specified causal graph, which requires domain expertise.
Unmeasured Confounding: Hidden common causes can bias estimates. Sensitivity analysis can be used to bound the possible bias, and instrumental variables, where available, can sometimes recover an effect despite the hidden confounder.
Estimation from Data: Once a causal quantity is identified (e.g., a path-specific effect), it must be estimated from finite data using methods like propensity score matching, inverse probability weighting, or structural equation modeling.
Integration with ML: Methods are being developed to build fairness-aware algorithms that learn under causal constraints, such as models that enforce counterfactual fairness during training by leveraging inferred latent variables.
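To make the estimation step concrete, here is a minimal sketch of inverse probability weighting on a small synthetic table. The counts are invented so that a binary confounder C drives both the attribute A and the outcome Y, while A has no effect on Y within either stratum; the function names are illustrative and do not come from any library.

```python
# Hypothetical counts of (c, a, y) records: c = confounder, a = attribute, y = outcome.
COUNTS = {
    (0, 1, 1): 2,  (0, 1, 0): 8,  (0, 0, 1): 6,  (0, 0, 0): 24,
    (1, 1, 1): 36, (1, 1, 0): 9,  (1, 0, 1): 12, (1, 0, 0): 3,
}
rows = [record for record, n in COUNTS.items() for _ in range(n)]

def propensity(rows, c, a):
    """Empirical P(A=a | C=c)."""
    stratum = [r for r in rows if r[0] == c]
    return sum(1 for r in stratum if r[1] == a) / len(stratum)

def naive_mean(rows, a):
    """Empirical E[Y | A=a], which mixes causal and spurious association."""
    ys = [y for _, a_i, y in rows if a_i == a]
    return sum(ys) / len(ys)

def ipw_mean(rows, a):
    """IPW estimate of E[Y | do(A=a)]: weight each matching row by 1 / P(A=a | C)."""
    weighted = sum(y / propensity(rows, c, a) for c, a_i, y in rows if a_i == a)
    return weighted / len(rows)

print(round(naive_mean(rows, 1) - naive_mean(rows, 0), 3))     # 0.291
print(round(ipw_mean(rows, 1), 2), round(ipw_mean(rows, 0), 2))  # 0.56 0.56
```

The naive gap disappears after weighting because, in this constructed example, C is the only confounder and it is fully observed. With real data the propensities would themselves be estimated by a model, and extreme weights would need care.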

FAIRNESS FRAMEWORK COMPARISON

Causal vs. Statistical Fairness Metrics

This table contrasts the core principles, assumptions, and technical approaches of causal fairness metrics, which use causal models to isolate discrimination along specific pathways, with statistical (or observational) fairness metrics, which assess parity in outcomes based on statistical associations in the data.

Metric / Feature	Causal Fairness Metrics	Statistical Fairness Metrics	Key Distinction
Underlying Model	Structural Causal Model (SCM) / Causal Graph	Observational Probability Distributions	Causal metrics require a formal model of cause-and-effect; statistical metrics use correlations.
Core Question	"What is the causal effect of the sensitive attribute on the decision?"	"Is there a statistical disparity correlated with the sensitive attribute?"	Causal metrics ask 'why' a disparity exists; statistical metrics ask 'if' it exists.
Handling of Confounding	Explicitly models and adjusts for confounders (e.g., via backdoor adjustment).	Cannot distinguish correlation from causation; confounded associations are treated as discriminatory.	Causal metrics can separate direct, indirect, and spurious effects; statistical metrics conflate them.
Definition of Fairness	Defined via causal pathways (e.g., direct, indirect, total effects).	Defined via statistical parity (e.g., demographic parity, equalized odds).	Causal definitions are mechanistic; statistical definitions are associational.
Data Requirements	Requires causal assumptions/graph and often richer data to satisfy identifiability.	Can be computed directly from the observed input data and model predictions.	Causal metrics need a model of the world; statistical metrics need only the data at hand.
Interpretability & Explanation	Provides explanations in terms of causal mechanisms and paths (e.g., "discrimination flows through variable Z").	Provides a quantitative score of disparity but no mechanistic explanation for its cause.	Causal metrics support root-cause analysis; statistical metrics are diagnostic, not explanatory.
Policy & Intervention Guidance	Directly informs which levers to adjust (e.g., which causal path to interrupt) for fair outcomes.	Indicates a problem exists but does not specify how to achieve fairness without potentially introducing distortion.	Causal metrics are prescriptive; statistical metrics are primarily descriptive.
Robustness to Legitimate Factors	Can theoretically account for and permit disparities justified by mediators (e.g., qualifications).	Often requires trade-offs, as it may penalize disparities driven by legitimate, non-sensitive factors.	Causal metrics aim to isolate unfairness; statistical metrics may overcorrect.

CAUSAL FAIRNESS

Frequently Asked Questions

Causal fairness is a rigorous, model-based framework for assessing and ensuring algorithmic fairness. It uses causal models to define and measure discrimination along specific causal pathways, distinguishing between direct, indirect, and spurious effects of sensitive attributes like race or gender.

Causal fairness is a framework for assessing algorithmic fairness using structural causal models (SCMs) to define and measure discrimination along specific causal pathways, distinguishing between direct, indirect, and spurious effects of a sensitive attribute. It differs fundamentally from statistical fairness, which relies solely on correlations in observed data. Statistical metrics like demographic parity or equalized odds measure associations but cannot determine whether an observed disparity is causally discriminatory (e.g., directly caused by gender), flows through a mediator that may be a legitimate hiring criterion (e.g., experience), or is a spurious result of a confounding variable. Causal fairness moves beyond pattern-matching to answer why a disparity exists, enabling interventions that target the true root cause of unfairness.

Related Terms

Causal fairness is a rigorous, model-based approach to algorithmic fairness. It requires formal definitions of fairness based on causal pathways, distinguishing between direct, indirect, and spurious effects of sensitive attributes like race or gender.

Structural Causal Model (SCM)

A Structural Causal Model (SCM) is the foundational mathematical framework for causal fairness. It represents causal relationships between variables (e.g., ZIP code, education, hiring decision) as a system of structural equations, typically visualized as a causal graph.

Provides the formal language to define fairness (e.g., "no direct effect of gender on hiring").
Enables the computation of counterfactual quantities (e.g., "What would this applicant's salary be if their gender were different?").
Distinguishes between observational data (what we see) and interventional data (what happens when we act).
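The difference between observing and intervening can be sketched with a tiny discrete SCM that is evaluated exactly by enumerating its noise variables. The model below is a toy assumption: C causes both A and Y, and Y does not depend on A at all. The noise probabilities and helper names are hypothetical.

```python
from itertools import product

# Exogenous noise distributions (hypothetical numbers).
P_U = {"u_c": {0: 0.5, 1: 0.5}, "u_a": {0: 0.8, 1: 0.2}}

def solve(u, do=None):
    """Evaluate the structural equations in causal order; `do` overrides a variable."""
    do = do or {}
    v = {}
    v["C"] = do.get("C", u["u_c"])
    v["A"] = do.get("A", v["C"] if u["u_a"] == 0 else 1 - v["C"])
    v["Y"] = do.get("Y", v["C"])  # in this toy model Y ignores A entirely
    return v

def worlds(do=None):
    """Yield (probability, variable values) for every setting of the noise."""
    names = list(P_U)
    for values in product(*(P_U[n] for n in names)):
        u = dict(zip(names, values))
        weight = 1.0
        for n in names:
            weight *= P_U[n][u[n]]
        yield weight, solve(u, do)

def prob(event, given=lambda v: True, do=None):
    """P(event | given) in the observational or intervened model."""
    num = sum(w for w, v in worlds(do) if given(v) and event(v))
    den = sum(w for w, v in worlds(do) if given(v))
    return num / den

observed = prob(lambda v: v["Y"] == 1, given=lambda v: v["A"] == 1)
intervened = prob(lambda v: v["Y"] == 1, do={"A": 1})
print(round(observed, 3), round(intervened, 3))  # 0.8 0.5
```

Seeing A=1 raises the probability of Y=1 because it is evidence about C, while setting A=1 leaves Y unchanged. That gap is exactly what an associational metric cannot distinguish from a causal effect.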

Counterfactual Fairness

Counterfactual fairness is a strict, individual-level fairness criterion. An algorithm is counterfactually fair if, for any individual, its prediction is the same in the actual world and in a counterfactual world where that individual's protected attribute (e.g., race) was different, while all other circumstances remain the same.

Asks: "Would the decision have been the same if this person were of a different race, all else being equal?"
Requires modeling the underlying causal process to simulate these alternative worlds.
Considered a "gold standard" but is often difficult to satisfy in practice due to data and modeling requirements.

Path-Specific Fairness

Path-specific fairness decomposes the total effect of a sensitive attribute on an outcome into effects that travel along specific causal pathways in a graph. This allows for nuanced fairness policies.

Direct Effect: The effect of gender on hiring that does not pass through a mediator like "years of experience."
Indirect Effect: The effect of gender on hiring that does pass through a mediator (e.g., gender→education→hiring).
Enables definitions like: "We allow the effect of gender through education (an indirect effect) but prohibit any direct discrimination."
Requires specifying which pathways are considered fair or unfair.

Causal Mediation Analysis

Causal mediation analysis is the statistical technique used to implement path-specific fairness. It quantifies how much of a total effect (e.g., gender pay gap) operates through a specific intermediate variable, or mediator (e.g., job title, negotiation outcome).

Total Effect: The overall causal effect of the attribute on the outcome, across all causal pathways.
Natural Direct Effect (NDE): The portion of the effect that does not operate through the mediator.
Natural Indirect Effect (NIE): The portion of the effect that operates through the mediator.
Tools include the mediation formula and methods based on the do-calculus to estimate these effects from observational data under assumptions.
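The mediation formula can be sketched for a binary attribute A and binary mediator M, assuming there is no unmeasured confounding so that the conditional quantities below equal their causal counterparts. The probabilities and the additive outcome model are hypothetical; with an additive model (no A-by-M interaction) the direct and indirect effects sum to the total effect, which need not hold in general.

```python
# Hypothetical quantities, assumed identified (no unmeasured confounding).
P_M1_GIVEN_A = {0: 0.25, 1: 0.75}  # P(M=1 | A=a)

def mean_y(a, m):
    """E[Y | A=a, M=m], additive in a and m for this sketch."""
    return 0.2 + 0.1 * a + 0.4 * m

def p_m(m, a):
    """P(M=m | A=a)."""
    p1 = P_M1_GIVEN_A[a]
    return p1 if m == 1 else 1.0 - p1

def natural_direct_effect():
    """NDE = sum_m (E[Y|1,m] - E[Y|0,m]) * P(m | A=0)."""
    return sum((mean_y(1, m) - mean_y(0, m)) * p_m(m, 0) for m in (0, 1))

def natural_indirect_effect():
    """NIE = sum_m E[Y|0,m] * (P(m | A=1) - P(m | A=0))."""
    return sum(mean_y(0, m) * (p_m(m, 1) - p_m(m, 0)) for m in (0, 1))

def total_effect():
    """TE = E[Y | do(A=1)] - E[Y | do(A=0)] under the same assumptions."""
    return sum(mean_y(1, m) * p_m(m, 1) - mean_y(0, m) * p_m(m, 0) for m in (0, 1))

print(round(natural_direct_effect(), 3))    # 0.1
print(round(natural_indirect_effect(), 3))  # 0.2
print(round(total_effect(), 3))             # 0.3
```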

Causal Confounding

Causal confounding is the primary obstacle to measuring true discrimination. It occurs when a common cause influences both a protected attribute (e.g., race) and the outcome (e.g., loan denial), creating a spurious, non-causal association.

Example: Neighborhood (confounder) influences both racial composition and average credit score.
A naive model might incorrectly attribute the effect of neighborhood to race.
The backdoor criterion is used to identify a set of variables to adjust for (e.g., income, location) to block these backdoor paths and isolate the true causal effect.
Unmeasured confounding remains a fundamental limitation.
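A small worked sketch of backdoor adjustment, using made-up probability tables for a lending scenario: the area type C influences both group membership A and the denial rate, while the denial rate within each area is the same for both groups. All names and numbers are assumptions for illustration.

```python
# Hypothetical probability tables.
P_C = {"low_income_area": 0.4, "high_income_area": 0.6}            # P(C=c)
P_A1_GIVEN_C = {"low_income_area": 0.75, "high_income_area": 0.25}  # P(A=1 | C=c)
P_DENY = {                                                          # P(denied | C=c, A=a)
    ("low_income_area", 0): 0.5, ("low_income_area", 1): 0.5,
    ("high_income_area", 0): 0.25, ("high_income_area", 1): 0.25,
}

def p_a_given_c(a, c):
    """P(A=a | C=c)."""
    p1 = P_A1_GIVEN_C[c]
    return p1 if a == 1 else 1.0 - p1

def naive(a):
    """P(denied | A=a): the observed association, confounded by C."""
    joint = {c: P_C[c] * p_a_given_c(a, c) for c in P_C}
    p_a = sum(joint.values())
    return sum(P_DENY[(c, a)] * joint[c] / p_a for c in P_C)

def backdoor_adjusted(a):
    """P(denied | do(A=a)) = sum_c P(denied | c, a) * P(c)."""
    return sum(P_DENY[(c, a)] * P_C[c] for c in P_C)

print(round(naive(1) - naive(0), 3))                          # 0.121
print(round(backdoor_adjusted(1), 2), round(backdoor_adjusted(0), 2))  # 0.35 0.35
```

The observed gap of roughly 12 percentage points vanishes after adjustment because, by construction, C is the only backdoor path and it is measured. If a relevant confounder were missing from the table, the adjusted estimate would remain biased.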

Fairness Through Unawareness vs. Awareness

This contrast highlights the shift from simplistic to causal approaches.

Fairness Through Unawareness: The naive practice of simply removing a protected attribute (e.g., 'gender') from model inputs. It is ineffective because proxies (e.g., 'major,' 'hobbies') can reconstruct the attribute, leading to indirect discrimination.
Fairness Through Causal Awareness: The causal approach. It explicitly models the relationship between the protected attribute, its proxies, other covariates, and the outcome. It uses the causal model to define what constitutes unfair discrimination (e.g., direct effects) and then debiases the model or its predictions to satisfy that definition, often by simulating interventions.
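The weakness of unawareness can be shown with a deliberately simple sketch: a decision rule that never sees gender, applied to invented counts in which college major is strongly associated with gender. The data, the rule, and the function names are hypothetical.

```python
# Hypothetical applicant counts by (gender, major).
COUNTS = {
    ("f", "nursing"): 40, ("f", "engineering"): 10,
    ("m", "nursing"): 10, ("m", "engineering"): 40,
}

def admit(major):
    """A 'gender-blind' rule: the protected attribute is never an input."""
    return major == "engineering"

def admission_rate(gender):
    """Share of applicants of the given gender that the rule admits."""
    total = sum(n for (g, _), n in COUNTS.items() if g == gender)
    admitted = sum(n for (g, major), n in COUNTS.items() if g == gender and admit(major))
    return admitted / total

def proxy_accuracy():
    """How often gender is guessed correctly from major alone (majority vote per major)."""
    correct = 0
    for major in {m for _, m in COUNTS}:
        correct += max(n for (_, m), n in COUNTS.items() if m == major)
    return correct / sum(COUNTS.values())

print(admission_rate("f"), admission_rate("m"))  # 0.2 0.8
print(proxy_accuracy())                          # 0.8
```

Dropping the gender column leaves the disparity intact because major acts as a proxy. Whether the path through major is acceptable is a question the causal model and the stated fairness definition have to answer; the code only shows that unawareness does not answer it.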
