Inferensys

Glossary

Propensity Score

A propensity score is the conditional probability of receiving a treatment given observed covariates, used in causal inference to adjust for confounding and estimate treatment effects.
CAUSAL REASONING MODELS

What is a Propensity Score?

A core statistical technique in causal inference used to estimate the effect of a treatment by adjusting for confounding variables.

A propensity score is the conditional probability of a unit (e.g., a patient, customer) receiving a particular treatment, given a set of observed pre-treatment covariates. It is formally defined as e(X) = P(T=1 | X), where T is the treatment indicator and X is a vector of covariates. This single scalar summarizes the information in the observed covariates that is relevant to treatment assignment, enabling the creation of balanced comparison groups for estimating causal effects such as the Average Treatment Effect (ATE) from observational data, where treatment was not randomly assigned.

The primary utility of the propensity score lies in adjustment methods such as matching, stratification, and inverse probability weighting (IPW). These techniques use the score to approximate the conditions of a randomized controlled trial: within strata of the propensity score, the distribution of observed covariates is similar between treated and untreated units. This adjustment mitigates confounding bias and allows a more credible comparison of outcomes across treatment groups, provided the critical assumptions of ignorability (no unmeasured confounders) and positivity are met.

CAUSAL INFERENCE

Core Properties of Propensity Scores

A propensity score is the conditional probability of receiving a treatment given observed covariates. Its core properties enable robust causal effect estimation from observational data by balancing treatment groups.

01

Conditional Probability Definition

The propensity score, denoted as e(X), is formally defined as the conditional probability of a unit receiving the treatment given its observed pre-treatment covariates: e(X) = P(T=1 | X). This scalar value, ranging from 0 to 1, summarizes all the information in the multi-dimensional covariate vector X that is relevant to treatment assignment. It is the fundamental object that enables methods like matching and weighting to adjust for confounding.
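
As a minimal illustration of the definition, assume the covariates are discrete, so that each distinct value of X forms a cell. In that case e(X) can be estimated as the fraction of treated units within each cell. The function name `empirical_propensity` and the toy data below are hypothetical, introduced only for this sketch.

```python
from collections import defaultdict

def empirical_propensity(covariates, treatments):
    """Estimate e(x) = P(T=1 | X=x) as the treated fraction in each covariate cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for x, t in zip(covariates, treatments):
        counts[x][0] += t
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}

# Hypothetical toy data: X = (age_group, smoker), T = 1 if the unit was treated.
X = [("old", True)] * 4 + [("young", False)] * 4
T = [1, 1, 0, 1, 0, 0, 1, 0]

scores = empirical_propensity(X, T)
for cell, score in scores.items():
    print(cell, score)
```

In this toy data, three of the four units in the ("old", True) cell are treated, so the estimated score is 0.75; one of four in the ("young", False) cell is treated, giving 0.25. With continuous or high-dimensional covariates the cells become too sparse, and a model is needed instead.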

02

Balancing Property

The most critical property for causal inference is the balancing property. It states that, conditional on the propensity score, the observed covariates X are independent of treatment assignment: T ⟂ X | e(X). This means that within subgroups of units with the same propensity score, treated and control units have the same distribution of observed characteristics. This property lets the scalar e(X) stand in for the full covariate vector when adjusting for observed confounding, enabling a quasi-experimental comparison from observational data. It says nothing about unobserved covariates.
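
The balancing property follows from a short argument, sketched here for a binary treatment using only the definition e(X) = P(T=1 | X). Since e(X) is a function of X, conditioning on both X and e(X) is the same as conditioning on X; the second line uses the law of iterated expectations.

```latex
\begin{align*}
P(T=1 \mid X,\, e(X)) &= P(T=1 \mid X) = e(X), \\
P(T=1 \mid e(X)) &= E\big[\, E[T \mid X] \,\big|\, e(X) \big]
                  = E\big[\, e(X) \,\big|\, e(X) \big] = e(X).
\end{align*}
```

Both probabilities equal e(X), so once e(X) is known, X carries no further information about treatment assignment, which is the statement T ⟂ X | e(X).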

03

Ignorability & Unconfoundedness

The validity of propensity score methods rests on the strong ignorability assumption, which combines two conditions:

  • Unconfoundedness (Conditional Independence): The potential outcomes (Y(1), Y(0)) are independent of treatment assignment given the covariates: (Y(1), Y(0)) ⟂ T | X.
  • Positivity: Every unit has a non-zero probability of receiving either treatment: 0 < P(T=1 | X) < 1 for all X.

If both hold, conditioning on the propensity score e(X) alone is sufficient for unconfoundedness, that is, (Y(1), Y(0)) ⟂ T | e(X), which allows unbiased estimation of the Average Treatment Effect (ATE).

04

Common Estimation Methods

Propensity scores are not directly observed and must be estimated from data, typically using a binary classification model.

  • Logistic Regression: The most common method, modeling log-odds of treatment as a linear function of X.
  • Machine Learning Classifiers: Methods like boosted trees (e.g., XGBoost) or random forests can better capture non-linearities and interactions in high-dimensional settings, though they require careful cross-fitting to avoid overfitting bias.
  • Regularization (L1/L2): Used in high-dimensional settings to prevent overfitting.

The goal is not prediction accuracy per se, but achieving covariate balance in the resulting matched or weighted sample.
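
The logistic regression approach listed above can be sketched with the standard library alone. This is an illustrative implementation under stated assumptions, not a recommended production method: in practice a statistics or machine learning library would typically be used. The function names, the learning rate, the iteration count, and the simulated data (with a hypothetical true model logit e(x) = -0.5 + 1.0·x) are all assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, x):
    # w[0] is the intercept, w[1:] are the covariate coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_propensity_model(X, T, lr=0.5, n_iter=500, l2=0.0):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(n_iter):
        grad = [0.0] * (d + 1)
        for x, t in zip(X, T):
            err = predict(w, x) - t
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        for j in range(d + 1):
            penalty = l2 * w[j] if j > 0 else 0.0  # the intercept is not penalized
            w[j] -= lr * (grad[j] / n + penalty)
    return w

# Simulated data (assumption): one standard-normal covariate,
# true treatment model logit e(x) = -0.5 + 1.0 * x.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0)] for _ in range(1000)]
T = [1 if rng.random() < sigmoid(-0.5 + 1.0 * x[0]) else 0 for x in X]

w = fit_propensity_model(X, T)
e_hat = [predict(w, x) for x in X]  # estimated propensity scores
print("intercept, slope:", w)
```

The optional `l2` argument adds a ridge penalty on the coefficients, corresponding to the L2 regularization mentioned in the list. The fitted scores would then be judged by the covariate balance they produce, not by classification accuracy.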

05

Core Applications: Matching & Weighting

Estimated propensity scores are used in several key adjustment techniques:

  • Propensity Score Matching: Pairs treated units with control units that have similar propensity scores, creating a balanced sample for effect estimation.
  • Inverse Probability of Treatment Weighting (IPTW): Creates a pseudo-population by weighting each unit by the inverse of its probability of receiving the treatment it actually received. When the target is the ATE, weights are 1/e(X) for treated units and 1/(1-e(X)) for control units.
  • Stratification (Subclassification): Units are grouped into strata (e.g., quintiles) based on their propensity score, and treatment effects are estimated within each stratum before being aggregated.
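
Two of the techniques above, weighting and matching, can be sketched on a small deterministic dataset. The example assumes a single binary confounder X, known propensity scores (0.8 when X = 1, 0.2 when X = 0), and a noise-free outcome Y = 2·T + 3·X, so the true effect is 2 by construction. The function names are hypothetical. The weighting function uses normalized weights, and the matching function is 1-nearest-neighbor with replacement, which targets the effect on the treated (ATT) instead of the ATE.

```python
def iptw_ate(y, t, e):
    """Normalized IPTW estimate of the ATE."""
    w1 = [ti / ei for ti, ei in zip(t, e)]              # 1/e(X) for treated, 0 otherwise
    w0 = [(1 - ti) / (1 - ei) for ti, ei in zip(t, e)]  # 1/(1-e(X)) for controls, 0 otherwise
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0

def matching_att(y, t, e):
    """1-nearest-neighbor matching on the score, with replacement (estimates the ATT)."""
    controls = [(ei, yi) for yi, ti, ei in zip(y, t, e) if ti == 0]
    diffs = []
    for yi, ti, ei in zip(y, t, e):
        if ti == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - ei))
            diffs.append(yi - y_match)
    return sum(diffs) / len(diffs)

# Toy data (assumption): rows are (x, t); X = 1 units are mostly treated.
rows = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 8
x = [r[0] for r in rows]
t = [r[1] for r in rows]
e = [0.8 if xi == 1 else 0.2 for xi in x]
y = [2.0 * ti + 3.0 * xi for xi, ti in rows]  # true treatment effect is 2

treated = [yi for yi, ti in zip(y, t) if ti == 1]
control = [yi for yi, ti in zip(y, t) if ti == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

ate_iptw = iptw_ate(y, t, e)
att_match = matching_att(y, t, e)
print(naive, ate_iptw, att_match)
```

The naive difference in means is 3.8 because treated units disproportionately have X = 1. Both adjusted estimates recover the effect of 2 (up to floating-point rounding) in this idealized setting where the scores are known and there is no noise.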

06

Diagnostics & Overlap Assessment

After estimation, diagnostics are essential to validate the propensity score model's performance.

  • Covariate Balance: The primary diagnostic. Standardized mean differences, variance ratios, and empirical quantile-quantile plots for key covariates should be compared between treated and control groups after matching or weighting. Residual imbalance should be small; a common rule of thumb is an absolute standardized mean difference below 0.1.
  • Overlap/Common Support: Visualizing the distribution of propensity scores for treated and control units reveals the region of common support. Estimation should be limited to this region where both groups have substantial density, as extrapolation outside this region is unreliable.
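
Both diagnostics can be sketched in a few lines. The example below assumes a single binary covariate with known propensity scores (0.8 when X = 1, 0.2 when X = 0); the function names are hypothetical. The standardized mean difference here divides by the pooled standard deviation of the (weighted) groups, which is one of several conventions in use, and the common support is taken as the simple min/max overlap of the two score ranges.

```python
import math

def weighted_mean_var(values, weights):
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, var

def standardized_mean_difference(x, t, weights=None):
    """SMD of one covariate between treated and control units, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(x)
    stats = {}
    for group in (1, 0):
        vals = [xi for xi, ti in zip(x, t) if ti == group]
        wts = [wi for wi, ti in zip(weights, t) if ti == group]
        stats[group] = weighted_mean_var(vals, wts)
    (m1, v1), (m0, v0) = stats[1], stats[0]
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)

def common_support(e, t):
    """Score interval covered by both treated and control units (min/max rule)."""
    e1 = [ei for ei, ti in zip(e, t) if ti == 1]
    e0 = [ei for ei, ti in zip(e, t) if ti == 0]
    return max(min(e1), min(e0)), min(max(e1), max(e0))

# Toy data (assumption): rows are (x, t); X = 1 units are mostly treated.
rows = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 8
x = [float(r[0]) for r in rows]
t = [r[1] for r in rows]
e = [0.8 if xi == 1.0 else 0.2 for xi in x]
iptw = [1.0 / ei if ti == 1 else 1.0 / (1.0 - ei) for ei, ti in zip(e, t)]

smd_raw = standardized_mean_difference(x, t)
smd_weighted = standardized_mean_difference(x, t, iptw)
support = common_support(e, t)
print(smd_raw, smd_weighted, support)
```

Before weighting, the covariate is badly imbalanced (SMD of 1.5); after inverse probability weighting the SMD is essentially zero, well under the 0.1 rule of thumb. Units with scores outside the returned interval would be candidates for trimming.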

CAUSAL INFERENCE

How Propensity Score Methods Work

A propensity score is a core statistical tool in causal inference, used to estimate the effect of a treatment or intervention from observational data by adjusting for confounding variables.

A propensity score is the probability that a unit (e.g., a patient, customer) receives a specific treatment, conditional on its observed pre-treatment characteristics, or covariates; in practice it is estimated from data. It is formally defined as e(X) = P(T=1 | X), where T is the treatment indicator and X is a vector of covariates. The score is a balancing score: units with the same propensity score have similar distributions of observed covariates, whether or not they received the treatment. This property allows researchers to mimic a randomized controlled trial by comparing outcomes between treated and untreated groups that are comparable on all observed dimensions.

In practice, a model (typically logistic regression) is first trained to predict treatment assignment from covariates. The resulting scores are then used for adjustment via matching, stratification, or inverse probability weighting (IPW). For example, IPW creates a pseudo-population by weighting each unit by the inverse of its probability of receiving the treatment it actually received, thereby removing the association between covariates and treatment assignment. This adjustment, under the key assumptions of conditional ignorability (no unmeasured confounders) and positivity, allows for an unbiased estimate of the Average Treatment Effect (ATE) or other causal quantities from non-experimental data.
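
The reason inverse probability weighting recovers the ATE can be shown in a few lines. The derivation below uses ignorability and positivity as stated above, plus one assumption not spelled out in the text: consistency, meaning the observed outcome satisfies Y = T·Y(1) + (1−T)·Y(0).

```latex
\begin{align*}
E\!\left[\frac{T\,Y}{e(X)}\right]
  &= E\!\left[\frac{T\,Y(1)}{e(X)}\right]
     && \text{(consistency: } TY = TY(1)\text{)} \\
  &= E\!\left[\frac{E[T \mid X]\; E[Y(1) \mid X]}{e(X)}\right]
     && \text{(iterated expectations, then ignorability)} \\
  &= E\big[\,E[Y(1) \mid X]\,\big] = E[Y(1)]
     && \text{(} E[T \mid X] = e(X) > 0 \text{ by positivity)}
\end{align*}
```

The same steps applied to (1−T)·Y / (1−e(X)) give E[Y(0)], so the difference of the two weighted means identifies the ATE, E[Y(1)] − E[Y(0)].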

PROPENSITY SCORE

Frequently Asked Questions

A propensity score is a core technique in causal inference used to estimate the effect of a treatment, policy, or intervention by adjusting for confounding variables in observational data. These FAQs address its definition, calculation, applications, and limitations.

A propensity score is the conditional probability of a unit (e.g., a patient, customer, or user) receiving a particular treatment, given a set of observed pre-treatment covariates or characteristics. Formally, for a binary treatment T (1=treatment, 0=control) and a vector of covariates X, the propensity score is defined as e(X) = P(T=1 | X). It is a single scalar that summarizes the multidimensional X, enabling the creation of comparable groups for causal effect estimation. Its primary purpose is to adjust for confounding in observational studies, mimicking the randomization of a controlled experiment by balancing the distribution of covariates between treated and control groups.

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.