A propensity score is the conditional probability of a unit (e.g., a patient, customer) receiving a particular treatment, given a set of observed pre-treatment covariates. It is formally defined as e(X) = P(T=1 | X), where T is the treatment indicator and X is a vector of covariates. This single scalar score summarizes all the information in the observed confounders, enabling the creation of balanced comparison groups for estimating causal effects like the Average Treatment Effect (ATE) from observational data where random assignment was not possible.
Glossary
Propensity Score

What is a Propensity Score?
A core statistical technique in causal inference used to estimate the effect of a treatment by adjusting for confounding variables.
The primary utility of the propensity score lies in its use within adjustment methods such as matching, stratification, or inverse probability weighting (IPW). These techniques use the score to simulate the conditions of a randomized controlled trial by ensuring that, within strata of the propensity score, the distribution of covariates is similar between treated and untreated units. This adjustment mitigates confounding bias, allowing for a more credible comparison of outcomes across treatment groups, provided the critical assumptions of ignorability (no unmeasured confounders) and positivity are met.
Core Properties of Propensity Scores
A propensity score is the conditional probability of receiving a treatment given observed covariates. Its core properties enable robust causal effect estimation from observational data by balancing treatment groups.
Conditional Probability Definition
The propensity score, denoted as e(X), is formally defined as the conditional probability of a unit receiving the treatment given its observed pre-treatment covariates: e(X) = P(T=1 | X). This scalar value, ranging from 0 to 1, summarizes all the information in the multi-dimensional covariate vector X that is relevant to treatment assignment. It is the fundamental object that enables methods like matching and weighting to adjust for confounding.
Balancing Property
The most critical property for causal inference is the balancing property. It states that, conditional on the propensity score, the distribution of observed covariates X is independent of treatment assignment: T ⟂ X | e(X). This means that within subgroups of units with the same propensity score, treated and control units are, on average, comparable in their observed characteristics. As a result, the scalar e(X) can stand in for the full covariate vector X when adjusting for observed confounding, which enables a quasi-experimental design to be built from observational data. The property says nothing about unobserved covariates, which may remain imbalanced.
Ignorability & Unconfoundedness
The validity of propensity score methods rests on the strong ignorability assumption, which combines two conditions:
- Unconfoundedness (Conditional Independence): The potential outcomes (Y(1), Y(0)) are independent of treatment assignment given the covariates: (Y(1), Y(0)) ⟂ T | X.
- Positivity: Every unit has a non-zero probability of receiving either treatment: 0 < P(T=1 | X) < 1 for all X.
If both hold, conditioning on the propensity score e(X) alone is sufficient to remove confounding by X, allowing for unbiased estimation of the Average Treatment Effect (ATE).
Common Estimation Methods
Propensity scores are not directly observed and must be estimated from data, typically using a binary classification model.
- Logistic Regression: The most common method, modeling log-odds of treatment as a linear function of X.
- Machine Learning Classifiers: Methods like boosted trees (e.g., XGBoost) or random forests can better capture non-linearities and interactions in high-dimensional settings, though they require careful cross-fitting to avoid overfitting bias.
- Regularization (L1/L2): Used in high-dimensional settings to prevent overfitting.
Whichever model is used, the goal is not prediction accuracy per se, but covariate balance in the resulting matched or weighted sample.
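The logistic regression approach above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names `fit_propensity_model` and `predict_propensity`, the learning rate, the iteration count, and the synthetic data are all assumptions made for the example. In practice a statistics library would normally be used instead of hand-written gradient ascent.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_propensity_model(X, T, lr=0.5, n_iter=500, l2=0.0):
    """Fit logistic regression for P(T=1 | X) by batch gradient ascent.

    Returns [intercept, coef_1, ..., coef_p]. The optional l2 penalty
    shrinks the coefficients but not the intercept.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, t in zip(X, T):
            e = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            resid = t - e  # gradient of the log-likelihood w.r.t. the logit
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        for j in range(p + 1):
            penalty = l2 * beta[j] if j > 0 else 0.0
            beta[j] += lr * (grad[j] / n - penalty)
    return beta

def predict_propensity(beta, X):
    # Estimated e(X) = P(T=1 | X) for each row of X.
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]

# Synthetic data (assumption): one covariate, true logit e(x) = 0.5 + 1.0 * x.
random.seed(0)
X = [[random.gauss(0, 1)] for _ in range(500)]
T = [1 if random.random() < sigmoid(0.5 + 1.0 * x[0]) else 0 for x in X]

beta = fit_propensity_model(X, T)
scores = predict_propensity(beta, X)
```

Every estimated score lies strictly between 0 and 1. Whether the model is adequate should be judged by the covariate balance it produces, not by how well it predicts treatment.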
Core Applications: Matching & Weighting
Estimated propensity scores are used in several key adjustment techniques:
- Propensity Score Matching: Pairs treated units with control units that have similar propensity scores, creating a balanced sample for effect estimation.
- Inverse Probability of Treatment Weighting (IPTW): Creates a pseudo-population by weighting each unit by the inverse of its probability of receiving the treatment it actually received. Weights are 1/e(X) for treated units and 1/(1-e(X)) for control units.
- Stratification (Subclassification): Units are grouped into strata (e.g., quintiles) based on their propensity score, and treatment effects are estimated within each stratum before being aggregated.
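Stratification is the easiest of the three techniques to trace by hand. The sketch below assumes propensity scores are already available; the function name `stratified_ate`, the rank-based equal-size strata, and the eight-unit toy dataset are illustrative choices, not taken from the text above.

```python
def stratified_ate(scores, T, Y, n_strata=5):
    """Estimate the ATE by subclassification on the propensity score.

    Units are ranked by score and cut into n_strata equal-size strata.
    The treated-minus-control difference in mean outcomes is computed
    within each stratum, then averaged with weights proportional to
    stratum size. Strata lacking treated or control units are skipped.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    total_effect, total_weight = 0.0, 0
    for k in range(n_strata):
        idx = order[k * n // n_strata:(k + 1) * n // n_strata]
        treated = [Y[i] for i in idx if T[i] == 1]
        control = [Y[i] for i in idx if T[i] == 0]
        if not treated or not control:
            continue  # no within-stratum comparison is possible
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total_effect += diff * len(idx)
        total_weight += len(idx)
    if total_weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return total_effect / total_weight

# Toy data (assumption): the high-score group is treated more often AND has
# a higher baseline outcome, so the raw comparison is confounded.
# The treatment effect built into Y is exactly 2 for every unit.
scores = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
T      = [1,   0,   0,   0,   1,   1,   1,   0]
Y      = [3,   1,   1,   1,   7,   7,   7,   5]

treated_mean = sum(y for y, t in zip(Y, T) if t == 1) / sum(T)
control_mean = sum(y for y, t in zip(Y, T) if t == 0) / (len(T) - sum(T))
naive = treated_mean - control_mean               # 4.0, biased upward
adjusted = stratified_ate(scores, T, Y, n_strata=2)  # 2.0
```

Silently skipping strata that lack one group changes the population the estimate refers to, so in a real analysis the number of dropped units is worth reporting.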
Diagnostics & Overlap Assessment
After estimation, diagnostics are essential to validate the propensity score model's performance.
- Covariate Balance: The primary diagnostic. Standardized mean differences, variance ratios, and empirical quantile-quantile plots for key covariates should be compared between treated and control groups after matching or weighting. A common rule of thumb is that the absolute standardized mean difference for each covariate should fall below 0.1.
- Overlap/Common Support: Visualizing the distribution of propensity scores for treated and control units reveals the region of common support. Estimation should be limited to this region where both groups have substantial density, as extrapolation outside this region is unreliable.
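Both diagnostics can be computed in a few lines. The sketch below is one possible convention, stated as an assumption: the standardized mean difference uses (optionally weighted) group means in the numerator and the pooled standard deviation of the unweighted groups in the denominator, and common support is taken as the interval between the larger group minimum and the smaller group maximum. The function names, data, and weights are hypothetical.

```python
import math
import statistics

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def standardized_mean_difference(x, T, weights=None):
    """SMD of covariate x between treated (T=1) and control (T=0) units."""
    if weights is None:
        weights = [1.0] * len(x)
    xt = [v for v, t in zip(x, T) if t == 1]
    xc = [v for v, t in zip(x, T) if t == 0]
    wt = [w for w, t in zip(weights, T) if t == 1]
    wc = [w for w, t in zip(weights, T) if t == 0]
    # Pooled SD from the unweighted groups, so that the scale stays fixed
    # when comparing balance before and after adjustment.
    pooled_sd = math.sqrt((statistics.variance(xt) + statistics.variance(xc)) / 2)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd

def common_support(scores, T):
    """Return (low, high): the score range covered by both groups."""
    st = [s for s, t in zip(scores, T) if t == 1]
    sc = [s for s, t in zip(scores, T) if t == 0]
    return max(min(st), min(sc)), min(max(st), max(sc))

# Illustrative data (assumption): one covariate, eight units.
x       = [1, 2, 3, 4, 3, 4, 5, 6]
T       = [1, 1, 1, 1, 0, 0, 0, 0]
weights = [1, 1, 2, 4, 4, 2, 1, 1]  # hypothetical adjustment weights
scores  = [0.3, 0.5, 0.7, 0.9, 0.1, 0.2, 0.4, 0.6]

smd_raw = standardized_mean_difference(x, T)                # about -1.55
smd_weighted = standardized_mean_difference(x, T, weights)  # about -0.58
low, high = common_support(scores, T)                       # (0.3, 0.6)
```

In this toy case the weights shrink the imbalance but leave it well above the 0.1 rule of thumb, which would signal that the propensity model or the adjustment needs revisiting.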
How Propensity Score Methods Work
A propensity score is a core statistical tool in causal inference, used to estimate the effect of a treatment or intervention from observational data by adjusting for confounding variables.
A propensity score is the estimated probability that a unit (e.g., a patient, customer) receives a specific treatment, conditional on its observed pre-treatment characteristics, or covariates. It is formally defined as e(X) = P(T=1 | X), where T is the treatment indicator and X is a vector of covariates. The core purpose is to create a balancing score; units with the same propensity score should have similar distributions of their observed covariates, whether they received the treatment or not. This property allows researchers to mimic a randomized controlled trial by comparing outcomes between treated and untreated groups that are comparable on all observed dimensions.
In practice, a model (typically logistic regression) is first trained to predict treatment assignment from covariates. The resulting scores are then used for adjustment via matching, stratification, or inverse probability weighting (IPW). For example, IPW creates a pseudo-population by weighting each unit by the inverse of its probability of receiving the treatment it actually received, thereby removing the association between covariates and treatment assignment. This adjustment, under the key assumptions of conditional ignorability (no unmeasured confounders) and positivity, allows for an unbiased estimate of the Average Treatment Effect (ATE) or other causal quantities from non-experimental data.
Frequently Asked Questions
A propensity score is a core technique in causal inference used to estimate the effect of a treatment, policy, or intervention by adjusting for confounding variables in observational data. These FAQs address its definition, calculation, applications, and limitations.
A propensity score is the conditional probability of a unit (e.g., a patient, customer, or user) receiving a particular treatment, given a set of observed pre-treatment covariates or characteristics. Formally, for a binary treatment T (1=treatment, 0=control) and a vector of covariates X, the propensity score is defined as e(X) = P(T=1 | X). It is a single scalar that summarizes the multidimensional X, enabling the creation of comparable groups for causal effect estimation. Its primary purpose is to adjust for confounding in observational studies, mimicking the randomization of a controlled experiment by balancing the distribution of covariates between treated and control groups.
Related Terms
Propensity scores are a core technique within causal inference, used to adjust for confounding in observational studies. Understanding related concepts is essential for designing robust, unbiased experiments and analyses.
Causal Inference
Causal inference is the process of drawing conclusions about cause-and-effect relationships from data, moving beyond statistical associations to determine the impact of an intervention. It answers "what if" questions, such as the effect of a new drug or a marketing campaign.
- Core Goal: Estimate the Average Treatment Effect (ATE) or similar causal quantities.
- Key Challenge: Distinguishing causation from correlation, often obscured by confounding variables.
- Methods: Include randomized controlled trials (the gold standard), instrumental variables, regression discontinuity, and propensity score methods.
Confounding
Confounding occurs when a common cause (a confounder) influences both the treatment assignment and the outcome, creating a spurious, non-causal association. Failure to adjust for confounders leads to biased effect estimates.
- Example: Studying the effect of exercise (treatment) on heart health (outcome). Age is a confounder if older individuals both exercise less and have poorer heart health.
- Solution: Propensity score methods are explicitly designed to adjust for or balance observed confounders between treated and control groups, mimicking a randomized experiment.
Inverse Probability Weighting (IPW)
Inverse Probability Weighting (IPW) is a primary application of propensity scores. It creates a pseudo-population where treatment assignment is independent of observed covariates by weighting each unit by the inverse of its probability of receiving the treatment it actually received.
- Formula: Weight for a treated unit = 1 / PS. Weight for a control unit = 1 / (1 - PS).
- Effect: Units with a low probability of receiving their actual treatment (e.g., a treated unit with a low PS) are up-weighted, as they are rare and informative.
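- Outcome: Under ignorability and positivity, and with a correctly specified propensity model, the weighted sample yields a consistent estimate of the ATE as the difference between the weighted mean outcomes of the treated and control groups.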
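The weighting formula above translates directly into code. This sketch uses the normalized form of the estimator (weights are rescaled to sum to one within each treatment group), which is one common choice and is an assumption here; the function name `ipw_ate`, the optional `clip` parameter, and the toy data are likewise illustrative.

```python
def ipw_ate(scores, T, Y, clip=None):
    """Normalized inverse-probability-weighted estimate of the ATE.

    Treated units get weight 1 / PS, control units 1 / (1 - PS).
    If clip is given, scores are first bounded to [clip, 1 - clip]
    to limit the influence of extreme weights.
    """
    num_t = den_t = num_c = den_c = 0.0
    for e, t, y in zip(scores, T, Y):
        if clip is not None:
            e = min(max(e, clip), 1.0 - clip)
        if t == 1:
            w = 1.0 / e
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - e)
            num_c += w * y
            den_c += w
    return num_t / den_t - num_c / den_c

# Toy data (assumption): PS is 0.25 in the first group and 0.75 in the
# second, and the second group also has a higher baseline outcome.
# The treatment effect built into Y is exactly 2 for every unit.
scores = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
T      = [1,    0,    0,    0,    1,    1,    1,    0]
Y      = [3,    1,    1,    1,    7,    7,    7,    5]

ate = ipw_ate(scores, T, Y)  # approximately 2.0; the unweighted difference is 4.0
```

The single treated unit with PS = 0.25 receives weight 4, three times the weight of each treated unit with PS = 0.75, which is the up-weighting of rare units described above.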
Average Treatment Effect (ATE)
The Average Treatment Effect (ATE) is the target causal quantity often estimated using propensity scores. It is the expected difference in an outcome if every unit in the population received the treatment versus if none did.
- Definition: ATE = E[Y(1) - Y(0)], where Y(1) is the potential outcome under treatment and Y(0) is the potential outcome under control.
- The Fundamental Problem of Causal Inference: We never observe both Y(1) and Y(0) for the same unit. Propensity score methods help overcome this by creating comparable groups.
- Interpretation: A positive ATE indicates the treatment, on average, improves the outcome across the population.
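A small worked example may make the definition and the fundamental problem concrete. The numbers below are invented: in real data only one potential outcome per unit is ever seen, so the full table of Y(1) and Y(0) is a thought experiment.

```python
# Hypothetical potential outcomes for six units (never jointly observable).
y1 = [5, 6, 7, 8, 9, 10]   # outcome if treated,   Y(1)
y0 = [3, 4, 5, 7, 8, 9]    # outcome if untreated, Y(0)
T  = [0, 0, 1, 0, 1, 1]    # treatment actually received

# True ATE = E[Y(1) - Y(0)], computable only because both columns are given.
true_ate = sum(a - b for a, b in zip(y1, y0)) / len(y1)   # 1.5

# What an analyst actually sees: Y(1) for treated units, Y(0) for controls.
observed = [a if t == 1 else b for a, b, t in zip(y1, y0, T)]

treated = [y for y, t in zip(observed, T) if t == 1]
control = [y for y, t in zip(observed, T) if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)  # about 4.0
```

Here the units with higher baseline outcomes were more likely to be treated, so the naive difference in observed means overstates the true ATE of 1.5.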
Propensity Score Matching
Propensity score matching is a specific method that uses the estimated propensity score to pair each treated unit with one or more "similar" control units, based on their PS. The matched control group serves as the counterfactual.
- Process: For each treated unit, find control unit(s) with the closest propensity score (e.g., using nearest-neighbor matching).
- Outcome: The average outcome difference within each matched pair is computed, then averaged across all pairs to estimate the Average Treatment Effect on the Treated (ATT).
- Assumption: Relies on the common support condition, meaning there must be overlap in PS distributions between groups.
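The matching process can be sketched as follows. This is a minimal illustration of 1:1 nearest-neighbor matching with replacement; the function name `match_att`, the optional `caliper` parameter (a maximum allowed score distance), and the data are assumptions made for the example.

```python
def match_att(scores, T, Y, caliper=None):
    """Estimate the ATT by 1:1 nearest-neighbor matching with replacement.

    Each treated unit is paired with the control unit whose propensity
    score is closest. If caliper is given, treated units whose best match
    is farther away than caliper are dropped, which enforces common support.
    """
    controls = [i for i, t in enumerate(T) if t == 0]
    if not controls:
        raise ValueError("no control units to match against")
    diffs = []
    for i, t in enumerate(T):
        if t != 1:
            continue
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if caliper is not None and abs(scores[j] - scores[i]) > caliper:
            continue  # no sufficiently close control unit
        diffs.append(Y[i] - Y[j])
    if not diffs:
        raise ValueError("no treated unit could be matched")
    return sum(diffs) / len(diffs)

# Illustrative data (assumption): three treated units, four controls.
scores = [0.30, 0.62, 0.81, 0.28, 0.60, 0.85, 0.10]
T      = [1,    1,    1,    0,    0,    0,    0]
Y      = [5.0,  7.0,  9.0,  4.0,  5.5,  7.5,  2.0]

att = match_att(scores, T, Y)                      # (1.0 + 1.5 + 1.5) / 3
att_caliper = match_att(scores, T, Y, caliper=0.03)  # third pair dropped: 1.25
```

Matching with replacement lets one control serve several treated units, which improves match quality at the cost of higher variance. The control with score 0.10 is never used because no treated unit resembles it.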
Rosenbaum & Rubin (1983)
The seminal paper "The Central Role of the Propensity Score in Observational Studies for Causal Effects" by Paul Rosenbaum and Donald Rubin formally defined the propensity score and established its key theoretical properties.
- Key Theorem: They proved that if treatment assignment is strongly ignorable given observed covariates, then it is also strongly ignorable given just the propensity score. This reduces the dimensionality of the confounding adjustment problem.
- Impact: This work provided the mathematical foundation for all modern propensity score methods, transforming the analysis of observational data in fields from economics to medicine.
- Legacy: The paper introduced core concepts such as the balancing score and the propensity score theorem.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.