Glossary

Propensity Score Matching

Propensity score matching is a quasi-experimental method in causal inference that reduces selection bias by matching treated and untreated units with similar probabilities of receiving a treatment based on observed covariates.

CAUSAL INFERENCE

What is Propensity Score Matching?

Propensity score matching is a statistical technique for estimating causal effects from observational data by approximating the covariate balance of a randomized experiment. It reduces selection bias by matching units (e.g., users, patients) that received a 'treatment' (like a new AI model) with comparable control units that did not, based on their estimated probability of receiving that treatment given their observed characteristics, known as the propensity score. This creates comparison groups that are balanced on observed covariates, which supports a more reliable estimate of the treatment effect, most often the Average Treatment Effect on the Treated (ATT).

In A/B testing frameworks, PSM is used for post-hoc analysis of non-randomized data, such as when users self-select into groups, or as a check on results when randomization was compromised. The process involves estimating propensity scores (often via logistic regression), applying a matching algorithm (e.g., nearest neighbor), checking covariate balance, and then comparing outcomes across the matched groups. Its validity depends on the ignorability assumption: all confounding variables must be observed and included in the propensity model. Because that assumption cannot be tested from the data alone, PSM estimates of model impact should always be read alongside the balance diagnostics.

CAUSAL INFERENCE METHOD

Key Characteristics of Propensity Score Matching

Bias Reduction via Balancing

The core function of propensity score matching is to reduce selection bias by creating a balanced comparison group. It does this by matching each treated unit with one or more control units that have a similar propensity score—the estimated probability of receiving the treatment given their observed covariates (e.g., age, income, prior behavior).

Goal: Make the treatment and control groups statistically comparable on observed variables, mimicking randomization.
Result: Differences in outcomes between the matched groups can be more credibly attributed to the treatment effect, not pre-existing differences.

The Propensity Score (e(X))

The propensity score, denoted as e(X), is a single scalar summary of all observed pre-treatment covariates (X). It is defined as the conditional probability of assignment to a treatment given the observed covariates: e(X) = Pr(T=1 | X).

Estimation: Typically estimated using a logistic regression or a machine learning classifier (e.g., gradient boosting) where the treatment assignment is the dependent variable and covariates are predictors.
Role: According to the Rosenbaum-Rubin Theorem, if treatment assignment is strongly ignorable given X, then it is also strongly ignorable given the propensity score e(X). This allows for matching on a single dimension.
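
As a minimal sketch of the estimation step, the example below fits a logistic regression of treatment on covariates by plain gradient descent, using only the Python standard library. The toy data, the function names, and the learning-rate and epoch settings are illustrative assumptions; in practice a statistics or machine learning library would normally do this fitting.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity_model(X, t, lr=0.1, epochs=2000):
    """Fit a logistic regression of treatment t on covariates X.

    Uses batch gradient descent on the mean log-loss.
    Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, ti in zip(X, t):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - ti
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def propensity_scores(X, w, b):
    """Estimated e(X) = Pr(T=1 | X) for every unit."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Hypothetical toy data: one standardized covariate per unit.
# Units with higher values are treated more often, but not always.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0], [-0.2]]
t = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0]

w, b = fit_propensity_model(X, t)
e = propensity_scores(X, w, b)
```

Each entry of `e` is the single scalar on which units are later matched, regardless of how many covariates went into the model.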

Common Matching Algorithms

Once propensity scores are estimated, units are paired using specific algorithms. The choice affects the quality and variance of the causal estimate.

Nearest Neighbor Matching: Each treated unit is matched with the control unit whose propensity score is closest. Can be performed with or without replacement.
Caliper Matching: A tolerance level (caliper) is set, commonly 0.2 standard deviations of the logit of the propensity score. Matches are only made if the score difference is within this caliper, which improves match quality at the cost of leaving some treated units unmatched.
Stratification/Subclassification: Units are divided into strata (e.g., quintiles) based on their propensity score. The treatment effect is estimated within each stratum and then averaged.
Optimal Matching: Minimizes the total absolute distance across all matches, often producing more balanced samples than greedy nearest-neighbor.
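
The sketch below combines two of the ideas above: greedy 1:1 nearest-neighbor matching without replacement, restricted by a caliper. The unit ids, scores, and caliper value are hypothetical, and the caliper here is expressed directly in score units for simplicity. Because the procedure is greedy, the result depends on the order in which treated units are processed, which is the weakness optimal matching is meant to address.

```python
def greedy_caliper_match(treated_scores, control_scores, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated_scores / control_scores: dicts mapping unit id -> score.
    Returns a list of (treated_id, control_id) pairs. Treated units with
    no remaining control within the caliper are left unmatched.
    """
    available = dict(control_scores)  # controls not yet used
    pairs = []
    for t_id, t_score in treated_scores.items():
        if not available:
            break
        # Closest remaining control on the score scale.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs

# Hypothetical scores for three treated and four control units.
treated = {"t1": 0.62, "t2": 0.33, "t3": 0.91}
controls = {"c1": 0.30, "c2": 0.60, "c3": 0.58, "c4": 0.10}

pairs = greedy_caliper_match(treated, controls, caliper=0.05)
print(pairs)  # [('t1', 'c2'), ('t2', 'c1')]
```

Unit `t3` stays unmatched because no remaining control lies within the caliper. Dropping such units narrows the population the estimate applies to.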

Assumption of Strong Ignorability

Propensity score matching relies on the critical Strong Ignorability or Unconfoundedness assumption. This has two parts:

Conditional Independence: The potential outcomes (Y(1), Y(0)) are independent of the treatment assignment (T) given the observed covariates X. Formally: (Y(1), Y(0)) ⟂ T | X.
Positivity/Overlap: For all possible values of X, there is a positive probability of receiving either treatment or control. Formally: 0 < Pr(T=1 | X) < 1.

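Implication: Conditional independence means there are no unobserved confounders, and it cannot be verified from the data. Overlap, by contrast, can be checked by comparing the propensity score distributions of the two groups. If hidden confounders exist (hidden bias), PSM estimates are biased and their causal interpretation does not hold.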
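
Of the two parts, only overlap can be inspected empirically. One common heuristic, sketched here with hypothetical scores and function names, is to find the score range covered by both groups and set aside units outside it. This is an illustration of the idea, not the only way to enforce overlap.

```python
def common_support(treated_scores, control_scores):
    """Return the (low, high) score range covered by both groups."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return low, high

def trim_to_support(scores, low, high):
    """Indices of units whose score lies within [low, high]."""
    return [i for i, s in enumerate(scores) if low <= s <= high]

# Hypothetical estimated propensity scores.
treated_scores = [0.35, 0.55, 0.80, 0.97]
control_scores = [0.05, 0.20, 0.40, 0.75]

low, high = common_support(treated_scores, control_scores)
kept_treated = trim_to_support(treated_scores, low, high)
kept_controls = trim_to_support(control_scores, low, high)

print(low, high)      # 0.35 0.75
print(kept_treated)   # [0, 1]
print(kept_controls)  # [2, 3]
```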

Post-Matching Diagnostics

After matching, analysts must check if balance was achieved. This is a crucial validation step.

Standardized Mean Difference (SMD): The primary metric. For each covariate, calculate the difference in means between treated and control groups, divided by the pooled standard deviation. An absolute SMD below 0.1 is typically considered good balance.
Variance Ratios: The ratio of variances for each covariate between groups should be close to 1.
Visual Checks: Examine plots like love plots (dot plots of SMDs before and after matching) and propensity score distribution histograms to assess balance and overlap.
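
The two numeric diagnostics can be computed per covariate as in the sketch below. The ages are made-up values chosen to show an imbalanced comparison before matching and a balanced one after. The pooled standard deviation is taken here as the square root of the average of the two group variances, which is one common convention.

```python
import math
import statistics

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

def variance_ratio(treated, control):
    """Ratio of sample variances (treated over control)."""
    return statistics.variance(treated) / statistics.variance(control)

# Hypothetical covariate values (age) for illustration only.
age_treated = [30, 40, 50, 60]
age_control_before = [20, 25, 30, 35]  # all controls, before matching
age_control_after = [31, 39, 52, 58]   # matched controls only

smd_before = standardized_mean_difference(age_treated, age_control_before)
smd_after = standardized_mean_difference(age_treated, age_control_after)
vr_after = variance_ratio(age_treated, age_control_after)

print(round(smd_before, 2))  # 1.71
print(round(smd_after, 2))   # 0.0
print(round(vr_after, 2))    # 1.11
```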

Contrast with Randomized Experiments

PSM is a quasi-experimental method used when randomized controlled trials (RCTs) are infeasible, unethical, or too costly.

RCT Gold Standard: Random assignment ensures groups are balanced on both observed and unobserved covariates on average.
PSM Limitation: Only balances observed covariates. It cannot adjust for unobserved confounders, which remains its fundamental weakness.
Use Case: Commonly applied in observational studies in economics (e.g., evaluating job training programs), healthcare (e.g., drug effectiveness from electronic health records), and marketing (e.g., measuring campaign impact from customer data).

METHOD COMPARISON

Propensity Score Matching vs. Other Causal Methods

A technical comparison of propensity score matching against other primary methodologies for estimating causal effects from observational data, highlighting core assumptions, implementation complexity, and typical use cases.

Feature / Dimension	Propensity Score Matching (PSM)	Regression Adjustment	Instrumental Variables (IV)	Difference-in-Differences (DiD)
Primary Goal	Reduce selection bias by creating a balanced comparison group	Statistically control for confounding variables	Address unobserved confounding via an external instrument	Control for time-invariant unobserved confounding
Key Assumption	Conditional Independence (Ignorability) & Overlap	Correct model specification (functional form, no omitted confounders)	Valid instrument: Relevant & Excludable	Parallel trends (absent treatment, both groups would have followed the same trend)
Handles Unobserved Confounders?	No	No	Yes, given a valid instrument	Partially (time-invariant confounders only)
Data Requirements	Rich observed covariates for matching	Rich observed covariates for modeling	A valid instrumental variable	Panel or repeated cross-sectional data
Implementation Complexity	Medium (matching algorithm, balance checks)	Low (standard regression)	High (instrument validation, 2SLS)	Medium (pre/post period construction)
Output	Estimated Average Treatment Effect on the Treated (ATT)	Conditional Average Treatment Effect (CATE)	Local Average Treatment Effect (LATE)	Average Treatment Effect on the Treated (ATT)
Common Use Case	Evaluating a medical treatment using patient records	Estimating price elasticity from sales data	Estimating effect of education on earnings using policy changes	Measuring impact of a new law across regions over time
Risk of Model Misspecification	Low to medium (matching is non-parametric, but the propensity model can be misspecified)	High (relies on functional form)	Medium (relies on IV assumptions)	Medium (relies on parallel trends)
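
To make the DiD column concrete: in its simplest two-group, two-period form, the estimator is the change in the treated group minus the change in the control group. The sketch below uses hypothetical outcome values and assumes parallel trends holds.

```python
def mean(values):
    return sum(values) / len(values)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcomes before and after a policy change.
effect = did_estimate(
    treated_pre=[10.0, 12.0],
    treated_post=[15.0, 17.0],
    control_pre=[8.0, 10.0],
    control_post=[10.0, 12.0],
)
print(effect)  # 3.0
```

The control group's change of 2.0 stands in for what would have happened to the treated group without treatment, which is why level differences between the groups do not bias the result.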

PROPENSITY SCORE MATCHING

Frequently Asked Questions

A quasi-experimental method for estimating causal effects from observational data by reducing selection bias.

Propensity score matching is a quasi-experimental method used in causal inference to estimate the effect of a treatment, policy, or intervention by reducing selection bias from observed confounding variables. It works by modeling the probability (the propensity score) that a unit (e.g., a user, patient, or customer) would receive the treatment based on its observed covariates. Treated units are then matched with untreated control units that have a similar propensity score, creating a balanced comparison group where the distribution of observed confounders is statistically equivalent. The average treatment effect on the treated is then estimated by comparing outcomes between the matched pairs, approximating the conditions of a randomized controlled trial.
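
The final step described above, comparing outcomes between matched pairs, reduces to a mean of paired differences. The sketch assumes matching has already produced a list of (treated, control) id pairs; the ids and outcome values are hypothetical.

```python
def att_from_pairs(pairs, outcomes):
    """Mean outcome difference (treated minus matched control) over pairs."""
    diffs = [outcomes[t_id] - outcomes[c_id] for t_id, c_id in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical matched pairs and observed outcomes.
pairs = [("t1", "c2"), ("t2", "c1"), ("t3", "c5")]
outcomes = {
    "t1": 12.0, "t2": 9.0, "t3": 15.0,
    "c1": 8.0, "c2": 10.0, "c5": 11.0,
}

att = att_from_pairs(pairs, outcomes)
print(round(att, 3))  # 2.333
```

This point estimate says nothing about uncertainty. Standard errors for matched samples need separate treatment.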

CAUSAL INFERENCE & EXPERIMENTATION

Related Terms

Propensity score matching is a core technique within causal inference, a field dedicated to estimating cause-and-effect from observational data. These related terms define the broader methodological and statistical landscape.

Causal Inference

Causal inference is the process of drawing conclusions about cause-and-effect relationships from data. Unlike correlation, it seeks to estimate the impact of an intervention (treatment). Core methods include:

Randomized Controlled Trials: The gold standard, where random assignment removes confounding in expectation.
Quasi-Experimental Methods: Used when randomization isn't possible (e.g., propensity score matching, difference-in-differences).
Structural Causal Models: A framework using directed acyclic graphs to encode assumptions about data-generating processes.

Average Treatment Effect

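The Average Treatment Effect is a primary target of causal inference. It is the average difference between each unit's potential outcome under treatment and its potential outcome under control, taken across a population. Because only one potential outcome is observed per unit, it must be estimated rather than read off a raw comparison of treated and control groups.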

ATE: The effect for the entire population.
ATT: Average Treatment Effect on the Treated (the effect for those who actually received the treatment).
ATC: Average Treatment Effect on the Controls (the effect the treatment would have had on those who did not receive it).

Propensity score matching is often used to estimate the ATT, answering: 'What was the effect for those who received the treatment?'
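
The difference between the three estimands is easiest to see in a simulation where both potential outcomes are known for every unit, something that is never true of real data. The rows below are invented so that the treatment helps treated units more than it would help the controls.

```python
def mean_effect(rows):
    """Average of y1 - y0 over rows of (treated, y0, y1)."""
    return sum(y1 - y0 for _, y0, y1 in rows) / len(rows)

def average_effects(units):
    """units: list of (treated, y0, y1) with both potential outcomes known."""
    treated = [u for u in units if u[0] == 1]
    controls = [u for u in units if u[0] == 0]
    return {
        "ATE": mean_effect(units),
        "ATT": mean_effect(treated),
        "ATC": mean_effect(controls),
    }

# Hypothetical simulated units: (treated flag, outcome if untreated, outcome if treated).
units = [
    (1, 10.0, 14.0),
    (1, 12.0, 16.0),
    (0, 8.0, 9.0),
    (0, 7.0, 8.0),
    (0, 9.0, 10.0),
    (0, 6.0, 7.0),
]

effects = average_effects(units)
print(effects)  # {'ATE': 2.0, 'ATT': 4.0, 'ATC': 1.0}
```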

Selection Bias

Selection bias is the systematic error that occurs when the treated and untreated groups differ in ways that affect the outcome, independent of the treatment itself. This is the fundamental problem propensity score matching aims to solve.

Sources: Self-selection, non-random program enrollment, or confounding variables.
Consequence: Observed correlation does not equal causation. For example, comparing outcomes of patients who chose a drug versus those who didn't may reflect underlying health differences, not just drug efficacy.

Confounding Variables

A confounding variable is a factor that influences both the treatment assignment and the outcome, creating a spurious association. It is the primary driver of selection bias.

Example: In studying a training program's effect on salary, prior education confounds the analysis if more educated individuals are both more likely to take the program and earn higher salaries regardless.
Role in PSM: Propensity score matching attempts to balance observed confounders (e.g., age, education, income) between the treated and matched control units, simulating randomization.
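
The training-program example can be reproduced with a small constructed dataset. The numbers are invented: salary is set to 10 times education plus 2 times treatment, so the true effect is 2, and educated people enroll four times as often. Comparing raw group means overstates the effect, while comparing within education levels recovers it.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical rows of (education, treated, salary),
# built so that salary = 10 * education + 2 * treated.
rows = (
    [(1, 1, 12.0)] * 80 + [(1, 0, 10.0)] * 20   # educated: 80% enroll
    + [(0, 1, 2.0)] * 20 + [(0, 0, 0.0)] * 80   # not educated: 20% enroll
)

# Naive comparison ignores education entirely.
naive = (
    mean([y for _, t, y in rows if t == 1])
    - mean([y for _, t, y in rows if t == 0])
)

def within_stratum_effect(rows, education):
    """Treated-minus-control mean salary at one education level."""
    treated = [y for e, t, y in rows if e == education and t == 1]
    control = [y for e, t, y in rows if e == education and t == 0]
    return mean(treated) - mean(control)

# Both strata have 100 rows, so a simple average weights them correctly.
adjusted = mean([within_stratum_effect(rows, e) for e in (0, 1)])

print(naive)     # 8.0
print(adjusted)  # 2.0
```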

Stratified Sampling

Stratified sampling is a probability sampling technique where a population is divided into homogeneous subgroups (strata) based on key characteristics before sampling. It ensures all subgroups are adequately represented.

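Relation to PSM: Propensity score subclassification can be viewed as a form of post-hoc stratification. Instead of pre-defining strata before sampling, units are grouped after the fact by their estimated propensity score (e.g., 0.0-0.1, 0.1-0.2), and treated and control units are compared within these 'score strata' to improve balance. Unlike stratified sampling, this changes how existing data are analyzed rather than how they are collected.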
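
A minimal sketch of estimation within score strata follows. It splits units into equal-sized strata by propensity score, takes the treated-minus-control difference in each stratum, and averages the differences weighted by stratum size. The data, the function name, and the choice to skip strata lacking one of the groups are illustrative assumptions.

```python
def stratified_effect(units, n_strata=5):
    """units: list of (propensity_score, treated, outcome).

    Returns the stratum-size-weighted average of within-stratum
    differences in mean outcome (treated minus control).
    """
    ordered = sorted(units, key=lambda u: u[0])
    size = len(ordered) / n_strata
    total, weight = 0.0, 0
    for k in range(n_strata):
        stratum = ordered[round(k * size): round((k + 1) * size)]
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        if not treated or not control:
            continue  # no comparison is possible in this stratum
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(stratum)
        weight += len(stratum)
    if weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return total / weight

# Hypothetical units: (estimated propensity score, treated flag, outcome).
units = [
    (0.1, 0, 1.0), (0.2, 1, 3.0), (0.3, 0, 2.0), (0.4, 1, 4.0),
    (0.6, 0, 5.0), (0.7, 1, 8.0), (0.8, 1, 7.0), (0.9, 0, 4.0),
]

estimate = stratified_effect(units, n_strata=2)
print(estimate)  # 2.5
```

Weighting by stratum size targets the whole sample. Weighting by the number of treated units in each stratum would target the ATT instead.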

Instrumental Variables

Instrumental Variables is an alternative causal inference method used when unobserved confounding is suspected. It relies on finding a variable (the instrument) that:

Correlates with the treatment assignment.
Affects the outcome only through its effect on the treatment (exclusion restriction).

Comparison to PSM: While PSM addresses observed confounding, IV methods aim to handle unobserved confounding. However, finding a valid instrument is often challenging. The methods are complementary tools in the causal inference toolkit.
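
For a binary instrument, the simplest IV estimator is the Wald ratio: the instrument's effect on the outcome divided by its effect on treatment uptake. The constructed data below include a confounder `u` that the analyst is assumed not to observe. The true treatment effect is set to 3, and every name and value is hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def wald_estimate(z, t, y):
    """Instrument's effect on the outcome over its effect on treatment."""
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    t1 = mean([ti for zi, ti in zip(z, t) if zi == 1])
    t0 = mean([ti for zi, ti in zip(z, t) if zi == 0])
    return (y1 - y0) / (t1 - t0)

# Hypothetical data: z is the instrument, u an unobserved confounder.
z = [1, 1, 1, 1, 0, 0, 0, 0]
u = [0, 0, 1, 1, 0, 0, 1, 1]          # distributed identically in both z groups
t = [1 if zi + ui >= 1 else 0 for zi, ui in zip(z, u)]   # z and u both drive uptake
y = [3.0 * ti + 5.0 * ui for ti, ui in zip(t, u)]        # z affects y only through t

iv = wald_estimate(z, t, y)
naive = (
    mean([yi for ti, yi in zip(t, y) if ti == 1])
    - mean([yi for ti, yi in zip(t, y) if ti == 0])
)

print(iv)               # 3.0
print(round(naive, 2))  # 6.33
```

The naive comparison is inflated by `u`, which no amount of matching on observed covariates could correct. The Wald ratio recovers the effect only because the constructed instrument satisfies both conditions listed above.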

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
