Inferensys

Glossary

Intent-to-Treat Analysis

Intent-to-treat analysis is a principle for analyzing randomized experiments where all participants are analyzed according to the group to which they were originally assigned, regardless of whether they received or adhered to the intervention.
A/B TESTING FRAMEWORKS

What is Intent-to-Treat Analysis?

Intent-to-Treat analysis is a foundational principle in the statistical evaluation of randomized controlled trials, including A/B tests for AI systems.

Intent-to-Treat (ITT) analysis is a statistical evaluation principle for randomized experiments: all participants are analyzed in the group to which they were originally randomly assigned, regardless of whether they received, adhered to, or completed the intended intervention. Analyzing by assignment preserves the comparability created by randomization, which is critical for causal inference, and yields an unbiased estimate of the average effect of being assigned the treatment under real-world conditions where non-adherence and dropouts occur. It is the gold standard for primary analysis in clinical trials and high-stakes A/B testing.

In A/B testing frameworks for AI models, ITT analysis evaluates the effect of offering a new model variant, not just its effect on users who fully interact with it. This prevents selection bias that can arise from analyzing only compliant users, which often overestimates treatment efficacy. By including all randomly assigned units—even those with protocol deviations or technical failures—ITT provides a conservative, pragmatic estimate of a model's impact when deployed at scale, directly informing production canary analysis and rollout decisions.
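The selection bias described above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not data from a real experiment: the new model has no true effect on conversion, more engaged users convert more often, and the new model is more likely to load successfully for engaged users. The names `simulate`, `rate`, and all probabilities are hypothetical.

```python
import random

def simulate(n_users=200_000, seed=0):
    """Simulate an A/B test in which variant B has NO true effect on conversion."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_users):
        engagement = rng.random()                        # latent user trait in [0, 1)
        assigned = "B" if rng.random() < 0.5 else "A"    # random assignment
        # Assumption: the new model loads more reliably for engaged users.
        received = assigned == "B" and rng.random() < engagement
        # Conversion depends on engagement only, never on the variant.
        converted = rng.random() < 0.10 + 0.20 * engagement
        rows.append((assigned, received, converted))
    return rows

def rate(rows):
    """Conversion rate of a list of (assigned, received, converted) rows."""
    return sum(r[2] for r in rows) / len(rows)

rows = simulate()
control = [r for r in rows if r[0] == "A"]
treated = [r for r in rows if r[0] == "B"]

# ITT: compare everyone by assigned variant.
itt_lift = rate(treated) - rate(control)
# Non-ITT: keep only treated users who actually received the model.
exposed_only_lift = rate([r for r in treated if r[1]]) - rate(control)

print(f"ITT lift:          {itt_lift:+.4f}")
print(f"Exposed-only lift: {exposed_only_lift:+.4f}")
```

In this setup the ITT lift stays close to zero, while the exposed-only comparison reports a positive lift that comes entirely from which users happened to receive the model.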

RANDOMIZED CONTROLLED TRIALS

Key Principles of Intent-to-Treat Analysis

Intent-to-Treat analysis is the gold standard for evaluating the effectiveness of interventions in randomized controlled trials, preserving the integrity of randomization by analyzing all participants according to their original group assignment.

01

Preservation of Randomization

The cardinal principle of ITT analysis. It mandates analyzing all randomized participants in the groups to which they were originally assigned, regardless of protocol deviations, non-compliance, or crossover. This maintains the baseline comparability achieved by randomization, preventing selection bias that occurs if only compliant participants are analyzed. For example, in a drug trial, a participant assigned to the treatment group who never takes the medication is still analyzed as part of the treatment group.

02

Estimates Effectiveness, Not Efficacy

ITT provides a pragmatic estimate of an intervention's effectiveness in real-world conditions, not its ideal efficacy under perfect adherence. It answers the question: "What is the effect of offering or assigning the treatment?" This includes the consequences of non-adherence and dropouts, which are realistic parts of any deployment. In contrast, a Per-Protocol analysis estimates efficacy under ideal conditions. For CTOs, ITT results are often more relevant for business decisions as they reflect expected real-world performance.

03

Conservative Bias (Typically)

Relative to the effect of actually receiving the treatment, ITT analysis is generally biased toward the null hypothesis (i.e., no difference between groups): including non-compliant participants in the treatment group dilutes the measured effect. In superiority testing this is considered a strength, because a significant benefit found under ITT is a robust finding. The conservatism is not guaranteed, however. When control participants obtain the treatment or an alternative to it, the direction of the bias can be unpredictable, and in non-inferiority or equivalence trials a bias toward no difference makes it easier, not harder, to reach the desired conclusion.
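The dilution can be written down explicitly under two simplifying assumptions that are not part of the text above: non-compliance occurs only in the treatment arm, and assignment affects the outcome only through actually receiving the treatment. With Z the assigned arm, Y the outcome, p the share of the treatment arm that actually receives the treatment, and Δ the effect among those who receive it (all symbols introduced here for illustration):

```latex
\mathrm{ITT}
  = E[Y \mid Z = 1] - E[Y \mid Z = 0]
  = p \cdot \Delta + (1 - p) \cdot 0
  = p \, \Delta
```

For instance, with p = 0.8 and Δ = 5 percentage points, the ITT effect under these assumptions is 4 percentage points.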

04

Handling Missing Data

A critical challenge for ITT is managing missing outcome data from participants who withdraw or are lost to follow-up. A true ITT analysis requires outcomes for all randomized subjects. Common strategies include:

  • Last Observation Carried Forward: Using the last available measurement.
  • Multiple Imputation: Statistically predicting missing values based on other data.
  • Worst/Best-Case Scenario Analysis: Imputing extreme values to test result robustness.

The chosen method must be pre-specified in the trial protocol to avoid bias. Simply excluding participants with missing data violates the ITT principle.

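Two of the strategies listed above can be sketched in a few lines. This is an illustrative sketch on made-up toy data, not a recommended imputation pipeline; the function names `locf` and `impute_binary` and the arm labels are hypothetical, and multiple imputation is omitted because it needs a statistical model.

```python
def locf(measurements):
    """Last Observation Carried Forward: last non-missing value, or None if none exists."""
    for value in reversed(measurements):
        if value is not None:
            return value
    return None

def impute_binary(outcome, arm, scenario):
    """Fill a missing binary outcome (1 = success) with an extreme value."""
    if outcome is not None:
        return outcome
    if scenario == "worst_case":   # least favourable to the treatment arm
        return 0 if arm == "treatment" else 1
    if scenario == "best_case":    # most favourable to the treatment arm
        return 1 if arm == "treatment" else 0
    raise ValueError(f"unknown scenario: {scenario}")

def success_rate_difference(participants, scenario):
    """Treatment minus control success rate, keeping every randomized participant."""
    rates = {}
    for arm in ("treatment", "control"):
        filled = [impute_binary(y, a, scenario) for a, y in participants if a == arm]
        rates[arm] = sum(filled) / len(filled)
    return rates["treatment"] - rates["control"]

# Toy data: (assigned arm, observed outcome); None marks a participant lost to follow-up.
participants = [
    ("treatment", 1), ("treatment", 1), ("treatment", None), ("treatment", 0),
    ("control", 0), ("control", 1), ("control", None), ("control", 0),
]

worst = success_rate_difference(participants, "worst_case")
best = success_rate_difference(participants, "best_case")
last_seen = locf([3.0, 2.5, None, None])
print(worst, best, last_seen)
```

If the conclusion is the same under both extreme scenarios, the missing outcomes cannot be driving it; a wide gap between them signals that the result is sensitive to how dropouts are handled.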
05

Contrast with Per-Protocol & As-Treated

ITT is one of three primary analysis frameworks:

  • Intent-to-Treat (ITT): Analyzes by original assignment. Preserves randomization, estimates effectiveness.
  • Per-Protocol (PP): Analyzes only participants who perfectly followed the protocol. Estimates efficacy under ideal conditions but is prone to selection bias.
  • As-Treated (AT): Analyzes participants by the treatment they actually received, regardless of original assignment. Severely breaks randomization and introduces major confounding.

Modern trial reporting typically presents both ITT and PP analyses. A large discrepancy between them signals significant protocol non-adherence.

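The three frameworks differ only in how records are grouped or filtered before the same comparison is made. The sketch below shows this on an invented eight-record dataset; the field names `assigned`, `received`, and `y` are hypothetical, and "T"/"C" stand for treatment and control.

```python
def mean(values):
    return sum(values) / len(values)

def diff_in_means(records, group_key):
    """Mean outcome of group 'T' minus group 'C', grouping by the given field."""
    treated = [r["y"] for r in records if r[group_key] == "T"]
    control = [r["y"] for r in records if r[group_key] == "C"]
    return mean(treated) - mean(control)

# Toy data: two treatment-arm units never got the treatment, one control unit crossed over.
records = [
    {"assigned": "T", "received": "T", "y": 1},
    {"assigned": "T", "received": "T", "y": 1},
    {"assigned": "T", "received": "C", "y": 0},
    {"assigned": "T", "received": "C", "y": 0},
    {"assigned": "C", "received": "C", "y": 1},
    {"assigned": "C", "received": "C", "y": 0},
    {"assigned": "C", "received": "C", "y": 0},
    {"assigned": "C", "received": "T", "y": 1},
]

# Intent-to-Treat: every record, grouped by original assignment.
itt = diff_in_means(records, "assigned")

# Per-Protocol: only records where the unit received what it was assigned.
adherent = [r for r in records if r["assigned"] == r["received"]]
pp = diff_in_means(adherent, "assigned")

# As-Treated: every record, grouped by what was actually received.
at = diff_in_means(records, "received")

print(f"ITT={itt:.3f}  PP={pp:.3f}  AT={at:.3f}")
```

On this toy data the ITT estimate is zero while the per-protocol and as-treated estimates are large and positive, which is the kind of discrepancy that points to substantial non-adherence.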
06

Application in A/B Testing & ML

In technology A/B testing (e.g., comparing two AI models), the ITT principle is applied as analysis by assigned variant. A user who is assigned to see Model B but has a session error and sees nothing is still counted in Model B's results. This measures the true business impact of deploying Model B, accounting for real-world failures. Violating this—by analyzing only users who successfully received the treatment—can overstate benefits. ITT is crucial for accurate causal inference from online experiments, ensuring the estimated Average Treatment Effect is unbiased for the decision to launch.

EXPERIMENTAL ANALYSIS METHODS

ITT vs. Per-Protocol Analysis

A comparison of two primary statistical analysis frameworks for randomized controlled trials (RCTs) and A/B tests, highlighting their differing approaches to participant adherence and the resulting implications for bias and generalizability.

Analytical Feature: Intent-to-Treat (ITT) Analysis vs. Per-Protocol (PP) Analysis

Defining Principle

  • ITT: Analyzes all participants according to their original, randomly assigned group, regardless of adherence, crossover, or dropout.
  • PP: Analyzes only the subset of participants who completed the study protocol exactly as assigned, excluding non-adherent subjects.

Primary Goal

  • ITT: To estimate the real-world effectiveness of assigning a treatment, preserving the benefits of randomization for causal inference.
  • PP: To estimate the efficacy of the treatment under ideal, perfect-adherence conditions.

Handling of Non-Adherence/Protocol Deviations

  • ITT: Participants who deviate, switch groups, or drop out are retained in their originally assigned group for analysis.
  • PP: Participants who deviate from the protocol are excluded from the analysis entirely.

Bias Risk from Exclusions

  • ITT: Minimizes selection bias by maintaining the original randomized groups, providing an unbiased estimate of the assignment effect.
  • PP: Introduces high risk of selection bias, as excluded participants often differ systematically from those who adhere, breaking randomization.

Statistical Power

  • ITT: Generally lower power for detecting a treatment effect, as non-adherence dilutes the observed difference between groups.
  • PP: Generally higher power for detecting a treatment effect, as it compares 'pure' groups, but this can be misleading due to bias.

Interpretation of Result

  • ITT: Reflects the pragmatic, policy-relevant effect of offering or deploying the intervention in a real-world setting.
  • PP: Reflects the biological or theoretical maximum effect of the intervention if perfectly followed, often considered an efficacy estimate.

Generalizability

  • ITT: High generalizability to the target population intended to receive the intervention, as it accounts for typical usage patterns.
  • PP: Low generalizability, as results apply only to a highly selected, ideal-adherence subpopulation that may not exist in practice.

Recommended Use Case

  • ITT: The gold-standard primary analysis for confirmatory RCTs and A/B tests to support causal claims about a treatment policy.
  • PP: A secondary, exploratory analysis to understand potential efficacy, but its results must be interpreted with extreme caution due to bias.

A/B TESTING FRAMEWORKS

Intent-to-Treat Analysis in AI & Machine Learning

A core principle for analyzing randomized experiments that preserves the integrity of the original random assignment, critical for unbiased causal inference in model and feature testing.

01

Core Principle: Preserving Randomization

Intent-to-treat analysis is a statistical principle that mandates analyzing all participants in a randomized controlled trial according to the group to which they were originally assigned, regardless of whether they actually received, adhered to, or completed the intended intervention. This preserves the randomization, which is the foundation for unbiased causal inference. In AI A/B testing, this means a user assigned to the treatment group (e.g., a new recommendation model) is analyzed as part of that group even if a system error prevented the model from loading for them, or if they dropped out of the session immediately.

  • Purpose: To avoid selection bias that arises when analyzing only compliant participants.
  • Contrast with Per-Protocol Analysis: ITT is unbiased for the effect of assignment, though conservative relative to the effect of actually receiving the treatment; per-protocol analysis (analyzing only those who perfectly received the treatment) can introduce bias by comparing non-equivalent groups.
02

Application in Model A/B Testing

In online experiments comparing AI models, ITT analysis ensures the estimated treatment effect reflects the real-world impact of offering a new model, not just its ideal performance. Key scenarios include:

  • Partial Feature Rollouts: If a feature flag fails for a subset of users assigned to the treatment, those users remain in the treatment group for analysis.
  • Model Latency or Failures: Users who experience timeouts or errors from the new model are not moved to the control group; their outcomes (e.g., no purchase) are attributed to the treatment.
  • User Attrition: Users who abandon an app during an experiment are included in the analysis, with their final observed state used (e.g., 'did not convert').

This approach measures the average treatment effect of deploying the model in a production environment with all its inherent imperfections, providing a realistic business impact estimate.

03

Contrast with As-Treated & Per-Protocol Analysis

Understanding ITT requires contrasting it with alternative analytical approaches:

  • As-Treated Analysis: Units are analyzed based on the treatment they actually received, not the one they were assigned. This breaks randomization and can severely bias results if the reasons for non-compliance are correlated with the outcome.
  • Per-Protocol Analysis: Restricts the analysis to units that adhered to the experimental protocol as assigned. This targets the effect under ideal adherence, but because adherent units are a self-selected subset, it weakens the comparability that randomization provides, can introduce bias, and sacrifices generalizability.

In AI, a non-ITT analysis might involve analyzing only the queries where the new model successfully returned a response. This could bias results if the model only succeeds on easier queries. ITT, by including all failures, gives a more faithful picture of the system-wide effect.

04

Implementation & Assignment Tracking

Faithful ITT analysis depends on rigorous experiment logging. The system must persistently record two key facts for every user or request:

  1. Assignment Variant: The group (Control A, Treatment B) determined by the initial deterministic hashing of a user ID during traffic splitting.
  2. Final Outcome Metric: The business metric (click-through rate, conversion value, error rate) regardless of treatment delivery status.

This data must be immutable for analysis. Modern experiment platforms handle this by logging the assignment at the start of a user session and joining all subsequent events to this original assignment, ensuring the analysis aligns with the intent to treat.
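The logging pattern described above can be sketched as follows. This is a minimal illustration, not the API of any particular experiment platform: the function names, the experiment key `exp_model_b`, and the choice of SHA-256 for hashing are all assumptions made for the example.

```python
import hashlib

def assign_variant(user_id, experiment="exp_model_b", treatment_share=0.5):
    """Deterministically map a user ID to a variant by hashing (hypothetical scheme)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000   # uniform value in [0, 1)
    return "B" if bucket < treatment_share else "A"

def itt_conversion_rates(assignment_log, converted_users):
    """Conversion rate per ASSIGNED variant.

    assignment_log: dict of user_id -> variant, written once at assignment time.
    converted_users: set of user_ids with a conversion event.
    Users with no events at all still count in their variant's denominator.
    """
    totals = {"A": 0, "B": 0}
    conversions = {"A": 0, "B": 0}
    for user_id, variant in assignment_log.items():
        totals[variant] += 1
        conversions[variant] += user_id in converted_users
    return {v: conversions[v] / totals[v] for v in totals if totals[v] > 0}

# The same user always lands in the same variant.
first = assign_variant("user-123")
second = assign_variant("user-123")

# Toy assignment log: "u2" produced no events (for example, the model failed to load).
assignment_log = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
rates = itt_conversion_rates(assignment_log, converted_users={"u1", "u3", "u4"})
print(first == second, rates)
```

Iterating over the assignment log rather than over the event stream is the design choice that matters here: it keeps users with missing or failed sessions in the denominator of the variant they were assigned to.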

05

Role in Causal Inference & Estimating SLOs

ITT analysis is the gold standard for estimating the causal effect of a deployment decision. It answers the question: "What happens if we turn this feature on for everyone?"

  • Causal Foundation: By preserving randomization, ITT provides an unbiased estimate of the average effect of assignment across the entire randomized population.
  • Informs SLOs/SLIs: The results, which include errors and latency, directly inform realistic Service Level Objectives for AI services. For example, if a new model increases 95th percentile latency by 200ms for 5% of users due to failures, ITT captures this degradation, whereas per-protocol analysis might miss it.
  • Guardrail Metrics: ITT is essential for accurately monitoring guardrail metrics (e.g., system error rate) during a canary launch or A/B test.
06

Common Pitfalls and the 'Peeking Problem'

Misapplying ITT can invalidate experiment results. Key pitfalls include:

  • Analyzing Only Successful Exposures: A common error is filtering analytics to events where the 'treatment exposed' flag is true. This is a per-protocol or as-treated style analysis, not ITT.
  • The Peeking Problem: Repeatedly checking p-values before an experiment reaches its planned sample size inflates Type I error rates (false positives). Peeking is a problem of analysis timing rather than of ITT itself, but it is especially dangerous when combined with non-ITT filters, leading to premature and incorrect decisions.
  • Ignoring Non-Compliance in Design: Not accounting for expected non-compliance (e.g., known integration failure rates) during statistical power and minimum detectable effect calculations can lead to underpowered experiments.

Proper ITT requires pre-registering the analysis plan and adhering to it, using methods like sequential testing with adjusted boundaries if early stopping is required.
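The effect of expected non-compliance on experiment sizing can be sketched with the standard normal-approximation formula for comparing two proportions. The sketch assumes that non-compliance occurs only in the treatment arm and scales the detectable lift by the compliance rate; the function name `users_per_arm` and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def users_per_arm(baseline, lift, compliance=1.0, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a lift in a conversion rate.

    baseline:   control conversion rate
    lift:       absolute lift among users who actually receive the treatment
    compliance: expected share of the treatment arm that receives it
    Assumption: the ITT lift is the full lift scaled by the compliance rate.
    """
    diluted_lift = lift * compliance
    p_control = baseline
    p_treatment = baseline + diluted_lift
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil(z ** 2 * variance / diluted_lift ** 2)

full_compliance = users_per_arm(baseline=0.10, lift=0.02)
with_failures = users_per_arm(baseline=0.10, lift=0.02, compliance=0.8)
print(full_compliance, with_failures, with_failures / full_compliance)
```

Because the required sample size grows roughly with the inverse square of the diluted lift, even a modest failure rate in treatment delivery raises the sample size noticeably, which is why it belongs in the pre-registered plan.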

INTENT-TO-TREAT ANALYSIS

Frequently Asked Questions

Intent-to-treat analysis is a foundational principle in the statistical evaluation of randomized experiments, crucial for maintaining the integrity of causal conclusions in A/B testing and clinical trials.

Intent-to-treat analysis is a principle for analyzing randomized controlled trials where all participants are analyzed according to the group to which they were originally randomly assigned, regardless of whether they received, adhered to, or completed the intended intervention. This method preserves the benefits of randomization, which ensures that treatment and control groups are statistically equivalent at baseline, thereby providing an unbiased estimate of the Average Treatment Effect as it would occur in a real-world setting where non-adherence and dropouts are common. In the context of A/B Testing for AI models, this means analyzing users based on the variant they were initially assigned to, even if a system error prevented the model from executing or the user's session was cut short.

Prasad Kumkar

About the author

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.