Intent-to-Treat analysis is a statistical evaluation principle for randomized experiments where all participants are analyzed according to the group to which they were originally randomly assigned, regardless of whether they received, adhered to, or completed the intended intervention. This method preserves the randomization process, which is critical for establishing causal inference and providing an unbiased estimate of the Average Treatment Effect in real-world conditions where non-adherence and dropouts occur. It is the gold standard for primary analysis in clinical trials and high-stakes A/B testing.
What is Intent-to-Treat Analysis?
Intent-to-Treat analysis is a foundational principle in the statistical evaluation of randomized controlled trials, including A/B tests for AI systems.
In A/B testing frameworks for AI models, ITT analysis evaluates the effect of offering a new model variant, not just its effect on users who fully interact with it. This prevents selection bias that can arise from analyzing only compliant users, which often overestimates treatment efficacy. By including all randomly assigned units—even those with protocol deviations or technical failures—ITT provides a conservative, pragmatic estimate of a model's impact when deployed at scale, directly informing production canary analysis and rollout decisions.
Key Principles of Intent-to-Treat Analysis
Intent-to-Treat analysis is the gold standard for evaluating the effectiveness of interventions in randomized controlled trials, preserving the integrity of randomization by analyzing all participants according to their original group assignment.
Preservation of Randomization
The cardinal principle of ITT analysis. It mandates analyzing all randomized participants in the groups to which they were originally assigned, regardless of protocol deviations, non-compliance, or crossover. This maintains the baseline comparability achieved by randomization, preventing selection bias that occurs if only compliant participants are analyzed. For example, in a drug trial, a participant assigned to the treatment group who never takes the medication is still analyzed as part of the treatment group.
Estimates Effectiveness, Not Efficacy
ITT provides a pragmatic estimate of an intervention's effectiveness in real-world conditions, not its ideal efficacy under perfect adherence. It answers the question: "What is the effect of offering or assigning the treatment?" This includes the consequences of non-adherence and dropouts, which are realistic parts of any deployment. In contrast, a Per-Protocol analysis estimates efficacy under ideal conditions. For CTOs, ITT results are often more relevant for business decisions as they reflect expected real-world performance.
Conservative Bias (Typically)
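ITT analysis typically biases the estimated effect of receiving the treatment toward the null hypothesis (i.e., no difference between groups). Including non-compliant participants in the treatment group dilutes the measured difference, so the ITT estimate is usually smaller in magnitude than the effect among participants who actually take the treatment, while remaining an unbiased estimate of the effect of assignment. In superiority testing this is considered a strength: it makes the test more demanding, so a significant benefit found under ITT is a robust finding. The direction of the bias is not guaranteed, however. When control participants obtain the treatment or an alternative (e.g., by seeking it elsewhere), the bias can be unpredictable, and in non-inferiority or equivalence studies a bias toward "no difference" is anti-conservative, because it can make an inferior treatment look equivalent.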
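The dilution can be illustrated with a small deterministic example. The sketch below is hypothetical: it assumes one-sided non-compliance (some users assigned to the treatment never receive it, and no control user can obtain it), a binary outcome, and made-up data. Under those assumptions the ITT effect is the compliance rate times the effect among compliers, and dividing the ITT effect by the difference in treatment uptake (the Wald, or instrumental-variable, estimator) recovers the complier effect. The helper names are illustrative and not taken from any library.

```python
from statistics import mean

def itt_effect(outcomes_treat, outcomes_ctrl):
    """Difference in mean outcome between the *assigned* arms (ITT estimate)."""
    return mean(outcomes_treat) - mean(outcomes_ctrl)

def wald_complier_effect(outcomes_treat, outcomes_ctrl, received_treat, received_ctrl):
    """Complier average causal effect via the Wald (instrumental-variable) ratio.

    ITT effect on the outcome divided by the ITT effect on treatment receipt.
    Valid only under IV assumptions (e.g., exclusion restriction, no defiers),
    which this function does not check.
    """
    uptake_difference = mean(received_treat) - mean(received_ctrl)
    return itt_effect(outcomes_treat, outcomes_ctrl) / uptake_difference

# Hypothetical data: 10 users per arm, 1 = converted.
# Treatment arm: the first 6 users received the new model, the last 4 never did.
received_treat = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
outcomes_treat = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Control arm: one-sided non-compliance, so nobody receives the treatment.
received_ctrl = [0] * 10
outcomes_ctrl = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    itt = itt_effect(outcomes_treat, outcomes_ctrl)
    cace = wald_complier_effect(outcomes_treat, outcomes_ctrl, received_treat, received_ctrl)
    print(f"ITT effect: {itt:.3f}")               # ITT effect: 0.300
    print(f"Complier effect (Wald): {cace:.3f}")  # Complier effect (Wald): 0.500
```

Here 60% of treatment-assigned users received the model, so a complier effect of 0.5 shows up as an ITT effect of 0.3 (0.6 × 0.5). The Wald ratio is a secondary analysis: it relies on assumptions, such as the exclusion restriction, that randomization alone does not guarantee.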
Handling Missing Data
A critical challenge for ITT is managing missing outcome data from participants who withdraw or are lost to follow-up. A true ITT analysis requires outcomes for all randomized subjects. Common strategies include:
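- Last Observation Carried Forward: Using the last available measurement. It is simple but now widely discouraged, because it assumes a participant's outcome stops changing after dropout.
- Multiple Imputation: Generating several plausible values for each missing outcome from a model of the observed data, analyzing each completed dataset, and pooling the results.
- Worst/Best-Case Scenario Analysis: Imputing extreme values to test the robustness of the result.
The chosen method must be pre-specified in the trial protocol to avoid bias. Simply excluding participants with missing data violates the ITT principle.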
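For a binary outcome such as conversion, the worst/best-case approach is easy to sketch. The example below assumes missing outcomes are recorded as None and uses hypothetical data and function names. The worst case fills every missing treatment outcome with a failure and every missing control outcome with a success; the best case does the reverse.

```python
from statistics import mean

def impute(outcomes, fill):
    """Replace missing (None) binary outcomes with a fixed value."""
    return [fill if y is None else y for y in outcomes]

def itt_bounds(treat, ctrl):
    """Worst- and best-case ITT effects for a binary outcome.

    Worst case: missing treatment outcomes -> 0, missing control outcomes -> 1.
    Best case: the reverse.
    """
    worst = mean(impute(treat, 0)) - mean(impute(ctrl, 1))
    best = mean(impute(treat, 1)) - mean(impute(ctrl, 0))
    return worst, best

# Hypothetical conversion outcomes; None marks users lost to follow-up.
treat = [1, 1, 0, 1, None, 0, 1, None]
ctrl = [0, 1, 0, 0, 0, None, 1, 0]

if __name__ == "__main__":
    worst, best = itt_bounds(treat, ctrl)
    print(f"worst case: {worst:.3f}, best case: {best:.3f}")  # worst case: 0.125, best case: 0.500
```

Because both bounds are positive in this toy data, the direction of the ITT effect does not depend on how the missing outcomes are filled in. If the bounds straddled zero, the conclusion would not be robust to missing data.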
Contrast with Per-Protocol & As-Treated
ITT is one of three primary analysis frameworks:
- Intent-to-Treat (ITT): Analyzes by original assignment. Preserves randomization, estimates effectiveness.
- Per-Protocol (PP): Analyzes only participants who perfectly followed the protocol. Estimates efficacy under ideal conditions but is prone to selection bias.
- As-Treated (AT): Analyzes participants by the treatment they actually received, regardless of original assignment. Discards the randomized comparison and can introduce major confounding.
Modern trial reporting typically presents both ITT and PP analyses. A large discrepancy between them signals significant protocol non-adherence.
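The three frameworks differ only in which units they keep and which arm label they group by. The sketch below applies all three to the same hypothetical records, each holding the assigned arm, the arm actually received, and a binary outcome. The data are invented so that the users who end up receiving Model B are also the more engaged ones.

```python
from statistics import mean
from typing import NamedTuple

class Unit(NamedTuple):
    assigned: str  # arm from randomization: "A" (control) or "B" (treatment)
    received: str  # arm actually delivered
    outcome: int   # 1 = converted, 0 = did not

def arm_difference(units, group_key, keep=lambda u: True):
    """Mean outcome of group "B" minus group "A" after filtering with `keep`."""
    kept = [u for u in units if keep(u)]
    b = [u.outcome for u in kept if group_key(u) == "B"]
    a = [u.outcome for u in kept if group_key(u) == "A"]
    return mean(b) - mean(a)

def intent_to_treat(units):
    # Everyone, grouped by original assignment.
    return arm_difference(units, lambda u: u.assigned)

def per_protocol(units):
    # Only units whose received arm matches their assignment.
    return arm_difference(units, lambda u: u.assigned, keep=lambda u: u.assigned == u.received)

def as_treated(units):
    # Everyone, grouped by what they actually received.
    return arm_difference(units, lambda u: u.received)

units = (
    [Unit("B", "B", y) for y in (1, 1, 1, 0)]          # assigned B, received B
    + [Unit("B", "A", 0) for _ in range(4)]             # assigned B, never received it
    + [Unit("A", "A", y) for y in (1, 0, 0, 0, 0, 0)]   # assigned A, received A
    + [Unit("A", "B", 1) for _ in range(2)]             # control users who crossed over
)

if __name__ == "__main__":
    print(f"ITT: {intent_to_treat(units):.3f}")    # ITT: 0.000
    print(f"PP:  {per_protocol(units):.3f}")       # PP:  0.583
    print(f"AT:  {as_treated(units):.3f}")         # AT:  0.733
```

On this toy data the assignment itself has no effect, yet PP and AT both report large positive effects because they compare self-selected groups. A gap this large between ITT and PP is exactly the warning sign described above.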
Application in A/B Testing & ML
In technology A/B testing (e.g., comparing two AI models), the ITT principle is applied as analysis by assigned variant. A user who is assigned to see Model B but hits a session error and sees nothing is still counted in Model B's results. This measures the true business impact of deploying Model B, accounting for real-world failures. Violating this principle by analyzing only users who successfully received the treatment can overstate benefits. ITT is crucial for accurate causal inference from online experiments: it yields an unbiased estimate of the effect of assigning users to the new variant, which is the quantity a launch decision depends on.
ITT vs. Per-Protocol Analysis
A comparison of two primary statistical analysis frameworks for randomized controlled trials (RCTs) and A/B tests, highlighting their differing approaches to participant adherence and the resulting implications for bias and generalizability.
| Analytical Feature | Intent-to-Treat (ITT) Analysis | Per-Protocol (PP) Analysis |
|---|---|---|
| Defining Principle | Analyzes all participants according to their original, randomly assigned group, regardless of adherence, crossover, or dropout. | Analyzes only the subset of participants who completed the study protocol exactly as assigned, excluding non-adherent subjects. |
| Primary Goal | To estimate the real-world effectiveness of assigning a treatment, preserving the benefits of randomization for causal inference. | To estimate the efficacy of the treatment under ideal, perfect-adherence conditions. |
| Handling of Non-Adherence/Protocol Deviations | Participants who deviate, switch groups, or drop out are retained in their originally assigned group for analysis. | Participants who deviate from the protocol are excluded from the analysis entirely. |
| Bias Risk from Exclusions | Minimizes selection bias by maintaining the original randomized groups, providing an unbiased estimate of the assignment effect. | Introduces high risk of selection bias, as excluded participants often differ systematically from those who adhere, breaking randomization. |
| Statistical Power | Generally lower power for detecting a treatment effect, as non-adherence dilutes the observed difference between groups. | Generally higher power for detecting a treatment effect, as it compares "pure" groups, but this can be misleading due to bias. |
| Interpretation of Result | Reflects the pragmatic, policy-relevant effect of offering or deploying the intervention in a real-world setting. | Reflects the biological or theoretical maximum effect of the intervention if perfectly followed, often considered an efficacy estimate. |
| Generalizability | High generalizability to the target population intended to receive the intervention, as it accounts for typical usage patterns. | Low generalizability, as results apply only to a highly selected, ideal-adherence subpopulation that may not exist in practice. |
| Recommended Use Case | The gold-standard primary analysis for confirmatory RCTs and A/B tests to support causal claims about a treatment policy. | A secondary, exploratory analysis to understand potential efficacy, but its results must be interpreted with extreme caution due to bias. |
Intent-to-Treat Analysis in AI & Machine Learning
A core principle for analyzing randomized experiments that preserves the integrity of the original random assignment, critical for unbiased causal inference in model and feature testing.
Core Principle: Preserving Randomization
Intent-to-treat analysis is a statistical principle that mandates analyzing all participants in a randomized controlled trial according to the group to which they were originally assigned, regardless of whether they actually received, adhered to, or completed the intended intervention. This preserves the randomization, which is the foundation for unbiased causal inference. In AI A/B testing, this means a user assigned to the treatment group (e.g., a new recommendation model) is analyzed as part of that group even if a system error prevented the model from loading for them, or if they dropped out of the session immediately.
- Purpose: To avoid selection bias that arises when analyzing only compliant participants.
- Contrast with Per-Protocol Analysis: ITT is unbiased for the effect of assignment, though typically conservative about the effect of actually receiving the treatment; per-protocol analysis (analyzing only those who perfectly received the treatment) can introduce bias by comparing non-equivalent groups.
Application in Model A/B Testing
In online experiments comparing AI models, ITT analysis ensures the estimated treatment effect reflects the real-world impact of offering a new model, not just its ideal performance. Key scenarios include:
- Partial Feature Rollouts: If a feature flag fails for a subset of users assigned to the treatment, those users remain in the treatment group for analysis.
- Model Latency or Failures: Users who experience timeouts or errors from the new model are not moved to the control group; their outcomes (e.g., no purchase) are attributed to the treatment.
- User Attrition: Users who abandon an app during an experiment are included in the analysis, with their final observed state used (e.g., 'did not convert').
This approach measures the average treatment effect of deploying the model in a production environment with all its inherent imperfections, providing a realistic business impact estimate.
Contrast with As-Treated & Per-Protocol Analysis
Understanding ITT requires contrasting it with alternative analytical approaches:
- As-Treated Analysis: Units are analyzed based on the treatment they actually received, not the one they were assigned. This breaks randomization and can severely bias results if the reasons for non-compliance are correlated with the outcome.
- Per-Protocol Analysis: A restriction of as-treated analysis to units that perfectly adhered to the experimental protocol (for these units, the assigned and received treatments coincide). It targets the effect under perfect adherence, but it sacrifices generalizability and, because adherence is not randomized, can introduce selection bias.
In AI, as-treated analysis might involve analyzing only the queries where the new model successfully returned a response. This could bias results if the model only succeeds on easier queries. ITT, by including all failures, gives a true picture of the system-wide effect.
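The easy-query bias can be reproduced with a few hypothetical request records. In the sketch below, the new model (arm "B") is assumed to answer every easy query and fail on every hard one, while the control model answers everything. The `hard` field is only a descriptive label; the analysis never uses it, which is the point: filtering on success silently selects the easy queries.

```python
from statistics import mean
from typing import NamedTuple

class Request(NamedTuple):
    arm: str        # assigned arm: "A" (control) or "B" (new model)
    hard: bool      # hypothetical difficulty label, for illustration only
    success: bool   # did the assigned model return a response?
    satisfied: int  # outcome metric: 1 = user satisfied, 0 = not

def effect(requests, only_successes=False):
    """Mean satisfaction in B minus A; optionally keep only successful requests."""
    rows = [r for r in requests if r.success or not only_successes]
    b = mean(r.satisfied for r in rows if r.arm == "B")
    a = mean(r.satisfied for r in rows if r.arm == "A")
    return b - a

requests = (
    [Request("A", False, True, y) for y in (1, 1, 1, 0)]
    + [Request("A", True, True, y) for y in (1, 0, 0, 0)]
    + [Request("B", False, True, 1) for _ in range(4)]   # B succeeds on easy queries
    + [Request("B", True, False, 0) for _ in range(4)]   # B fails on every hard query
)

if __name__ == "__main__":
    print(f"ITT effect (all requests): {effect(requests):.3f}")                      # 0.000
    print(f"Success-only effect: {effect(requests, only_successes=True):.3f}")       # 0.500
```

The ITT comparison shows no improvement, while the success-only comparison reports a large gain that comes entirely from dropping the hard queries the new model could not handle.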
Implementation & Assignment Tracking
Faithful ITT analysis depends on rigorous experiment logging. The system must persistently record two key facts for every user or request:
- Assignment Variant: The group (Control A, Treatment B) determined by the initial deterministic hashing of a user ID during traffic splitting.
- Final Outcome Metric: The business metric (click-through rate, conversion value, error rate) regardless of treatment delivery status.
This data must be immutable for analysis. Modern experiment platforms handle this by logging the assignment at the start of a user session and joining all subsequent events to this original assignment, ensuring the analysis aligns with the intent to treat.
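A minimal sketch of this logging pattern follows. It assumes a SHA-256 hash of an experiment-salted user ID for bucketing (one common approach, not any particular platform's scheme) and a simple outcome log of (user_id, converted) pairs. The function names, the experiment name "ranker-v2", and the data are hypothetical.

```python
import hashlib
from collections import defaultdict

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to "A" (control) or "B" (treatment).

    Hashing the experiment name together with the user ID keeps the assignment
    stable across sessions while decorrelating it from other experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # first 32 bits mapped to [0, 1)
    return "B" if bucket < treatment_share else "A"

def itt_conversion_by_arm(assignments, events):
    """Conversion rate per *assigned* arm.

    assignments: {user_id: arm}, logged once when the user enters the experiment.
    events: iterable of (user_id, converted) pairs from the outcome log.
    Assigned users with no outcome events (errors, timeouts, abandonment) stay
    in their assigned arm as non-converters; events from users outside the
    experiment are ignored.
    """
    converted = {uid: 0 for uid in assignments}
    for uid, conv in events:
        if uid in converted:
            converted[uid] = max(converted[uid], conv)  # user-level "ever converted"
    outcomes_by_arm = defaultdict(list)
    for uid, arm in assignments.items():
        outcomes_by_arm[arm].append(converted[uid])
    return {arm: sum(ys) / len(ys) for arm, ys in outcomes_by_arm.items()}

if __name__ == "__main__":
    users = [f"user-{i}" for i in range(6)]
    assignments = {u: assign_variant(u, "ranker-v2") for u in users}
    events = [("user-0", 1), ("user-3", 1), ("user-99", 1)]
    print(itt_conversion_by_arm(assignments, events))
```

The key design choice is that the join starts from the assignment log, not from the outcome events: a user who never produced an event still appears in the denominator of the arm they were assigned to.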
Role in Causal Inference & Estimating SLOs
ITT analysis is the gold standard for estimating the causal effect of a deployment decision. It answers the question: "What happens if we turn this feature on for everyone?"
- Causal Foundation: By preserving randomization, ITT provides an unbiased estimate of the average effect of assigning the treatment across the entire randomized population, which equals the Average Treatment Effect when every assigned unit actually receives its treatment.
- Informs SLOs/SLIs: The results, which include errors and latency, directly inform realistic Service Level Objectives for AI services. For example, if failures and retries add roughly 200 ms of latency for 5% of users, ITT captures this tail degradation in the latency distribution, whereas a per-protocol analysis restricted to successful responses might miss it.
- Guardrail Metrics: ITT is essential for accurately monitoring guardrail metrics (e.g., system error rate) during a canary launch or A/B test.
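The latency point can be checked with a nearest-rank percentile over hypothetical request logs. The sketch assumes timed-out requests are recorded at the client timeout value (a lower bound on what the user actually experienced); the timeout and latencies are invented for illustration.

```python
import math

TIMEOUT_MS = 2000  # hypothetical client timeout; timed-out calls are logged at this value

def nearest_rank_percentile(values, p):
    """Nearest-rank percentile for integer p in 1..100 on a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical latencies (ms) for 20 requests assigned to the new model;
# None marks a request that timed out.
latencies = [120, 130, 125, 140, 150, 135, 145, 160, 155, 170,
             165, 175, 180, 185, 190, 195, 200, 210, None, None]

# ITT-style: every assigned request counts, failures at the timeout value.
all_assigned = [TIMEOUT_MS if x is None else x for x in latencies]
# Per-protocol-style: only requests that got a response.
successes_only = [x for x in latencies if x is not None]

if __name__ == "__main__":
    print("p95, all assigned requests:", nearest_rank_percentile(all_assigned, 95))      # 2000
    print("p95, successful requests only:", nearest_rank_percentile(successes_only, 95))  # 210
```

Dropping the two timeouts makes the new model's tail latency look healthy; keeping every assigned request exposes the degradation that an SLO or guardrail should catch.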
Common Pitfalls and the 'Peeking Problem'
Misapplying ITT can invalidate experiment results. Key pitfalls include:
- Analyzing Only Successful Exposures: A common error is filtering analytics to events where the 'treatment exposed' flag is true. This is an as-treated analysis, not ITT.
- The Peeking Problem: Repeatedly checking p-values before an experiment reaches its planned sample size inflates Type I error rates (false positives). While related to analysis timing, peeking is especially dangerous if combined with non-ITT filters, leading to premature and incorrect decisions.
- Ignoring Non-Compliance in Design: Not accounting for expected non-compliance (e.g., known integration failure rates) during statistical power and minimum detectable effect calculations can lead to underpowered experiments.
Proper ITT requires pre-registering the analysis plan and adhering to it, using methods like sequential testing with adjusted boundaries if early stopping is required.
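The non-compliance pitfall can be made quantitative. If only a fraction c of treatment-assigned users actually receive the treatment, and non-receivers behave like control users (an assumption), the ITT effect shrinks to about c times the delivered effect, so the required sample size grows by roughly 1/c². The sketch below uses the standard normal-approximation sample-size formula for a two-sided two-proportion z-test; the baseline and lift are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treat - p_control) ** 2)

def itt_treatment_rate(p_control, p_treat_if_received, compliance):
    """Expected rate in the assigned-treatment arm when only a `compliance`
    fraction receives the treatment and non-receivers behave like control."""
    return p_control + compliance * (p_treat_if_received - p_control)

baseline, lifted = 0.10, 0.11  # hypothetical: 10% -> 11% conversion if delivered

naive = n_per_arm(baseline, lifted)
diluted = n_per_arm(baseline, itt_treatment_rate(baseline, lifted, compliance=0.8))

if __name__ == "__main__":
    print(f"per-arm n ignoring non-compliance: {naive}")
    print(f"per-arm n with 80% compliance:     {diluted}")
    print(f"inflation factor: {diluted / naive:.2f}")
```

With 80% compliance the required sample grows by roughly 1/0.8² ≈ 1.56 times (slightly less here, because the variance term also shifts). Sizing the experiment on the undiluted effect would leave it underpowered for the ITT comparison.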
Frequently Asked Questions
Intent-to-treat analysis is a foundational principle in the statistical evaluation of randomized experiments, crucial for maintaining the integrity of causal conclusions in A/B testing and clinical trials.
Intent-to-treat analysis is a principle for analyzing randomized controlled trials where all participants are analyzed according to the group to which they were originally randomly assigned, regardless of whether they received, adhered to, or completed the intended intervention. This method preserves the benefits of randomization, which makes treatment and control groups comparable at baseline in expectation, and thereby provides an unbiased estimate of the average effect of assigning the treatment under real-world conditions, where non-adherence and dropouts are common. In the context of A/B Testing for AI models, this means analyzing users based on the variant they were initially assigned to, even if a system error prevented the model from executing or the user's session was cut short.
Related Terms
Intent-to-Treat analysis is a core principle within the broader framework of rigorous experimental design. The following terms are essential for understanding the statistical and methodological context in which ITT operates.
Per-Protocol Analysis
Per-Protocol analysis is the methodological counterpoint to Intent-to-Treat. It analyzes only the subset of participants who fully adhered to the experimental protocol as assigned. This includes factors like:
- Receiving the full intended intervention
- Completing all required follow-up measurements
- Not crossing over to the other experimental group
While it estimates the efficacy of the treatment under ideal conditions, it is vulnerable to selection bias, as adherent participants often differ systematically from non-adherers.
Average Treatment Effect
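The Average Treatment Effect is the primary causal quantity estimated in a randomized experiment. It is defined as the average, across the entire population, of the difference between each unit's outcome under treatment and its outcome under control; randomization allows the difference in group means to estimate it without bias. In the context of ITT:
- The ITT analysis estimates the average effect of treatment assignment, which coincides with the ATE of the treatment itself only when adherence is complete.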
- This estimate reflects the effectiveness of offering or assigning the treatment, not just its biological or algorithmic efficacy.
- It is often described as a policy-relevant effect, answering the question: "What happens if we deploy this model to everyone, knowing some will not use it correctly?"
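The distinction can be written in potential-outcome notation. The symbols are introduced here for illustration: Z_i is unit i's random assignment (1 = treatment), Y_i(1) and Y_i(0) are its potential outcomes with and without the treatment, Y_i is the observed outcome, and n_1 and n_0 are the sizes of the assigned treatment and control groups.

$$
\tau_{\text{ATE}} = \mathbb{E}\big[Y_i(1) - Y_i(0)\big],
\qquad
\tau_{\text{ITT}} = \mathbb{E}\big[Y_i \mid Z_i = 1\big] - \mathbb{E}\big[Y_i \mid Z_i = 0\big]
$$

$$
\hat{\tau}_{\text{ITT}} = \frac{1}{n_1}\sum_{i:\,Z_i = 1} Y_i \;-\; \frac{1}{n_0}\sum_{i:\,Z_i = 0} Y_i
$$

When every unit receives exactly the treatment it was assigned, the two estimands coincide; with non-compliance, the ITT estimand measures the effect of the assignment policy rather than of the treatment itself.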
Non-Compliance & Protocol Deviations
Non-compliance refers to any departure from the assigned experimental protocol by participants. In AI A/B testing, this manifests as:
- Cross-over: Users assigned to Model A unintentionally receiving outputs from Model B.
- Non-adherence: Users ignoring or failing to trigger the new AI feature being tested.
- Missing Data: System failures causing loss of outcome measurements.
ITT analysis preserves the integrity of randomization by including these deviations in the analysis according to the original assignment, preventing bias from post-randomization confounding.
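Before running the ITT comparison, it helps to quantify how often each kind of deviation occurred in each assigned arm. The sketch below is a hypothetical diagnostic: it assumes each record stores the assigned arm, the arm actually served (None if the feature never triggered), and the outcome (None if the measurement was lost). It only describes deviations; every record still stays in its assigned arm for the ITT analysis.

```python
from collections import Counter
from typing import NamedTuple, Optional

class Record(NamedTuple):
    assigned: str            # "A" or "B" from randomization
    received: Optional[str]  # arm actually served; None if the feature never triggered
    outcome: Optional[int]   # None if the outcome measurement was lost

def deviation_report(records):
    """Per assigned arm: size, crossover, non-adherence and missing-outcome counts."""
    report = {}
    for arm in sorted({r.assigned for r in records}):
        rows = [r for r in records if r.assigned == arm]
        counts = Counter()
        for r in rows:
            if r.received is None:
                counts["non_adherence"] += 1
            elif r.received != r.assigned:
                counts["crossover"] += 1
            if r.outcome is None:
                counts["missing_outcome"] += 1
        report[arm] = {
            "n": len(rows),
            **{k: counts[k] for k in ("crossover", "non_adherence", "missing_outcome")},
        }
    return report

records = [
    Record("A", "A", 1), Record("A", "B", 0), Record("A", "A", None),
    Record("B", "B", 1), Record("B", None, 0), Record("B", None, None), Record("B", "B", 1),
]

if __name__ == "__main__":
    for arm, stats in deviation_report(records).items():
        print(arm, stats)
```

High deviation rates do not change how the ITT comparison is computed, but they explain why the ITT estimate may be diluted and inform the missing-data strategy.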
As-Treated Analysis
As-Treated analysis is a flawed but sometimes reported method where participants are analyzed based on the treatment they actually received, regardless of initial assignment. Key pitfalls include:
- It completely breaks the randomization, introducing severe confounding.
- Factors that influence whether a user actually receives the treatment (e.g., being more engaged) are often also related to the outcome.
- This analysis can produce misleading estimates of treatment effect. ITT is preferred for primary analysis to maintain internal validity.
Randomized Controlled Trial
A Randomized Controlled Trial is the gold-standard experimental design for establishing causality. Its core principles are:
- Random Assignment: Participants are allocated to treatment or control by chance.
- Control Group: Provides a counterfactual baseline.
- Blinding: Masking assignment from participants and/or evaluators to reduce bias.
Intent-to-Treat is the standard primary analysis principle for RCTs. Because it compares the groups exactly as randomization formed them, it yields an unbiased estimate of the average effect of treatment assignment.
Selection Bias
Selection bias occurs when the participants analyzed in a study are not representative of the population they are meant to reflect, due to systematic differences in selection. ITT directly combats a specific form of this:
- By analyzing all randomly assigned participants, ITT prevents bias from post-randomization selection (e.g., dropping non-compliant users).
- Alternative analyses like Per-Protocol are susceptible to this bias, as the "protocol-compliant" subset is a self-selected group that may be healthier, more tech-savvy, or otherwise atypical.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. For more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.