Inferensys

Null Hypothesis

The null hypothesis is a default statistical proposition that there is no effect or no difference between groups, which an experiment aims to test and potentially reject based on observed data.
A/B TESTING FRAMEWORKS

What is a Null Hypothesis?

The foundational statistical assumption tested in controlled experiments like A/B tests.

A null hypothesis (H₀) is a default statistical proposition that there is no effect, no difference, or no relationship between defined groups or variables. In the context of A/B testing and experimental design, it is the assumption that any observed difference in a key metric between a treatment group (e.g., a new AI model) and a control group is due to random chance alone. The goal of an experiment is to gather sufficient evidence to reject the null hypothesis in favor of an alternative hypothesis (H₁) that asserts a true effect exists.

Rejecting the null hypothesis typically involves calculating a p-value and comparing it to a pre-defined significance level (alpha). A low p-value indicates that the observed data is unlikely under the null hypothesis, providing statistical grounds for rejection. Crucially, failing to reject the null does not prove it true; it merely indicates insufficient evidence for an effect. This framework is central to frequentist inference and underpins rigorous performance metric comparison in Evaluation-Driven Development.
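
As a minimal sketch of this decision rule, the snippet below runs a two-sided, pooled two-proportion z-test using only the Python standard library. The conversion counts, the `two_proportion_z_test` helper name, and the alpha of 0.05 are illustrative assumptions, not values from a real experiment.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: p_a == p_b, using the pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a statistic at least this extreme if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


ALPHA = 0.05  # significance level, fixed before looking at the data

# Hypothetical counts: control converts 200/2000, treatment 250/2000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
decision = "reject H0" if p_value <= ALPHA else "fail to reject H0"
print(f"z = {z:.3f}, p = {p_value:.4f} -> {decision}")
```

With these hypothetical counts the p-value falls below alpha, so H₀ is rejected. Had it been above alpha, the conclusion would be "fail to reject H₀", not "H₀ is true".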

STATISTICAL FOUNDATIONS

Key Characteristics of the Null Hypothesis

The null hypothesis (H₀) is the foundational assumption in statistical hypothesis testing, positing no effect or no difference. Understanding its formal properties is critical for designing valid A/B tests and interpreting p-values.

01

Default Position of No Effect

The null hypothesis is the default or status quo assumption that any observed difference in an experiment is due to random chance, not a systematic effect. It is a precise, testable statement about a population parameter (e.g., 'The mean click-through rate for variant A equals the mean for variant B'). The burden of proof lies on the alternative hypothesis (H₁) to provide sufficient evidence for rejection.

  • Purpose: Serves as a skeptical baseline that experimental data must overcome.
  • Example: In an A/B test for a new recommendation algorithm, H₀ states: 'The new algorithm does not increase average user engagement time compared to the old algorithm.'
02

Falsifiable and Precise

A properly formulated null hypothesis must be mathematically falsifiable. It makes a specific claim about a population parameter (like a mean, proportion, or variance) that can be contradicted by sample data. Vague statements cannot be tested.

  • Key Property: It is structured for potential rejection, not proof. You can never 'accept' or 'prove' H₀; you can only fail to reject it based on insufficient evidence.
  • Statistical Model: It defines the expected distribution of the test statistic under the assumption of no effect (e.g., a t-distribution centered at zero for a difference in means).
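
The idea of a test statistic having a known distribution under H₀ can be checked by simulation. The sketch below draws both groups from the same distribution, so H₀ is true by construction, and collects Welch t-statistics. The group size, number of simulations, and the normal distribution parameters are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(sample_a, sample_b):
    """Welch t-statistic for the difference in means of two samples."""
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    return (mean(sample_b) - mean(sample_a)) / se


rng = random.Random(0)  # fixed seed so the sketch is reproducible
N_PER_GROUP, N_SIMULATIONS = 50, 2000

# Both groups come from the SAME distribution, so any difference is chance.
null_stats = []
for _ in range(N_SIMULATIONS):
    group_a = [rng.gauss(10.0, 2.0) for _ in range(N_PER_GROUP)]
    group_b = [rng.gauss(10.0, 2.0) for _ in range(N_PER_GROUP)]
    null_stats.append(welch_t(group_a, group_b))

print(f"mean of t under H0: {mean(null_stats):+.3f}")
print(f"std  of t under H0: {stdev(null_stats):.3f}")
```

The simulated statistics should cluster around zero with a spread close to one, which is the reference distribution an observed statistic is compared against.
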
03

Direct Link to P-Value and Significance

The p-value is calculated directly under the assumption that H₀ is true. It represents the probability of observing data as extreme as, or more extreme than, the sample results, assuming the null hypothesis is correct. A small p-value indicates that the observed data is unlikely under H₀, leading to its rejection in favor of H₁.

  • Interpretation: A p-value of 0.03 means that, if H₀ were true, there would be a 3% chance of seeing an effect at least as large as the one observed. It is not the probability that H₀ itself is true.
  • Threshold: The pre-defined significance level (alpha, α), commonly 0.05, is the threshold for rejecting H₀. If p ≤ α, H₀ is rejected.
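
To make "as extreme as, or more extreme than" concrete, here is a small sketch that computes an exact two-sided p-value by summing binomial probabilities. The scenario is hypothetical: 100 users each pick one of two variants, and under H₀ each variant is equally likely to be picked. The helper name `binomial_two_sided_p` is an assumption for this example.

```python
from math import comb


def binomial_two_sided_p(successes, n):
    """Exact two-sided p-value for H0: success probability = 0.5.

    Sums the probability of every outcome at least as far from n/2 as
    the observed count. This simple rule is valid here because the null
    distribution is symmetric around n/2.
    """
    observed_distance = abs(successes - n / 2)
    extreme = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(k - n / 2) >= observed_distance
    )
    return extreme / 2 ** n


# Hypothetical result: 60 of 100 users picked variant B.
p_value = binomial_two_sided_p(successes=60, n=100)
print(f"p-value = {p_value:.4f}")
```

In this example the p-value lands just above 0.05, so at α = 0.05 the result would not lead to rejecting H₀, even though 60% of users preferred B.
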
04

Basis for Type I and Type II Errors

Hypothesis testing errors are defined in relation to the truth of H₀.

  • Type I Error (False Positive): Incorrectly rejecting a true null hypothesis. The probability of this error is controlled by the significance level (α).
  • Type II Error (False Negative): Failing to reject a false null hypothesis. The probability of this error is denoted by beta (β). Statistical power is 1 - β, the probability of correctly rejecting a false H₀.

These error trade-offs are fundamental to experiment design, determining required sample sizes via power analysis.
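
Both error rates can be estimated empirically. The following sketch simulates many experiments with a pooled two-proportion z-test, once with H₀ true and once with H₀ false. The conversion rates, group size, and number of simulated experiments are assumptions picked only to keep the example fast.

```python
import random
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def rejects_h0(conv_a, conv_b, n):
    """Pooled two-proportion z-test with equal group sizes n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False  # no variation observed, so no evidence against H0
    return abs(conv_b - conv_a) / n / se > Z_CRIT


def rejection_rate(rate_a, rate_b, n=500, n_experiments=500, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        conv_a = sum(rng.random() < rate_a for _ in range(n))
        conv_b = sum(rng.random() < rate_b for _ in range(n))
        rejections += rejects_h0(conv_a, conv_b, n)
    return rejections / n_experiments


# H0 true: every rejection is a Type I error (false positive).
type_i_rate = rejection_rate(0.10, 0.10)
# H0 false: every rejection is a correct detection, so this estimates power.
power = rejection_rate(0.10, 0.20)
print(f"estimated Type I rate: {type_i_rate:.3f}")
print(f"estimated power:       {power:.3f}")
```

The estimated Type I rate should sit near α, while the power depends on the true effect and the sample size; one minus the estimated power approximates β.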

05

Not Only a Statement of Equality

While H₀ is often an assertion of equality (e.g., μ₁ = μ₂), it can also be written as an inequality, such as a 'no greater than' statement, for one-sided tests.

  • Two-Sided H₀: μ₁ = μ₂ (tests for any difference, H₁: μ₁ ≠ μ₂)
  • One-Sided H₀: μ₁ ≤ μ₂ (tests specifically whether μ₁ exceeds μ₂, H₁: μ₁ > μ₂)

The choice between one-sided and two-sided tests must be made a priori, based on the research question, as it affects the p-value calculation and interpretation.
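
A short sketch shows how much the choice matters. It converts the same z-statistic into a two-sided and an upper-tail one-sided p-value. The z value of 1.8 and the helper name `p_values_from_z` are illustrative assumptions; z is taken to be (mean₁ − mean₂) divided by its standard error.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (two_sided, one_sided_upper) p-values for a z-statistic."""
    standard_normal = NormalDist()
    # H1: mu1 != mu2, so both tails count as "extreme".
    two_sided = 2 * (1 - standard_normal.cdf(abs(z)))
    # H1: mu1 > mu2, so only large positive z counts as "extreme".
    one_sided_upper = 1 - standard_normal.cdf(z)
    return two_sided, one_sided_upper


two_sided, one_sided = p_values_from_z(1.8)
print(f"two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}")
```

For this z value the one-sided p-value is below 0.05 while the two-sided one is not, which is why picking the test after seeing the data inflates the false positive rate.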

06

Operational Role in A/B Testing

In A/B testing frameworks, the null hypothesis is the engine of decision-making. It allows the translation of a business question ('Does this new UI improve conversion?') into a statistical procedure.

  • Assignment & Measurement: Users are randomly assigned to control (A) and treatment (B) groups. A metric (e.g., conversion rate) is measured for both.
  • Test Execution: A statistical test (e.g., a chi-squared test for proportions, a t-test for means) computes a test statistic and corresponding p-value under H₀.
  • Decision Rule: If p-value ≤ α, reject H₀ and conclude a statistically significant difference exists. The experiment then shifts to analyzing the estimated effect size and its confidence interval for practical significance.
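
The three steps above can be sketched end to end with the standard library. This is a toy simulation, not a production framework: the true conversion rates, the user count, and the `chi2_2x2` helper are assumptions made for illustration, and the chi-squared test is computed without a continuity correction.

```python
import random
from math import erfc, sqrt


def chi2_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


ALPHA = 0.05
rng = random.Random(42)
TRUE_RATE = {"A": 0.10, "B": 0.12}  # hypothetical; unknown in a real test

# 1. Assignment and measurement: each user is randomly assigned to a group.
counts = {"A": {"users": 0, "conversions": 0},
          "B": {"users": 0, "conversions": 0}}
for _ in range(10_000):
    group = rng.choice("AB")
    counts[group]["users"] += 1
    if rng.random() < TRUE_RATE[group]:
        counts[group]["conversions"] += 1

# 2. Test execution: compute the statistic and its p-value under H0.
chi2, p_value = chi2_2x2(counts["A"]["conversions"], counts["A"]["users"],
                         counts["B"]["conversions"], counts["B"]["users"])

# 3. Decision rule.
decision = "reject H0" if p_value <= ALPHA else "fail to reject H0"
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f} -> {decision}")
```

For a 2x2 table this chi-squared statistic equals the square of the pooled two-proportion z-statistic, so both tests give the same two-sided p-value.
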
A/B TESTING FRAMEWORKS

How Null Hypothesis Testing Works in AI Experiments

The null hypothesis is the foundational concept of statistical hypothesis testing, providing the formal baseline against which experimental results in AI and machine learning are rigorously evaluated.

In an AI experiment, such as an A/B test comparing two model versions, the null hypothesis (H₀) typically states that any observed performance difference is due to random chance. The experiment's goal is to gather sufficient evidence to reject H₀ in favor of an alternative hypothesis (H₁) that a true effect exists, judged by comparing the p-value against a pre-defined significance level (alpha).

The testing framework involves calculating a test statistic from the observed data (e.g., the difference in mean accuracy between model groups) and determining the probability (p-value) of seeing a result at least this extreme if the null hypothesis were true. If this p-value is at or below the alpha threshold (commonly 0.05), the null is rejected, indicating a statistically significant result. This formal mechanism controls the rate of Type I errors (false positives) and is central to causal inference from experimental data, providing a mathematically rigorous alternative to heuristic performance comparisons.
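
One way to compute such a p-value without distributional assumptions is a permutation test, sketched below. It is one possible technique, not necessarily the one a given A/B framework uses. The per-request correctness outcomes for the two model versions are made-up data, and `permutation_p_value` is a hypothetical helper.

```python
import random


def permutation_p_value(outcomes_a, outcomes_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    n_a = len(outcomes_a)

    def mean_diff(values):
        group_a, group_b = values[:n_a], values[n_a:]
        return abs(sum(group_b) / len(group_b) - sum(group_a) / len(group_a))

    pooled = list(outcomes_a) + list(outcomes_b)
    observed = mean_diff(pooled)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        # Small tolerance guards against floating-point ties.
        at_least_as_extreme += mean_diff(pooled) >= observed - 1e-12
    # The +1 terms count the observed labelling as one of the permutations.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical outcomes (1 = correct, 0 = incorrect) on 100 requests each.
model_a = [1] * 70 + [0] * 30  # accuracy 0.70
model_b = [1] * 85 + [0] * 15  # accuracy 0.85

p_value = permutation_p_value(model_a, model_b)
print(f"permutation p-value = {p_value:.4f}")
```

Shuffling the labels simulates the world in which H₀ holds and model version has no bearing on correctness; the p-value is the share of shuffles that produce a gap at least as large as the observed one.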

STATISTICAL HYPOTHESIS TESTING

Null Hypothesis vs. Alternative Hypothesis

A comparison of the two opposing statements that form the foundation of statistical inference in A/B testing and experimentation.

| Feature | Null Hypothesis (H₀) | Alternative Hypothesis (H₁ or Hₐ) |
| --- | --- | --- |
| Core Definition | A default statement of 'no effect' or 'no difference' between groups or conditions. | A statement proposing a specific effect, difference, or relationship that the experiment aims to find evidence for. |
| Assumed Truth at Start | Yes | No |
| Goal of Statistical Test | To gather evidence against it, with the aim of rejection. | To gather evidence for it, by rejecting the null. |
| Typical Mathematical Form | Equality: e.g., μ₁ = μ₂, p₁ = p₂, θ = 0 | Inequality or difference: e.g., μ₁ ≠ μ₂, p₁ > p₂, θ ≠ 0 |
| Relationship to P-Value | The p-value is calculated assuming H₀ is true. A small p-value indicates the observed data is unlikely under H₀. | The p-value is not directly calculated for H₁. Rejecting H₀ provides indirect support for H₁. |
| Outcome of Test (α = 0.05) | "Fail to reject H₀" (p-value > 0.05). Evidence is insufficient to discard the default position. | "Reject H₀ in favor of H₁" (p-value ≤ 0.05). Statistically significant evidence for an effect. |
| Risk of Incorrect Conclusion (Type I Error) | Probability = α (significance level). Falsely rejecting a true null hypothesis (false positive). | |
| Risk of Incorrect Conclusion (Type II Error) | | Probability = β. Failing to reject a false null hypothesis (false negative). Power = 1 - β. |
| Role in Experiment Design | Defines the baseline for calculating test statistics and p-values. Essential for determining sample size and power. | Defines the minimum detectable effect (MDE) used in power analysis to determine the required sample size. |
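
The link between α, power, MDE, and sample size can be sketched with the common normal-approximation formula for comparing two proportions. This is an approximation under stated assumptions (two-sided test, equal group sizes, absolute MDE); the function name and the example rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_rate, mde, alpha=0.05, power=0.80):
    """Approximate users per group for a two-sided two-proportion z-test.

    mde is the absolute lift to detect (e.g. 0.02 for 10% -> 12%).
    """
    treated_rate = baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance_sum = (baseline_rate * (1 - baseline_rate)
                    + treated_rate * (1 - treated_rate))
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / mde ** 2)


n_small_effect = sample_size_per_group(baseline_rate=0.10, mde=0.02)
n_large_effect = sample_size_per_group(baseline_rate=0.10, mde=0.05)
print(f"users per group for MDE 0.02: {n_small_effect}")
print(f"users per group for MDE 0.05: {n_large_effect}")
```

Because the MDE enters the formula squared in the denominator, halving the effect to be detected roughly quadruples the required sample size.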

NULL HYPOTHESIS

Frequently Asked Questions

The null hypothesis is a foundational concept in statistical hypothesis testing, forming the default assumption that any observed effect in an experiment is due to random chance. This FAQ addresses its role in A/B testing, machine learning evaluation, and rigorous experimentation.

The null hypothesis (H₀) is a default statistical proposition that there is no effect, no difference, or no relationship between defined groups or variables in an experiment. In the context of A/B testing for AI models, it typically states that the performance metric (e.g., click-through rate, accuracy) for the new model variant (Treatment B) is equal to that of the baseline model (Control A). The experiment's goal is to gather sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis (H₁), which posits a real difference exists. Failing to reject H₀ does not prove it true; it merely indicates insufficient data to confidently claim an effect.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.