Inferensys

Deterministic Hashing

Deterministic hashing is a method used in experiment assignment where a user's identifier is passed through a hash function to produce a consistent, repeatable output, ensuring the user is always assigned to the same experimental variant.
A/B TESTING FRAMEWORKS

What is Deterministic Hashing?

A core technique for consistent user assignment in online experiments.

Deterministic hashing is a method for experiment assignment where a user's unique identifier is passed through a hash function to produce a consistent, repeatable numeric output, ensuring the user is always assigned to the same experimental variant across sessions and, when the same identifier is available on every device, across devices. This deterministic allocation is essential for maintaining a consistent user experience and for collecting clean, unconfounded data in A/B testing and multi-armed bandit frameworks. It operates independently of any server-side state, relying solely on the hash function and the input identifier, both of which must stay fixed for the life of the experiment.

The process is foundational to traffic splitting and works by mapping the hash output to a specific bucket or range corresponding to a test variant. Common choices include MD5 and SHA-256 (cryptographic) and MurmurHash (non-cryptographic and faster). What matters for assignment is a uniform output distribution, not cryptographic strength. Unlike per-request random assignment, this method guarantees idempotent assignments, preventing users from toggling between variants, which would distort cohort analysis and average treatment effect measurements. It is a critical component for building reliable, stateless experimentation platforms.
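The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `assign_variant`, the choice of SHA-256, the use of the first 8 digest bytes, and the 100-bucket 50/50 split are all assumptions made for the example.

```python
import hashlib


def assign_variant(user_id: str, num_buckets: int = 100, treatment_buckets: int = 50) -> str:
    """Map a user ID to a variant using only the ID and a fixed hash function."""
    # Hash the identifier; the same input always yields the same digest.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer (an assumed convention).
    value = int.from_bytes(digest[:8], "big")
    # Reduce to a bucket in [0, num_buckets).
    bucket = value % num_buckets
    # Buckets below the cutoff go to treatment, the rest to control.
    return "treatment" if bucket < treatment_buckets else "control"


if __name__ == "__main__":
    # Repeated calls for the same user return the same variant.
    print(assign_variant("user_123"), assign_variant("user_123"))
```

Because no state is stored, any process that runs the same function with the same parameters reaches the same answer for a given user.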

Key Characteristics of Deterministic Hashing

Deterministic hashing is a foundational technique for consistent user assignment in online experiments. Its core properties ensure reliable, repeatable, and scalable traffic allocation.

01

Consistent Assignment

The primary function of deterministic hashing is to guarantee that a given user identifier (e.g., a user ID, session ID, or device ID) is always mapped to the same experimental variant. This is achieved by passing the identifier through a hash function such as SHA-256, which produces a fixed-length output (a hash). The hash is then converted into a number used for assignment. The guarantee lasts only as long as the identifier does: a session ID gives consistency within one session, while a stable user ID gives consistency across sessions. This consistency is critical for preventing assignment churn, where a user flickers between variants, corrupting both experiment data and the user experience.

02

Uniform Distribution

A high-quality hash function distributes inputs uniformly across its output space. This property ensures that user IDs are spread evenly across all possible hash values, so the resulting split closely approximates random assignment. For an A/B test with a 50/50 split, half of all possible hash values map to variant A and half to variant B, so approximately half of users land in each. This uniformity is essential for creating statistically equivalent groups, minimizing bias, and ensuring that any observed outcome differences are due to the treatment effect, not uneven group composition.
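One way to check the uniformity claim empirically is to hash a batch of synthetic IDs and count how many fall on each side of a 50/50 split. The sketch below assumes SHA-256, 100 buckets, and made-up IDs of the form `user_0`, `user_1`, and so on; the helper names `bucket_of` and `split_counts` are hypothetical.

```python
import hashlib
from collections import Counter


def bucket_of(user_id: str, num_buckets: int = 100) -> int:
    """Return a bucket in [0, num_buckets) derived from a SHA-256 digest."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def split_counts(n_users: int = 100_000) -> Counter:
    """Count how many synthetic users land in each half of a 50/50 split."""
    counts = Counter()
    for i in range(n_users):
        variant = "A" if bucket_of(f"user_{i}") < 50 else "B"
        counts[variant] += 1
    return counts


if __name__ == "__main__":
    counts = split_counts()
    total = sum(counts.values())
    for variant in ("A", "B"):
        print(variant, counts[variant], f"{counts[variant] / total:.3f}")
```

With a well-behaved hash, both shares should sit close to 0.5, with deviations of the size expected from sampling noise.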

03

Determinism vs. Randomness

This technique is deterministic, not random: a given input always produces the same output. Across the user population, however, assignments behave as if they were random, because the hash output bears no meaningful relationship to who the user is. This is a crucial distinction:

  • Deterministic: Hash(User_ID_123) always maps to variant_b.
  • Random-Like: The set of all users is assigned as if randomly shuffled.

This pseudo-randomness satisfies the requirement for randomized controlled trials while providing the engineering benefit of repeatable assignments, which is vital for debugging and replaying experiment logs.

04

Salting for Isolation

To prevent inter-experiment correlation, where a user's assignment in one experiment dictates their assignment in another, a salt (a unique experiment identifier) is appended to the user ID before hashing. For example: Hash(User_ID + "experiment_123_salt"). This ensures assignments are independent across experiments. Without salting, a user would land in the same bucket in every test that uses the same hashing logic, so a user in the control group of one 50/50 test would be in the control group of every other 50/50 test. This creates systemic bias and makes it impossible to run concurrent, orthogonal experiments.
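The effect of salting can be illustrated by measuring how often users in one experiment's control group also fall into another experiment's control group. The sketch below assumes SHA-256, a `user_id:salt` key format, a 50/50 split, and synthetic user IDs; the helper names and the salts `exp_1` and `exp_2` are hypothetical.

```python
import hashlib


def bucket_of(user_id: str, salt: str = "", num_buckets: int = 100) -> int:
    """Hash the user ID together with an experiment-specific salt."""
    key = f"{user_id}:{salt}"  # salt appended to the ID (separator is an assumption)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def variant(user_id: str, salt: str = "") -> str:
    return "control" if bucket_of(user_id, salt) < 50 else "treatment"


def control_overlap(salt_a: str, salt_b: str, n_users: int = 10_000) -> float:
    """Share of experiment A's control users who are also control in experiment B."""
    users = [f"user_{i}" for i in range(n_users)]
    control_a = [u for u in users if variant(u, salt_a) == "control"]
    both = [u for u in control_a if variant(u, salt_b) == "control"]
    return len(both) / len(control_a)


if __name__ == "__main__":
    # Same (empty) salt in both experiments: the control groups coincide exactly.
    print("unsalted overlap:", control_overlap("", ""))
    # Distinct salts: overlap should be close to 0.5, as for independent splits.
    print("salted overlap:  ", round(control_overlap("exp_1", "exp_2"), 3))
```

An overlap of 1.0 means the two experiments share the same cohorts; a value near 0.5 is what independent 50/50 assignments would produce.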

05

Stateless and Scalable

The assignment logic requires no central state or database lookup. Any service or edge device can independently compute a user's variant by applying the same hash function, salt, and allocation rule. This makes the system:

  • Highly scalable, as there is no coordination overhead or stateful service bottleneck.
  • Fault-tolerant, as assignment logic is decentralized.
  • Fast, involving only a single hash computation.

This is a key enabler for high-throughput applications like web servers, mobile apps, and content delivery networks, where assignment decisions must be made in milliseconds for millions of concurrent users.

06

Precise Traffic Control

By converting the hash output to a numeric value within a known range (e.g., 0 to 9999), engineers can implement exact traffic splits. For a 15% allocation to a new model, users whose hash-derived number falls in the range 0-1499 are assigned to the treatment. This allows for granular control (e.g., 1% or 0.1% canary launches) and easy reallocation. It also enables sticky bucketing: each user's bucket number never changes, so when the treatment range is widened mid-flight (for example, from 0-1499 to 0-2999), users already in treatment stay there, preserving cohort integrity for longitudinal analysis.
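The range-based split and the ramp-up behavior can be sketched as follows. This assumes SHA-256, 10,000 buckets, and a treatment range that always starts at bucket 0; the function names and the salt `model_v2_rollout` are hypothetical.

```python
import hashlib

NUM_BUCKETS = 10_000  # gives 0.01% granularity


def bucket_of(user_id: str, salt: str) -> int:
    """Return a stable bucket in [0, NUM_BUCKETS) for this user and experiment."""
    digest = hashlib.sha256(f"{user_id}:{salt}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def in_treatment(user_id: str, salt: str, percent: float) -> bool:
    """True if the user's bucket falls in the treatment range [0, threshold)."""
    threshold = round(NUM_BUCKETS * percent / 100)  # 15% -> buckets 0-1499
    return bucket_of(user_id, salt) < threshold


if __name__ == "__main__":
    salt = "model_v2_rollout"
    users = [f"user_{i}" for i in range(20_000)]
    at_15 = {u for u in users if in_treatment(u, salt, 15)}
    at_30 = {u for u in users if in_treatment(u, salt, 30)}
    print("share at 15%:", len(at_15) / len(users))
    # Widening the range only adds users; nobody leaves treatment.
    print("all 15% users still treated at 30%:", at_15 <= at_30)
```

Because the treatment range grows from a fixed starting bucket, ramping up is additive. Shrinking the range, by contrast, would move some treated users back to control.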

How Deterministic Hashing Works

Deterministic hashing is a method for experiment assignment where a user's unique identifier is passed through a hash function to produce a consistent, repeatable numeric output. This output is then mapped to a specific experimental variant (e.g., A or B), ensuring the same user is always assigned to the same variant wherever the same identifier is available. This determinism is critical for maintaining consistent user experiences and preventing assignment bias in A/B tests.

The process relies on functions like SHA-256 or MurmurHash, whose output behaves pseudorandomly and therefore distributes users uniformly across buckets. The hash output, interpreted as an integer and taken modulo the number of buckets, determines the assignment. This method provides reproducibility and statistical integrity, as user cohorts remain stable. It is a foundational component of traffic splitting systems, enabling reliable comparison of model performance and feature rollouts without user-level contamination.

EXPERIMENT ASSIGNMENT

Deterministic Hashing vs. Alternative Assignment Methods

A technical comparison of core methodologies for assigning users to variants in A/B testing and other controlled experiments.

| Feature / Characteristic | Deterministic Hashing | True Random Assignment | Round-Robin Assignment |
| --- | --- | --- | --- |
| Assignment Consistency | | | |
| Requires Stateful Tracking | | | |
| Guarantees Equal Distribution | | | |
| Handles User Re-Identification | | | |
| Statistical Independence | | | |
| Implementation Complexity | Low | Medium | Very Low |
| Typical Use Case | A/B Testing, Feature Rollouts | Clinical Trials, Academic Studies | Load Balancing, Simple Demos |
| Vulnerability to Assignment Bias | Low (if hash is uniform) | None | High (predictable sequence) |
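The practical difference between the three methods shows up in what each one needs in order to give a returning user the same answer. The sketch below is illustrative only: the class and function names are hypothetical, and the random and round-robin assigners are simplified stand-ins rather than descriptions of any particular platform.

```python
import hashlib
import random
from itertools import cycle

VARIANTS = ("A", "B")


def hash_assign(user_id: str) -> str:
    """Stateless: the variant is a pure function of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return VARIANTS[int.from_bytes(digest[:8], "big") % len(VARIANTS)]


class RandomAssigner:
    """Random draw per user; consistency requires remembering each draw."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._seen = {}  # user_id -> variant (the state that must be persisted)

    def assign(self, user_id: str) -> str:
        if user_id not in self._seen:
            self._seen[user_id] = self._rng.choice(VARIANTS)
        return self._seen[user_id]


class RoundRobinAssigner:
    """Alternates variants by arrival order, ignoring who the user is."""

    def __init__(self):
        self._next = cycle(VARIANTS)

    def assign(self, user_id: str) -> str:
        return next(self._next)


if __name__ == "__main__":
    rr = RoundRobinAssigner()
    print("hash:       ", hash_assign("u1"), hash_assign("u1"))
    print("round-robin:", rr.assign("u1"), rr.assign("u1"))
```

In this sketch the round-robin assigner returns a different variant on the same user's second visit, while the hash-based function returns the same variant without storing anything.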

DETERMINISTIC HASHING

Frequently Asked Questions

Deterministic hashing is a foundational technique for consistent user assignment in online experiments. These questions address its core mechanics, applications, and trade-offs within A/B testing and broader AI evaluation frameworks.

Deterministic hashing is a method for assigning users to experimental variants by passing a stable user identifier through a hash function to produce a consistent, repeatable output that determines group placement.

It works via a straightforward pipeline:

  1. Input: A unique, immutable user identifier (e.g., user_id, session_id, or device_id) is selected.
  2. Hashing: This identifier is passed through a hash function like SHA-256 or MurmurHash3. For a given input, the function always produces the same fixed-length output (hash digest).
  3. Assignment: The hash digest is converted into a number (e.g., by interpreting it as an integer and taking it modulo 100). This number is mapped to a predefined range corresponding to experimental variants (e.g., 0-49 for control, 50-99 for treatment).

This ensures that the same user, identified by the same key, is guaranteed to be placed into the same variant every time they are encountered, providing consistency across sessions and across any devices that share that key.
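The three steps above generalize from a two-way split to any number of variants by laying the variants out as consecutive ranges. The sketch below assumes SHA-256 and integer range widths; the function name `assign` and the allocation format, a list of `(variant, width)` pairs, are hypothetical.

```python
import hashlib
from typing import Sequence, Tuple


def assign(user_id: str, allocation: Sequence[Tuple[str, int]]) -> str:
    """Assign a user to a variant using consecutive bucket ranges.

    `allocation` lists (variant, width) pairs; widths of 50 and 50 give
    buckets 0-49 to the first variant and 50-99 to the second.
    """
    total = sum(width for _, width in allocation)
    if total <= 0:
        raise ValueError("allocation widths must sum to a positive number")
    # Step 1 (input) and step 2 (hashing): digest of the stable identifier.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Step 3 (assignment): integer modulo the total width, then range lookup.
    bucket = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for variant, width in allocation:
        upper += width
        if bucket < upper:
            return variant
    raise AssertionError("unreachable: bucket is always below the total width")


if __name__ == "__main__":
    two_way = [("control", 50), ("treatment", 50)]
    three_way = [("control", 50), ("variant_b", 25), ("variant_c", 25)]
    print(assign("user_123", two_way), assign("user_123", three_way))
```

Keeping the order and widths of the ranges fixed for the life of an experiment is what keeps each user's assignment stable.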

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.