Inferensys

Glossary

Locality-Sensitive Hashing (LSH)

Locality-Sensitive Hashing (LSH) is a family of hashing techniques that map similar input items to the same 'buckets' with high probability, enabling fast, approximate nearest neighbor search.
Developer reviewing semantic search engine results on laptop, relevance scores visible, technical search demo.
ALGORITHM

What is Locality-Sensitive Hashing (LSH)?

Locality-Sensitive Hashing (LSH) is a foundational algorithmic family for approximate nearest neighbor search, enabling fast similarity lookups in high-dimensional spaces like those created by embedding models.

Locality-Sensitive Hashing (LSH) is a probabilistic hashing technique designed so that similar input data points map to the same hash value, or 'bucket,' with high probability, while dissimilar points map to different buckets. Unlike cryptographic hashing, which maximizes output differences, LSH functions are constructed to preserve similarity in the hashed space. This property enables efficient approximate nearest neighbor (ANN) search by drastically reducing the number of candidate vectors that must be compared, trading perfect accuracy for massive gains in speed and memory efficiency when searching over large sets of vector embeddings.

The core mechanism involves applying multiple, randomized hash functions from an LSH family, such as those for cosine similarity or Euclidean distance. Candidates are retrieved from buckets where the query hashes collide. This process is often repeated with independent hash tables to increase recall. LSH is a critical component in embedding model integration, providing the scalable search backend for semantic similarity lookups in vector database infrastructure and Retrieval-Augmented Generation (RAG) architectures, where it works alongside graph-based indices like HNSW.

APPROXIMATE NEIGHBOR SEARCH

Key Characteristics of LSH

Locality-Sensitive Hashing (LSH) is defined by its probabilistic nature and trade-offs between speed, accuracy, and memory. These characteristics make it distinct from exact search algorithms and other approximate nearest neighbor (ANN) methods.

01

Probabilistic Guarantees

LSH provides probabilistic, not deterministic, guarantees for finding nearest neighbors. The core property is that the probability of collision (hashing to the same bucket) is higher for similar items than for dissimilar ones. This is formalized for a hash family H and a similarity function S where, for any two points p and q, Pr[h(p) = h(q)] is proportional to S(p, q). System accuracy is tuned by adjusting parameters like the number of hash tables (L) and hash functions per table (k), allowing engineers to trade recall for speed.

02

Sub-Linear Query Time

The primary advantage of LSH is achieving sub-linear query time relative to the dataset size N. Instead of comparing a query to all N items (O(N) time), LSH only searches within the hash buckets the query maps to. This makes search time roughly O(L + N^(1/(1+ρ))) for well-tuned parameters, where ρ is a quality parameter of the hash family. This is critical for searching billion-scale vector databases where linear scans are infeasible.

  • Key Benefit: Enables real-time semantic search over massive embedding sets.
  • Comparison: Contrasts with exact k-NN (linear time) and graph-based HNSW (logarithmic time).
03

Hash Function Families

LSH is not a single algorithm but a framework defined by the choice of hash function family, which is tied to a specific distance or similarity metric. Each family is designed to preserve locality for that metric.

Common LSH families include:

  • SimHash (Cosine Similarity): Uses random hyperplanes to hash vectors based on their angular similarity.
  • p-stable LSH (Lp Distance): Uses p-stable distributions (e.g., Gaussian for L2/Euclidean distance) to project and quantize vectors.
  • MinHash (Jaccard Similarity): Designed for set similarity, operating on the minimum permutation values of sets. Selecting the correct family is the first engineering decision when implementing LSH.
04

Amplification via AND/OR Constructions

LSH performance is engineered using AND and OR constructions to amplify the gap between the collision probabilities of similar and dissimilar points.

  • AND Construction (Concatenation): Combines k hash functions; a collision requires agreement on all k. This increases precision by making buckets finer-grained, reducing false positives.
  • OR Construction (Multiple Tables): Uses L independent hash tables. An item is a candidate if it collides in any table. This increases recall by providing multiple chances for similar items to hash together. The standard practice is to use L tables, each defined by an AND construction of k functions, allowing precise control over the recall-precision curve.
05

Memory vs. Accuracy Trade-off

LSH explicitly trades memory footprint for query speed and accuracy. Storing L hash tables increases memory overhead linearly with L. Each table requires storing the hash buckets and the indices of the vectors within them. This is a direct trade-off: more tables (higher L) boost recall but demand more RAM. Conversely, reducing memory by using fewer tables lowers recall. This characteristic differentiates LSH from memory-efficient graph-based methods like HNSW, which often provide higher recall for the same memory but can have more expensive index construction.

06

Application in Vector Search Pipelines

In modern Retrieval-Augmented Generation (RAG) and semantic search architectures, LSH is typically used as a first-stage retriever. Its role is to rapidly filter a billion-scale vector database down to a manageable candidate set (e.g., hundreds or thousands of vectors). This candidate set is then passed to a more accurate but expensive re-ranking stage, often using a cross-encoder or exact k-NN within the small candidate pool. This hybrid approach combines the speed of LSH with the precision of more sophisticated models, optimizing overall system latency and accuracy.

LOCALITY-SENSITIVE HASHING (LSH)

Frequently Asked Questions

Locality-Sensitive Hashing (LSH) is a cornerstone algorithm for fast, approximate similarity search in high-dimensional spaces like those created by embedding models. These questions address its core mechanics, trade-offs, and role in modern AI systems.

Locality-Sensitive Hashing (LSH) is a probabilistic hashing algorithm designed to map similar high-dimensional data points to the same hash bucket with high probability, enabling fast approximate nearest neighbor (ANN) search. It works by applying a family of hash functions that are "sensitive" to the distance metric of interest (e.g., cosine similarity, Euclidean distance).

Core Mechanism:

  • Hash Function Design: LSH functions are constructed so that the probability of collision (hashing to the same value) is high for nearby points and low for distant points. For cosine similarity, a common LSH family uses random hyperplanes: a hash bit is determined by which side of a random hyperplane a vector falls on (sign(dot(r, x))).
  • Amplification: A single hash function is too noisy. To increase precision, multiple functions are combined into a single "hash signature." To increase recall (finding more true neighbors), multiple independent hash tables are constructed. A query is hashed with all functions, and candidates are retrieved from the matching buckets across all tables.
  • Candidate Verification: Retrieved candidates from the hash buckets are then ranked using the exact distance metric (e.g., cosine similarity) to return the final, most similar results.

The process trades perfect accuracy for orders-of-magnitude speed and memory efficiency compared to exhaustive search, making it fundamental for searching large-scale embedding databases.

Prasad Kumkar

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.