A sharded index is a distributed search architecture where a large vector index is partitioned into smaller, manageable pieces called shards across multiple machines or nodes. This design parallelizes query processing, allowing simultaneous searches across all shards to reduce latency and aggregate results. Crucially, it enables horizontal scaling, overcoming the memory and compute limits of a single server to handle billion-scale vector datasets. Sharding is a core technique in production vector database infrastructure.
