Reranking is the second stage of a two-stage retrieval pipeline: a fast, approximate method such as vector search or BM25 first retrieves a large set of candidate documents, and a more powerful, computationally expensive model then re-scores those candidates to produce a more precise final ranking. The split reflects a cost-quality trade-off. First-stage retrieval (e.g., Approximate Nearest Neighbor search) is cheap enough to run against a massive corpus, so it is tuned for recall: it quickly returns a broad set of potentially relevant items. The second-stage reranker, often a cross-encoder model, analyzes the query jointly with each candidate to compute a refined relevance score. Because it runs only on the short candidate list, its higher per-pair cost stays affordable while the ordering of the final top results improves.
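
The two-stage flow can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: token overlap stands in for BM25 or ANN search, an order-aware pair scorer stands in for a cross-encoder, and the names `first_stage`, `joint_score`, and `rerank`, along with the tiny corpus, are hypothetical. A real system would swap in an actual index and a trained model.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (toy assumption)."""
    return text.lower().split()


def first_stage(query, corpus, k):
    """Cheap, recall-oriented stage: rank documents by shared-token count.

    Stands in for BM25 or ANN vector search; ignores word order entirely.
    Returns up to k document indices that share at least one query token.
    """
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(doc))), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, then index
    return [i for score, i in scored[:k] if score > 0]


def joint_score(query, doc):
    """Stand-in for a cross-encoder: scores the (query, doc) pair together.

    Counts matched query tokens and adds a bonus for adjacent query-token
    pairs that also appear adjacently in the document.
    """
    q, d = tokenize(query), tokenize(doc)
    d_tokens = set(d)
    d_bigrams = set(zip(d, d[1:]))
    unigram_hits = sum(1 for t in q if t in d_tokens)
    bigram_hits = sum(1 for b in zip(q, q[1:]) if b in d_bigrams)
    return unigram_hits + 2 * bigram_hits


def rerank(query, corpus, candidates, top_n):
    """Expensive stage: re-score only the candidates, keep the top_n."""
    ordered = sorted(candidates, key=lambda i: -joint_score(query, corpus[i]))
    return ordered[:top_n]


# Hypothetical corpus: documents 0 and 1 contain the same query words,
# but only document 1 has them in the query's order.
corpus = [
    "neighbor search nearest approximate index notes",
    "approximate nearest neighbor search speeds up vector retrieval",
    "bm25 is a lexical ranking function",
    "search engines rank documents",
]
query = "approximate nearest neighbor search"

candidates = first_stage(query, corpus, k=3)       # broad, cheap pass
final = rerank(query, corpus, candidates, top_n=2)  # precise, costly pass
```

In this toy setup the first stage cannot tell documents 0 and 1 apart, since both contain every query word; the pair scorer breaks the tie by rewarding matching word order. The cost asymmetry is the point of the design: `first_stage` touches every document, while `joint_score` is called only `k` times.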
