A cross-encoder is a transformer-based neural network that processes two input sequences—such as a query and a candidate document—simultaneously through a single encoder with full cross-attention between all tokens, outputting a single, fine-grained relevance score or classification label. Unlike bi-encoders that produce separate embeddings for approximate search, cross-encoders perform exhaustive, joint reasoning over the input pair, enabling them to capture complex semantic interactions and subtle linguistic nuances that determine true relevance. This architecture is the core component of the reranking stage in a two-stage retrieval pipeline, where it refines results from a fast, approximate first-pass search.
