A bi-encoder is a neural network architecture, typically based on a transformer, that processes two input sequences (like a query and a document) independently through the same or twin encoders to produce separate, fixed-dimensional vector embeddings. This design enables the pre-computation and indexing of one set of embeddings (e.g., a document corpus), allowing for highly efficient retrieval via approximate nearest neighbor (ANN) search. The similarity between two inputs is then calculated as a simple, fast function (like cosine similarity) of their two independent embeddings.
