A bi-encoder is a neural network architecture for dense retrieval in which a query (e.g., a user question) and a document (e.g., a memory chunk) are encoded independently into fixed-dimensional vector embeddings, either by two separate encoders or by a single encoder with shared weights. Because a document's embedding does not depend on the query, all document embeddings can be computed and indexed ahead of time; at query time only the query is encoded, and relevant documents are retrieved by (typically approximate) nearest-neighbor search in a vector database, using a similarity metric such as cosine similarity or dot product. This makes the bi-encoder the core retrieval component behind scalable semantic search in Retrieval-Augmented Generation (RAG) systems.
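
The encode-once, search-many pattern described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_encode` is a hypothetical stand-in for a trained neural encoder (it only hashes words into a small vector), `DIM` is an arbitrary embedding size, and the in-memory list returned by `build_index` stands in for a vector database. Only the overall flow reflects the bi-encoder idea: documents are embedded ahead of time, and the query is embedded separately at search time.

```python
import math
import re
import zlib

DIM = 64  # embedding size; an arbitrary choice for this toy example


def toy_encode(text: str) -> list:
    """Hypothetical stand-in for a neural encoder.

    Maps text to a fixed-size, L2-normalized vector by hashing each word
    into one of DIM buckets. A real bi-encoder would use a trained model.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a: list, b: list) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity
    return sum(x * y for x, y in zip(a, b))


def build_index(documents: list) -> list:
    """Offline step: embed every document once, independently of any query."""
    return [(doc, toy_encode(doc)) for doc in documents]


def search(query: str, index: list, top_k: int = 2) -> list:
    """Online step: embed only the query, then rank the stored embeddings."""
    q = toy_encode(query)
    scored = [(cosine(q, emb), doc) for doc, emb in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


docs = [
    "Bi-encoders embed queries and documents independently.",
    "The cat sat on the mat near the window.",
    "Vector databases store embeddings for similarity search.",
]
index = build_index(docs)
results = search("Bi-encoders embed queries and documents independently.", index)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

The exhaustive scan in `search` is linear in the number of documents. At scale, the same stored embeddings would usually be handed to an approximate nearest-neighbor index, which is possible precisely because they never depend on the query.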
