Embedding pooling is the aggregation step that converts a sequence of token-level vectors from a transformer model into a single, fixed-dimensional sentence or document embedding. This fixed-size vector is essential for downstream tasks such as semantic similarity calculation, retrieval-augmented generation (RAG), and classification. Common methods include mean pooling, which averages the token vectors (typically using the attention mask to exclude padding positions), and CLS token pooling, which takes the output vector at the special classification token as the sentence representation.
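As a minimal sketch of the two methods described above, the snippet below implements mask-aware mean pooling and CLS pooling with NumPy. It assumes token embeddings arrive as a `(seq_len, dim)` array and that the CLS token sits at position 0, as in BERT-style models; the function names and toy data are illustrative, not from any particular library.

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of token-level vectors.
    attention_mask:   (seq_len,) array with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None].astype(float)         # (seq_len, 1) broadcastable mask
    summed = (token_embeddings * mask).sum(axis=0)       # sum only the real tokens
    count = max(mask.sum(), 1e-9)                        # guard against divide-by-zero
    return summed / count

def cls_pooling(token_embeddings: np.ndarray) -> np.ndarray:
    """Use the first token's vector (the [CLS] position) as the embedding."""
    return token_embeddings[0]

# Toy example: 4 tokens (last one is padding), 3-dimensional embeddings.
tokens = np.array([[1.0, 0.0, 2.0],
                   [3.0, 4.0, 0.0],
                   [2.0, 2.0, 1.0],
                   [9.0, 9.0, 9.0]])   # padding row, excluded by the mask
mask = np.array([1, 1, 1, 0])

print(mean_pooling(tokens, mask))  # → [2. 2. 1.]
print(cls_pooling(tokens))         # → [1. 0. 2.]
```

Note that masking matters: naively averaging all rows would pull the embedding toward the padding vector, which is why implementations weight the sum by the attention mask before dividing by the count of real tokens.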
