BM25 (Best Matching 25) is a probabilistic ranking function that search engines use to estimate how relevant each document is to a given query. It builds on the TF-IDF (term frequency-inverse document frequency) framework and adds two refinements: non-linear term-frequency saturation, so that each additional occurrence of a query term contributes less than the one before it, and document-length normalization. Together these keep very long documents from dominating the results merely because they repeat query terms many times, which makes BM25 more robust on real-world corpora with widely varying document lengths.
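
The two mechanisms described above can be made concrete with a small sketch. The code below is a minimal illustration rather than a reference implementation: it assumes one widely used form of the BM25 formula (with a smoothed, always-positive IDF), pre-tokenized documents, and the conventional parameter values `k1 = 1.5` and `b = 0.75`. The function name `bm25_scores` and these defaults are assumptions made for this example and do not come from the text.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (hypothetical helper).

    query: list of query tokens
    docs:  list of documents, each a list of tokens
    k1:    controls term-frequency saturation (assumed default)
    b:     controls document-length normalization (assumed default)
    """
    if not docs:
        return []

    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: > 1 for longer-than-average documents.
        norm = 1.0 - b + b * len(d) / avgdl
        score = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the value positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Saturating term-frequency component: grows with f but is
            # bounded above by (k1 + 1).
            score += idf * f * (k1 + 1.0) / (f + k1 * norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["cat"] + ["x"] * 9,   # one occurrence of "cat"
        ["cat"] * 10,          # ten occurrences, same length
        ["dog"] * 10,          # no occurrence
    ]
    print(bm25_scores(["cat"], corpus))
```

In this sketch, the second document mentions "cat" ten times as often as the first but scores well under ten times higher, which is the saturation effect. Raising `b` toward 1 penalizes longer-than-average documents more strongly, while `b = 0` turns length normalization off.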
