Semantic similarity is a quantitative measure of how closely the meanings of two pieces of text, images, or other data align, based on their conceptual or contextual likeness rather than superficial lexical overlap. In machine learning systems, this is typically calculated by comparing the vector embeddings—dense numerical representations—of the inputs, using metrics like cosine similarity or Euclidean distance to gauge their proximity in a shared embedding space. This capability is fundamental to Retrieval-Augmented Generation (RAG), semantic search, and clustering, allowing models to retrieve contextually relevant information.
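The comparison described above can be sketched in a few lines. This is a minimal illustration using NumPy, with toy three-dimensional vectors standing in for real embeddings (which in practice have hundreds or thousands of dimensions and come from a trained model); the vector names and values here are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    # Ranges from -1 (opposite) through 0 (orthogonal) to 1 (identical direction).
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embedding vectors (hypothetical values for illustration).
king = [0.80, 0.65, 0.10]
queen = [0.78, 0.70, 0.12]
apple = [0.10, 0.20, 0.90]

print(cosine_similarity(king, queen))  # close to 1: conceptually similar
print(cosine_similarity(king, apple))  # much lower: conceptually dissimilar
```

In a retrieval setting, the same function would score a query embedding against each stored document embedding, and the highest-scoring documents would be returned as the most semantically relevant.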
