InfoNCE loss is a contrastive learning objective that maximizes a lower bound on the mutual information between positive pairs of data points, using randomly sampled negative pairs as contrast. It frames representation learning as a classification problem: given a query, the model must identify the single correct positive example from a set of distractors via a softmax cross-entropy over similarity scores. This mechanism is foundational for training models like CLIP to align different modalities into a unified embedding space.
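The classification-over-similarities view can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `info_nce` and the temperature value 0.07 are illustrative choices, and the symmetric variant used by CLIP would average this loss computed along both axes of the logit matrix.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.07):
    """InfoNCE loss where keys[i] is the positive for queries[i]
    and every other key in the batch serves as a negative."""
    # L2-normalize so dot products are cosine similarities
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = (q @ k.T) / temperature  # (N, N) similarity matrix
    # Softmax cross-entropy with the correct class on the diagonal
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
# Positives that nearly match their queries vs. unrelated random keys
aligned = info_nce(x, x + 0.01 * rng.normal(size=(8, 16)))
random_pairs = info_nce(x, rng.normal(size=(8, 16)))
```

Because the positives in `aligned` are near-duplicates of their queries, the diagonal logits dominate the softmax and the loss approaches zero, while `random_pairs` stays near the chance level of log N for a batch of N.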
