Pairwise comparisons are a data collection methodology in machine learning in which an annotator (human or AI) is shown two candidate responses to the same prompt and selects the one they prefer. These binary choices form the foundational preference data behind modern alignment techniques: Reinforcement Learning from Human Feedback (RLHF) uses them to train a reward model that guides policy optimization, while Direct Preference Optimization (DPO) optimizes the policy on the preference pairs directly, without an explicit reward model. The technique transforms subjective preference into a structured, machine-readable format for alignment.
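
How a binary choice becomes a training signal can be sketched with the Bradley-Terry model, which is commonly used to relate reward scores to preference probabilities. The snippet below is a minimal illustration, not a production pipeline; the record's field names (`prompt`, `chosen`, `rejected`) are illustrative, though this layout matches common preference-dataset conventions.

```python
import math

def bradley_terry_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Probability the annotator prefers the 'chosen' response under the
    Bradley-Terry model: sigmoid of the reward difference."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

# A single pairwise comparison record (field names are illustrative).
comparison = {
    "prompt": "Explain photosynthesis to a child.",
    "chosen": "Plants use sunlight to turn air and water into food.",
    "rejected": "Photosynthesis converts CO2 and H2O into C6H12O6 and O2.",
}

# Equal reward scores imply a 50% preference probability.
print(bradley_terry_prob(1.0, 1.0))            # 0.5
# A 2-point reward gap implies roughly an 88% preference probability.
print(round(bradley_terry_prob(3.0, 1.0), 2))  # 0.88
```

A reward model trained on many such records learns scores whose differences reproduce the observed choice frequencies; DPO instead plugs an analogous sigmoid-of-differences term directly into its policy loss.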
