Preference modeling is the process of training a machine learning model, typically called a reward model, to predict a scalar score for the desirability of an output based on learned preferences. It is a foundational step in alignment pipelines such as Reinforcement Learning from Human Feedback (RLHF): the model learns from datasets of pairwise comparisons or ranked responses, distilling qualitative human judgments into a quantitative, differentiable signal for optimization.
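The pairwise-comparison objective described above is commonly formulated as a Bradley-Terry loss: the reward model is trained to assign a higher score to the preferred response, minimizing -log sigmoid(r_chosen - r_rejected). The sketch below illustrates this on a toy linear reward model with hand-made feature vectors; the feature representation, learning rate, and gradient-descent loop are illustrative assumptions, not a prescribed recipe (in practice the reward model is a neural network trained on text).

```python
import numpy as np

def reward(w, x):
    # Toy linear reward model: a stand-in for a neural scorer.
    return float(w @ x)

def bt_loss(w, x_chosen, x_rejected):
    # Bradley-Terry negative log-likelihood for one pairwise comparison:
    # -log sigmoid(r_chosen - r_rejected), written stably via log1p.
    margin = reward(w, x_chosen) - reward(w, x_rejected)
    return float(np.log1p(np.exp(-margin)))

# Hypothetical feature vectors for a preferred and a rejected response.
x_chosen = np.array([1.0, 0.5])
x_rejected = np.array([0.2, 0.1])
w = np.zeros(2)  # untrained model: margin 0, loss log(2)

lr = 0.5
for _ in range(100):
    # Gradient of -log sigmoid(margin) w.r.t. w is
    # -(1 - sigmoid(margin)) * (x_chosen - x_rejected).
    margin = reward(w, x_chosen) - reward(w, x_rejected)
    sig = 1.0 / (1.0 + np.exp(-margin))
    w -= lr * (-(1.0 - sig) * (x_chosen - x_rejected))

final_loss = bt_loss(w, x_chosen, x_rejected)
```

After training, the margin between the chosen and rejected responses grows and the loss falls well below its initial value of log(2), which is the differentiable signal that downstream RL optimization consumes.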
