Preference elicitation is the foundational process of querying a source—often a human or an auxiliary AI model—to discover and formalize its underlying preferences. The goal is to construct a structured dataset or a mathematical reward function that accurately reflects these preferences, which is then used to train or align a target AI model. This process is critical for Reinforcement Learning from Human Feedback (RLHF) and its AI-assisted variant, Reinforcement Learning from AI Feedback (RLAIF).
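The step from elicited preferences to a reward function is commonly modeled with a Bradley–Terry-style pairwise formulation, where the probability that item a is preferred over item b is a logistic function of their score difference. The sketch below is a minimal, self-contained illustration of that pipeline under simplifying assumptions: a simulated noiseless annotator stands in for the human or AI source, and the `elicit_pairwise` / `fit_bradley_terry` helpers are hypothetical names, not an established API.

```python
import math
import random

def elicit_pairwise(items, true_utility, n_queries, rng):
    """Simulate querying a preference source: each query shows two distinct
    items and records which one the (noiseless) annotator prefers."""
    prefs = []
    for _ in range(n_queries):
        a, b = rng.sample(items, 2)
        winner, loser = (a, b) if true_utility[a] > true_utility[b] else (b, a)
        prefs.append((winner, loser))
    return prefs

def fit_bradley_terry(items, prefs, lr=0.5, epochs=200):
    """Fit per-item reward scores s so that P(a preferred over b) =
    sigmoid(s[a] - s[b]), by gradient ascent on the log-likelihood
    of the elicited comparisons."""
    s = {i: 0.0 for i in items}
    for _ in range(epochs):
        grad = {i: 0.0 for i in items}
        for w, l in prefs:
            p = 1.0 / (1.0 + math.exp(-(s[w] - s[l])))  # P(winner beats loser)
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        for i in items:
            s[i] += lr * grad[i] / max(len(prefs), 1)
    return s

rng = random.Random(0)
items = ["A", "B", "C"]
true_utility = {"A": 2.0, "B": 1.0, "C": 0.0}  # hidden preferences to recover

prefs = elicit_pairwise(items, true_utility, n_queries=60, rng=rng)
scores = fit_bradley_terry(items, prefs)
ranking = sorted(items, key=scores.get, reverse=True)
print(ranking)  # recovers the ordering A > B > C
```

In RLHF practice the scalar lookup table `s` is replaced by a learned reward model over full responses, but the objective is the same logistic comparison loss shown here.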
