Kahneman-Tversky Optimization (KTO) is a method for aligning language models that optimizes the policy directly with a loss function derived from prospect theory. Methods such as Direct Preference Optimization (DPO) require pairwise comparisons between two responses to the same prompt. KTO instead needs only a binary signal, whether a single response is desirable or undesirable, and models that response's perceived gain or loss relative to a reference point. Because each example is labeled on its own, KTO can train on feedback that was never collected in pairs, and it is intended to be more robust to noisy or imbalanced preference labels.
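
To make the "gain or loss relative to a reference point" idea concrete, the following is a minimal sketch of a KTO-style loss in plain Python. It assumes the commonly cited form of the objective: the implied reward is the log-ratio between the policy and a frozen reference model, the reference point is a non-negative estimate of the KL divergence between them, and a sigmoid plays the role of the prospect-theoretic value function. The function name `kto_loss` and the parameters `beta`, `lambda_d`, `lambda_u`, and `kl_estimate` are illustrative assumptions, not an API from any particular library.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def kto_loss(policy_logps, ref_logps, desirable, kl_estimate,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Average KTO-style loss over a batch of single (unpaired) responses.

    policy_logps / ref_logps: log-probability of each response under the
        policy and the frozen reference model (hypothetical inputs).
    desirable: one boolean per response, True if labeled desirable.
    kl_estimate: batch-level estimate of KL(policy || reference), used as
        the reference point; clamped at zero here (an assumption).
    """
    z0 = max(kl_estimate, 0.0)
    losses = []
    for logp, ref_logp, is_desirable in zip(policy_logps, ref_logps, desirable):
        reward = logp - ref_logp  # implied reward: log-ratio to the reference
        if is_desirable:
            # Gain: value grows as the reward rises above the reference point.
            value = lambda_d * sigmoid(beta * (reward - z0))
            losses.append(lambda_d - value)
        else:
            # Loss: value grows as the reward falls below the reference point.
            value = lambda_u * sigmoid(beta * (z0 - reward))
            losses.append(lambda_u - value)
    return sum(losses) / len(losses)


# Each example carries its own label; no pairing is needed.
batch_loss = kto_loss(
    policy_logps=[-1.0, -4.0],
    ref_logps=[-3.0, -2.0],
    desirable=[True, False],
    kl_estimate=0.0,
)
```

In this sketch, the separate weights `lambda_d` and `lambda_u` are what would let a practitioner compensate for an imbalance between desirable and undesirable examples. A real training loop would compute the log-probabilities with an autodiff framework so that gradients flow through the policy terms.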
