Kahneman-Tversky Optimization (KTO) is a preference optimization algorithm for aligning large language models that uses a loss function based on prospect theory. Unlike Reinforcement Learning from Human Feedback (RLHF) or Direct Preference Optimization (DPO), which require datasets of paired comparisons (A is preferred to B), KTO trains using simple binary signals indicating only whether a single output is desirable or undesirable. This significantly reduces the complexity and cost of human feedback collection.
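The core idea can be sketched in a few lines. The snippet below is a simplified, per-example illustration of a KTO-style loss, not a production implementation: it assumes an implied reward defined as the scaled log-probability ratio between the policy and a frozen reference model, a prospect-theory reference point (in practice estimated from a KL term), and the hypothetical parameter names `beta`, `lambda_d`, and `lambda_u` for the scaling and desirable/undesirable weights.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp: float, ref_logp: float, desirable: bool,
             ref_point: float = 0.0, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Sketch of a per-example KTO-style loss.

    policy_logp / ref_logp: log-probability of the single output under
    the policy being trained and under a frozen reference model.
    desirable: the binary human signal (True = desirable output).
    ref_point: prospect-theory reference point (assumed given here).
    """
    # Implied reward: how much more likely the policy makes this
    # output compared to the reference model.
    reward = beta * (policy_logp - ref_logp)
    if desirable:
        # Gains saturate: pushing an already-likely good output
        # higher yields diminishing loss reduction.
        return lambda_d * (1.0 - sigmoid(reward - ref_point))
    else:
        # Undesirable outputs are mirrored (and can be weighted
        # more heavily via lambda_u to model loss aversion).
        return lambda_u * (1.0 - sigmoid(ref_point - reward))
```

Note that each training example needs only one output and one binary label, in contrast to the paired (preferred, dispreferred) tuples that DPO requires.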
