Reasoning Distillation is a supervised fine-tuning technique where a smaller student model is trained to mimic not just the final answers, but the explicit step-by-step reasoning process of a larger, more capable teacher model. The teacher, often using Chain-of-Thought (CoT) prompting, generates detailed reasoning traces for a dataset of problems. These traces, paired with the problems and final answers, form the training data used to teach the student to produce similar logical sequences, thereby compressing advanced reasoning into a more efficient model.
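The data-construction step described above can be sketched as follows. This is a minimal illustration, not a specific library's API: `toy_teacher`, `build_sft_record`, and `distillation_dataset` are hypothetical names, and a real pipeline would call an actual CoT-prompted teacher model rather than the stand-in used here.

```python
# Minimal sketch of building reasoning-distillation training data.
# The teacher's chain-of-thought trace plus the final answer become the
# supervision target the student is fine-tuned to reproduce.

def build_sft_record(problem: str, teacher_trace: str, answer: str) -> dict:
    """Pair a problem with the teacher's full reasoning as the SFT target."""
    return {
        "prompt": f"Question: {problem}\nLet's think step by step.",
        "target": f"{teacher_trace}\nFinal answer: {answer}",
    }

def distillation_dataset(problems, teacher):
    """Query the teacher (a callable returning (trace, answer)) per problem."""
    records = []
    for p in problems:
        trace, answer = teacher(p)
        records.append(build_sft_record(p, trace, answer))
    return records

# Toy stand-in for a CoT-prompted teacher model (illustrative only).
def toy_teacher(problem: str):
    return ("Step 1: parse the question. Step 2: multiply 6 by 7 to get 42.", "42")

data = distillation_dataset(["What is 6 x 7?"], toy_teacher)
```

Each resulting record's `target` contains the full reasoning trace followed by the answer, so standard supervised fine-tuning on these pairs trains the student to emit the reasoning steps, not only the final answer.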
