Chain-of-Thought Fine-Tuning (CoT-FT) is a supervised learning technique that adapts a pre-trained language model by training it on examples where the desired output includes not just a final answer, but a complete, step-by-step reasoning trace. Unlike Chain-of-Thought Prompting, which elicits reasoning at inference time, CoT-FT bakes the reasoning capability directly into the model's parameters. This is achieved by constructing a dataset of (problem, reasoning_chain, answer) tuples and performing standard fine-tuning or instruction tuning, teaching the model to autoregressively generate the logical scaffolding before the conclusion.
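The data-construction step can be sketched as follows. This is a minimal, hypothetical example: the `format_cot_example` helper, the delimiter strings, and the sample problem are illustrative assumptions, not a fixed standard; real pipelines would serialize into whatever chat or instruction template the base model expects.

```python
def format_cot_example(problem, reasoning_chain, answer):
    """Serialize one (problem, reasoning_chain, answer) tuple into a single
    supervised fine-tuning target. The reasoning steps are placed before the
    answer, so the model learns to generate the scaffolding first."""
    steps = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(reasoning_chain))
    return (
        f"### Problem:\n{problem}\n\n"
        f"### Reasoning:\n{steps}\n\n"
        f"### Answer:\n{answer}"
    )

# Illustrative training example (hypothetical data):
example = format_cot_example(
    problem="If a train travels 60 km in 40 minutes, what is its speed in km/h?",
    reasoning_chain=[
        "40 minutes is 40/60 = 2/3 of an hour.",
        "Speed = distance / time = 60 / (2/3) = 90 km/h.",
    ],
    answer="90 km/h",
)
print(example)
```

A dataset of such strings would then be fed to an ordinary causal-language-modeling fine-tuning loop; because the reasoning tokens sit inside the training target, the loss is taken over the trace as well as the answer.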
