Process Supervision is a machine learning training paradigm in which a model receives feedback, typically a reward or correctness signal, for each individual step of its reasoning chain rather than solely for the final output. This contrasts with outcome supervision, which evaluates only the end result. The core objective is to train models to produce more reliable, transparent, and logically sound step-by-step reasoning, directly improving the faithfulness and correctness of intermediate inference steps. It is a foundational technique for building robust Chain-of-Thought capabilities in language models and autonomous agents.
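The contrast between the two signals can be sketched with a toy reasoning chain. In this hypothetical example, the per-step validity check stands in for a human annotator or a learned process reward model; the function names and the arithmetic chain are illustrative, not part of any real library.

```python
# Minimal sketch: outcome vs. process supervision on a toy reasoning chain.
# Each step is (expression, claimed_value); validity is checked by evaluating
# the arithmetic, standing in for a human label or learned verifier.

def outcome_reward(final_answer, target):
    # Outcome supervision: a single scalar signal for the final answer only.
    return 1.0 if final_answer == target else 0.0

def process_rewards(steps, step_is_valid):
    # Process supervision: a correctness signal for every intermediate step.
    return [1.0 if step_is_valid(s) else 0.0 for s in steps]

chain = [("2+3", 5), ("5*4", 20), ("20-1", 18)]  # last step is wrong (20-1 = 19)
valid = lambda step: eval(step[0]) == step[1]

print(outcome_reward(18, 19))          # 0.0 — says nothing about *where* the chain failed
print(process_rewards(chain, valid))   # [1.0, 1.0, 0.0] — localizes the error to step 3
```

The per-step signal is what lets training credit or penalize individual reasoning steps, rather than propagating a single end-of-chain reward back through the whole trajectory.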
