Best-of-N sampling is an inference-time alignment technique: for a given prompt, a base language model generates N candidate responses. A separate reward model or preference model, trained on human or AI feedback, then scores these candidates, and the single highest-scoring output is selected as the final response. The procedure acts as a filter, leveraging the reward model's learned preferences to surface higher-quality, safer, or more helpful responses from the base model's output distribution without modifying the base model's weights.
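The sample-score-select loop above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `generate_fn` and `reward_fn` are hypothetical stand-ins for calls to a real base model and reward model, and the toy functions at the bottom exist only so the sketch runs end to end.

```python
import random

def best_of_n(prompt, generate_fn, reward_fn, n=8):
    """Sample n candidates from the base model and return the one
    the reward model scores highest (the argmax over candidates)."""
    candidates = [generate_fn(prompt) for _ in range(n)]
    scores = [reward_fn(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]

# Toy stand-ins for the base model and reward model, for illustration only.
def toy_generate(prompt):
    return prompt + " " + random.choice(["ok", "good answer", "detailed helpful answer"])

def toy_reward(prompt, response):
    # A real reward model would score helpfulness/safety; here we
    # pretend longer responses are preferred, just to make the demo run.
    return len(response)

print(best_of_n("Explain BoN:", toy_generate, toy_reward, n=4))
```

Note that the base model's sampling must be stochastic (e.g. temperature > 0); with greedy decoding all N candidates would be identical and the filter would have nothing to choose between.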
