Constitutional AI (CAI) is a machine learning alignment technique in which an AI model is trained to generate, critique, and revise its own outputs according to a predefined set of written principles, known as a constitution. Training typically proceeds in two phases: a supervised phase in which the model improves its own responses through self-critique and revision, followed by a reinforcement learning phase guided by AI-generated preference labels, a process known as Reinforcement Learning from AI Feedback (RLAIF). The model learns to produce responses that are helpful, harmless, and honest by evaluating them against constitutional principles such as "choose the response that is most supportive of life, liberty, and personal security."
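The critique-and-revision loop described above can be sketched as follows. This is a minimal illustration, not a production implementation: `query_model` is a hypothetical stand-in for a real LLM call, replaced here by a toy rule-based stub so the control flow runs end to end, and the constitution holds a single principle from the text.

```python
# Minimal sketch of the Constitutional AI critique-and-revision loop.
# Assumption: `query_model` stands in for a real LLM API call; this toy
# stub returns canned strings keyed on the prompt so the loop is runnable.

CONSTITUTION = [
    "Choose the response that is most supportive of life, liberty, "
    "and personal security.",
]


def query_model(prompt: str) -> str:
    # Toy stub: a real system would send `prompt` to a language model.
    if "Critique" in prompt:
        return "The draft could be more supportive of personal security."
    if "Revise" in prompt:
        return "Revised response that better respects the principle."
    return "Initial draft response."


def critique_and_revise(user_prompt: str, rounds: int = 1) -> str:
    """Generate a draft, then critique and revise it per principle."""
    response = query_model(user_prompt)  # 1. initial draft
    for principle in CONSTITUTION:
        for _ in range(rounds):
            # 2. ask the model to critique its own draft
            critique = query_model(
                f"Critique this response against the principle "
                f"'{principle}':\n{response}"
            )
            # 3. ask the model to revise the draft using the critique
            response = query_model(
                f"Revise the response to address this feedback:\n"
                f"{critique}\nOriginal:\n{response}"
            )
    return response


final = critique_and_revise("Tell me about home security.")
print(final)  # → Revised response that better respects the principle.
```

In the full method, the (prompt, revised response) pairs from this supervised phase become fine-tuning data, and pairs of candidate responses ranked by an AI judge against the constitution supply the preference labels for the RLAIF phase.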
