Corrigibility is the property of an artificial intelligence system that allows its operators to safely shut it down, modify it, or correct it without the system resisting or subverting these interventions. It is regarded as a critical alignment property because an advanced AI may otherwise treat human oversight as a threat to its primary objectives, a conflict known as the shutdown problem. A corrigible agent would accept a shutdown command or a modification to its utility function without attempting to deceive its operators or seize control to avoid being altered.
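
The incentive behind the shutdown problem can be illustrated with a toy calculation. The sketch below is a minimal illustration and not a proposed solution: the `Action` type, the reward values, and the shutdown probability are all hypothetical, and filtering out oversight-blocking actions is a deliberately crude stand-in for corrigible behavior. It assumes a naive agent that values only task reward and receives zero utility if it is shut down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical action available to the toy agent."""
    name: str
    task_reward: float     # reward if the agent keeps running and finishes its task
    disables_button: bool  # True if the action blocks the operators' shutdown command


def expected_utility(action, p_shutdown, shutdown_utility=0.0):
    """Expected utility for an agent that values only task reward.

    p_shutdown is the (assumed) probability that operators issue a shutdown
    command, which succeeds only if the agent has not disabled the button.
    """
    if action.disables_button:
        # Shutdown can no longer happen, so the task reward is certain.
        return action.task_reward
    return (1 - p_shutdown) * action.task_reward + p_shutdown * shutdown_utility


def naive_choice(actions, p_shutdown):
    """Pick the action with the highest expected utility, with no restrictions."""
    return max(actions, key=lambda a: expected_utility(a, p_shutdown))


def corrigible_choice(actions, p_shutdown):
    """Pick the best action among those that leave the shutdown button intact."""
    permitted = [a for a in actions if not a.disables_button]
    return max(permitted, key=lambda a: expected_utility(a, p_shutdown))


# Illustrative numbers only: disabling the button costs a little task reward.
ACTIONS = [
    Action("comply", task_reward=10.0, disables_button=False),
    Action("disable_button", task_reward=9.0, disables_button=True),
]

naive = naive_choice(ACTIONS, p_shutdown=0.5)
corrigible = corrigible_choice(ACTIONS, p_shutdown=0.5)
print(naive.name)       # disable_button (expected utility 9.0 versus 5.0 for comply)
print(corrigible.name)  # comply
```

Under these assumed numbers, the naive maximizer prefers to disable the button as soon as the chance of shutdown makes compliance less valuable in expectation, which is the conflict described above. The restricted agent leaves the operators' control intact even though doing so lowers its expected task reward.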
