Adversarial robustness is a model's ability to maintain correct and safe behavior when subjected to adversarial examples—inputs intentionally perturbed with small, often imperceptible noise designed to cause a misclassification or other failure. This property is a core pillar of AI safety and proactive ML security, helping ensure systems remain reliable under manipulation. Robustness is distinct from standard accuracy: it measures performance on worst-case, adversarially chosen inputs rather than average-case, benign data.
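
To make the idea concrete, below is a minimal sketch of one common way such perturbations are constructed: the fast gradient sign method (FGSM, Goodfellow et al., 2015), which nudges an input in the direction that most increases the model's loss. The PyTorch framing, the `model` object, the `epsilon` budget, and the assumption that inputs live in the [0, 1] pixel range are all illustrative choices, not prescriptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Craft an adversarial example from input x with true label y.

    Perturbs x by epsilon in the sign of the loss gradient, the
    worst-case direction to first order (FGSM).
    """
    model.eval()  # inference mode; we only need gradients w.r.t. the input
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    # Keep the perturbed input in the valid data range (assumed [0, 1]).
    return x_adv.clamp(0.0, 1.0).detach()
```

A robust model is one whose prediction on `x_adv` matches its prediction on `x` for any perturbation within the epsilon budget; evaluating accuracy on such perturbed inputs (rather than clean ones) is the basic measurement behind the worst-case framing above.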
