The Carlini & Wagner (C&W) attack is a powerful, optimization-based white-box attack formulated as a constrained minimization problem to find the smallest possible perturbation that causes a target machine learning model to misclassify an input. Introduced by Nicholas Carlini and David Wagner in 2017, it directly optimizes a custom loss function that balances the adversarial objective—such as causing a targeted or untargeted misclassification—against the L_p norm (typically L0, L2, or L∞) of the perturbation. This method is considered one of the strongest attacks for evaluating adversarial robustness and is particularly known for its effectiveness against early defensive techniques like defensive distillation.

What is the Carlini & Wagner Attack (C&W)?
A definitive technical overview of the Carlini & Wagner (C&W) attack, a seminal optimization-based method for generating minimal adversarial perturbations.
The attack's core innovation is its use of a change-of-variable technique to handle the box constraint that keeps adversarial examples within valid input bounds (e.g., [0, 1] for normalized image pixels), allowing unconstrained optimizers to be used without a projection step. It employs an iterative optimizer, like Adam, to solve the non-convex problem, often resulting in adversarial examples with imperceptibly small perturbations. In adversarial testing, the C&W attack serves as a benchmark for stress-testing model defenses: the smallest perturbation it finds is an empirical upper bound on how far an input must be moved to fool the model.
Key Characteristics of the C&W Attack
The Carlini & Wagner (C&W) attack is a seminal optimization-based white-box attack, renowned for its effectiveness in generating minimal adversarial perturbations. It is a standard benchmark for evaluating the robustness of neural networks, particularly against defensive distillation.
Optimization-Based Formulation
The C&W attack frames adversarial example generation as a constrained optimization problem. Instead of a simple gradient step, it directly minimizes a custom objective function:
- Objective: Minimize the perturbation magnitude (e.g., L2 norm) while ensuring the input is misclassified.
- Formalization: It solves minimize ||δ||_p + c · f(x + δ) subject to x + δ ∈ [0, 1]^n, where f is a specially designed loss function that satisfies f(x + δ) ≤ 0 exactly when the attack succeeds, and the constant c > 0 sets the trade-off between the two terms.
- Advantage: This formulation allows for precise control over the perturbation's size, often finding smaller, more imperceptible adversarial examples than fast, single-step methods like FGSM.
Custom Loss Functions (f₆, f₇)
A core innovation is the design of hinge-like loss functions that are better suited for optimization than standard cross-entropy loss. The most common variants are f₆ and f₇:
- f₆(x') = max(max_{i ≠ t}(Z(x')_i) - Z(x')_t, -κ): Encourages the target class t (for targeted attacks) to have a logit Z(x')_t that is at least κ higher than the next highest logit.
- f₇(x') = softplus(max_{i ≠ t}(Z(x')_i) - Z(x')_t) - log(2): A smooth variant that applies softplus(z) = log(1 + exp(z)) to the same logit margin.
- Purpose: These functions are differentiable almost everywhere and give a useful, non-vanishing gradient while the attack is not yet successful, enabling efficient gradient-based optimization. The κ parameter controls the confidence of the misclassification.
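As a rough illustration of the f₆ formula above, the following NumPy sketch evaluates the loss on a single logit vector. The function name `cw_margin_loss` and the example logits are assumptions made for illustration, not part of any library.

```python
import numpy as np

def cw_margin_loss(logits, target, kappa=0.0):
    """Targeted f6-style loss: max(max_{i != t} Z_i - Z_t, -kappa)."""
    logits = np.asarray(logits, dtype=float)
    other = np.delete(logits, target)        # logits of every class except the target
    margin = other.max() - logits[target]    # positive while the target is not the top class
    return max(margin, -kappa)

# Target class 2 trails the best other class by 4, so the loss is positive.
print(cw_margin_loss([2.0, 5.0, 1.0], target=2))               # 4.0
# Target class 1 already leads by 3; with kappa=1 the loss is clamped at -1.
print(cw_margin_loss([2.0, 5.0, 1.0], target=1, kappa=1.0))    # -1.0
```

Raising κ keeps the loss above its floor for longer, so the optimizer continues to push the target logit further ahead before the term stops contributing.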
Box-Constrained Optimization
The attack must ensure the adversarial example x' = x + δ remains a valid input (e.g., pixel values between 0 and 1). The C&W attack uses a change-of-variables technique to handle this box constraint inherently.
- Method: Instead of optimizing δ directly, it optimizes a new variable w, where x' = 1/2(tanh(w) + 1). The tanh function naturally bounds outputs to [-1, 1], which are then scaled to [0, 1].
- Benefit: This eliminates the need for clumsy projection steps after each gradient update, leading to more stable and effective optimization. It guarantees the adversarial example is always within the valid input space.
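The change of variables can be sketched in a few lines of NumPy. The helper names `to_input_space` and `to_tanh_space`, and the small `eps` used to keep the inverse finite at the boundaries, are assumptions for this illustration.

```python
import numpy as np

def to_input_space(w):
    """Map an unconstrained variable w to x' in [0, 1] via x' = (tanh(w) + 1) / 2."""
    return 0.5 * (np.tanh(w) + 1.0)

def to_tanh_space(x, eps=1e-6):
    """Inverse map, used to initialise w from a clean input x in [0, 1]."""
    x = np.clip(x, eps, 1.0 - eps)   # keep arctanh finite at exactly 0 or 1
    return np.arctanh(2.0 * x - 1.0)

x = np.array([0.0, 0.25, 0.9, 1.0])
w = to_tanh_space(x)
w_far = w + 50.0 * np.array([-1.0, 1.0, -1.0, 1.0])   # an arbitrarily large optimiser step

print(np.allclose(to_input_space(w), x, atol=1e-5))   # round trip recovers x
print(to_input_space(w_far).min() >= 0.0, to_input_space(w_far).max() <= 1.0)
```

However far the optimizer moves `w`, the mapped input stays inside the box, which is why no clipping or projection is needed after each update.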
Benchmark Against Defensive Distillation
The C&W attack was specifically designed to break defensive distillation, a then-popular defense technique. Distillation trains a second model using soft labels from the first, which was believed to smooth gradients and make attacks harder.
- Key Finding: The paper demonstrated that defensive distillation primarily caused gradient masking, making gradients appear small or zero to simple attacks like FGSM, but not true robustness.
- Result: The C&W attack's optimization approach circumvented this masking, successfully generating adversarial examples against distilled networks. This proved distillation was not a robust defense and shifted the field's focus towards adversarial training.
L₂, L₀, and L∞ Attack Variants
The framework is flexible and can generate perturbations measured under different distance metrics, each posing a different threat model:
- L₂ Attack: The primary variant, minimizing the Euclidean distance. Produces small, diffuse changes across many pixels.
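- L₀ Attack: An iterative, greedy variant that minimizes the number of altered pixels. Each round runs the L₂ attack over the pixels still allowed to change, identifies the pixels that contribute least to the adversarial effect, and fixes them at their original values, shrinking the set of modifiable pixels until no smaller set succeeds.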
- L∞ Attack: Minimizes the maximum change to any single pixel. Requires a modified objective function and is generally less efficient than dedicated L∞ attacks like Projected Gradient Descent (PGD).
- Significance: This demonstrated that a single, well-formulated optimization approach could threaten models under multiple perceptual and security metrics.
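To make the three distance metrics concrete, the short sketch below measures one perturbation under each of them. The perturbation values are made up for illustration.

```python
import numpy as np

delta = np.array([0.0, 0.3, 0.0, -0.4])   # hypothetical perturbation on a 4-pixel input

l0 = np.count_nonzero(delta)     # L0: how many pixels changed
l2 = np.linalg.norm(delta)       # L2: Euclidean length of the change
linf = np.abs(delta).max()       # L-infinity: largest change to any single pixel

print(l0, l2, linf)
```

The same perturbation can look small under one metric and large under another, which is why each variant corresponds to a different threat model.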
Strong White-Box Benchmark
Due to its effectiveness and optimization foundation, the C&W L₂ attack is considered a strong, standard benchmark for evaluating adversarial robustness in academic research.
- Role in Evaluation: A model's robust accuracy is often reported against the C&W attack (with a given perturbation budget) to measure its resilience to sophisticated white-box threats.
- Limitations: Its main drawback is computational cost. It requires hundreds to thousands of gradient steps per example, making it slower than PGD, which is also iterative but typically runs for only tens of steps.
- Legacy: It established that evaluating defenses requires attacks powerful enough to overcome gradient masking, fundamentally raising the bar for proving robustness in machine learning.
How the Carlini & Wagner Attack Works
The Carlini & Wagner (C&W) attack is a seminal, optimization-based white-box method for generating minimal adversarial perturbations, primarily used to stress-test model defenses.
The Carlini & Wagner attack is a powerful, optimization-based white-box adversarial attack designed to find the smallest possible perturbation that causes a target model to misclassify an input. It formulates the search for an adversarial example as a constrained optimization problem, minimizing a custom loss function that balances perturbation size (measured by L0, L2, or L∞ norms) with the attack's success. This method is considered a benchmark for evaluating adversarial robustness, as it often defeats gradient-masking defenses like defensive distillation.
The attack operates by using an optimization solver, typically Adam, to iteratively adjust a candidate adversarial input. A key innovation is its use of a change-of-variables that keeps the adversarial input inside the valid input range, together with a specially designed objective function that directly encourages misclassification. Due to its effectiveness and precision in measuring perturbation magnitude, the C&W attack is a standard tool in red-teaming and security audits, where the perturbations it finds give an empirical upper bound on the distortion needed to evade a model.
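The pieces described above can be put together in a minimal sketch. It is not the reference implementation: it assumes a toy linear model Z(x) = Wx + b so the gradient can be written by hand, uses a fixed trade-off constant `c`, and implements the Adam update inline. The function name `cw_l2_linear` and all numeric values are illustrative assumptions.

```python
import numpy as np

def cw_l2_linear(x, W, b, target, c=5.0, kappa=0.0, lr=0.05, steps=300):
    """Targeted C&W-style L2 attack on the toy linear model Z(x) = W @ x + b."""
    # Start the unconstrained variable w at the clean input (clipped so arctanh stays finite).
    w = np.arctanh(np.clip(2.0 * x - 1.0, -1 + 1e-6, 1 - 1e-6))
    m, v = np.zeros_like(w), np.zeros_like(w)    # Adam first and second moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    best_x, best_dist = None, np.inf

    for t in range(1, steps + 1):
        th = np.tanh(w)
        x_adv = 0.5 * (th + 1.0)                 # always inside [0, 1]
        z = W @ x_adv + b
        others = z.copy()
        others[target] = -np.inf
        j = int(np.argmax(others))               # strongest non-target class
        margin = z[j] - z[target]

        dist = np.linalg.norm(x_adv - x)
        if margin < 0 and dist < best_dist:      # target is the top class: keep the smallest success
            best_x, best_dist = x_adv.copy(), dist

        # Gradient of ||x' - x||^2 + c * max(margin, -kappa) with respect to x'.
        grad_x = 2.0 * (x_adv - x)
        if margin > -kappa:                      # hinge term is active until the margin reaches -kappa
            grad_x = grad_x + c * (W[j] - W[target])
        grad_w = grad_x * 0.5 * (1.0 - th ** 2)  # chain rule through x' = (tanh(w) + 1) / 2

        # Adam update on w.
        m = beta1 * m + (1 - beta1) * grad_w
        v = beta2 * v + (1 - beta2) * grad_w ** 2
        w = w - lr * (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)

    return best_x, best_dist

W = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.zeros(2)
x = np.array([0.7, 0.3])                         # clean input, classified as class 0
x_adv, dist = cw_l2_linear(x, W, b, target=1)
print(np.argmax(W @ x + b), np.argmax(W @ x_adv + b), round(float(dist), 3))
```

For this toy model the decision boundary is the line x₀ = x₁, so the smallest possible L₂ perturbation is 0.4/√2 ≈ 0.283, and the returned distance should land slightly above that. A fuller implementation would differentiate through a real network automatically and would also search over `c` rather than fixing it.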
C&W Attack vs. Other White-Box Attacks
A technical comparison of the Carlini & Wagner (C&W) attack's characteristics against other prominent white-box adversarial attack methods, highlighting key differentiators in optimization strategy, perturbation quality, and defensive bypass capability.
| Feature / Metric | Carlini & Wagner (C&W) | Fast Gradient Sign Method (FGSM) | Projected Gradient Descent (PGD) |
|---|---|---|---|
| Primary Optimization Method | Custom loss function (e.g., f6) with Lp norm penalty, solved via gradient descent | Single-step gradient ascent using the sign of the gradient | Multi-step iterative gradient ascent with projection |
| Perturbation Goal | Minimal L0, L2, or L∞ distortion for guaranteed misclassification | Maximal loss increase within a fixed L∞ epsilon budget | Find worst-case adversarial example within a fixed Lp norm ball |
| Attack Iteration Type | Iterative (typically 100-1000s of steps) | Single-step (non-iterative) | Iterative (typically 10-100 steps) |
| Typical Use Case | Evaluating robustness of defenses (e.g., defensive distillation), benchmark for minimal perturbation | Fast, inexpensive robustness check and basis for adversarial training | Strong, standard benchmark for adversarial training and robustness evaluation |
| Strength Against Gradient Masking | | | |
| Computational Cost | High (requires many optimization steps) | Very Low (single backward pass) | Medium (multiple forward/backward passes) |
| Primary Norm Constraint | L0, L2, or L∞ (configurable objective) | L∞ (hard constraint) | L∞ or L2 (hard constraint via projection) |
| Guarantee of Adversarial Example | High (optimizes directly for misclassification) | Low (single step may not cross boundary) | High (iterative search within constraint) |
| Efficacy Against Adversarially Trained Models | High (designed to bypass obfuscated gradients) | Low (easily defended by adversarial training) | Medium (the primary attack used for adversarial training) |
Frequently Asked Questions
The Carlini & Wagner (C&W) attack is a seminal, optimization-based white-box attack method designed to generate highly effective adversarial examples with minimal perturbation. It is a cornerstone for rigorously evaluating model robustness, particularly against defenses like defensive distillation.
The Carlini & Wagner (C&W) attack is a powerful, optimization-based white-box attack method designed to generate adversarial examples with minimal perturbation, specifically crafted to defeat gradient-obfuscating defenses like defensive distillation. It formulates the search for an adversarial example as a constrained optimization problem, minimizing a custom loss function that balances the perturbation magnitude (measured by an Lp norm) with the success of causing a target misclassification. Unlike simpler attacks like FGSM, C&W directly optimizes for the smallest possible perturbation that reliably fools the model, making it a gold standard for evaluating adversarial robustness.
Related Terms
The Carlini & Wagner attack is a foundational method in adversarial machine learning. Understanding its relationship to other key concepts is critical for a comprehensive security evaluation.
White-Box Attack
A white-box attack is executed with full knowledge of and access to the target model's internal architecture, parameters, and gradients. This is the primary threat model for the C&W attack, which directly utilizes the model's gradient information to craft optimal perturbations.
- Contrast with Black-Box: White-box attacks are generally more powerful and efficient but require significant insider access.
- C&W Context: The attack formulates an optimization problem that minimizes perturbation subject to the model's loss function, a process dependent on white-box access.
Adversarial Robustness
Adversarial robustness is the property of a machine learning model that measures its ability to maintain correct predictions when subjected to adversarial attacks. The C&W attack is a primary benchmark for measuring this property due to its effectiveness.
- Evaluation Standard: A model's resistance to the C&W attack is a strong indicator of its general robustness.
- Robust Accuracy: This is the accuracy measured on test inputs after each one has been adversarially perturbed (often by C&W), providing a more realistic performance metric than standard accuracy on clean inputs.
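Robust accuracy can be computed with a loop like the one below. This is a sketch under simple assumptions: `predict` and `attack` are hypothetical callables, and the one-dimensional threshold classifier and fixed-shift "attack" are stand-ins for a real model and a real attack such as C&W.

```python
def robust_accuracy(predict, attack, inputs, labels):
    """Fraction of inputs still classified correctly after each one is attacked."""
    correct = 0
    for x, y in zip(inputs, labels):
        x_adv = attack(x, y)                      # perturb the input against its true label
        correct += int(predict(x_adv) == y)
    return correct / len(inputs)

predict = lambda x: int(x > 0.5)                          # toy threshold classifier
attack = lambda x, y: x - 0.1 if y == 1 else x + 0.1      # toy attack: push toward the other class

inputs = [0.9, 0.55, 0.2, 0.45]
labels = [1, 1, 0, 0]

clean = sum(predict(x) == y for x, y in zip(inputs, labels)) / len(inputs)
robust = robust_accuracy(predict, attack, inputs, labels)
print(clean, robust)   # 1.0 0.5
```

The two inputs close to the threshold flip under the small shift, so accuracy drops from 1.0 on clean inputs to 0.5 under attack.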
Projected Gradient Descent (PGD)
Projected Gradient Descent is a powerful, iterative white-box attack and the cornerstone of modern adversarial training. Like C&W, it is an optimization-based attack but uses a different methodological approach.
- Methodology: PGD performs multiple, small-step FGSM iterations, projecting the perturbation back into a valid norm ball (e.g., L∞) after each step.
- Comparison to C&W: While PGD is highly effective for L∞ constraints, the C&W attack is often more effective at finding minimal L2 or L0 norm perturbations and is specifically designed to defeat gradient-masking defenses like defensive distillation.
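The PGD loop described above can be sketched as follows. The function name `pgd_linf`, the step sizes, and the toy linear loss are assumptions for illustration. A real attack would obtain the gradient from the model's loss by automatic differentiation.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=0.1, alpha=0.02, steps=20):
    """L-infinity PGD: repeated signed-gradient ascent steps, each followed by projection."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))   # FGSM-style step that increases the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)           # project back into the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)                   # stay inside the valid input box
    return x_adv

# Toy loss (an assumption): the margin z_1 - z_0 of a linear model, whose gradient is constant.
W = np.array([[2.0, -1.0], [-1.0, 2.0]])
grad_fn = lambda x_: W[1] - W[0]

x = np.array([0.7, 0.3])
x_adv = pgd_linf(x, grad_fn, eps=0.1)
print(x_adv, np.abs(x_adv - x).max(), np.argmax(W @ x_adv))
```

With this budget the perturbed input sits on the corner of the eps-ball and is still classified as class 0. PGD answers "is there an adversarial example within this budget?", whereas the C&W formulation searches for the smallest perturbation that succeeds.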
Defensive Distillation
Defensive distillation is a training technique designed to improve model robustness by training a second model (the distilled model) using the softmax probabilities (soft labels) of a first model as training labels. This smooths the model's decision surface.
- Primary Target: The C&W attack was famously developed to break defensive distillation, demonstrating that the technique provided a false sense of security through gradient masking.
- Historical Significance: The success of C&W against distillation was a pivotal moment, shifting the field's focus towards provable robustness and attacks that circumvent gradient obfuscation.
Adversarial Training
Adversarial training is a defensive technique that improves a model's robustness by including adversarial examples in its training dataset. The strength of the adversary used during training directly impacts the resulting robustness.
- Training Adversary: PGD is the most common adversary for adversarial training due to its iterative strength. However, models are also evaluated against C&W to test for robustness beyond the training threat model.
- Benchmarking: A robust model trained with PGD adversaries should also demonstrate high robust accuracy against C&W attacks, indicating generalized resilience.
Gradient Masking
Gradient masking (or gradient obfuscation) is a phenomenon where a defense technique causes a model's gradients to become uninformative, sparse, or random, giving a false sense of security against gradient-based white-box attacks.
- C&W's Role: The C&W attack was instrumental in exposing defensive distillation as a defense that relied on gradient masking, and it set the template for later evaluations of defenses that produce shattered or stochastic gradients.
- Attack Adaptation: C&W uses optimization tricks (like computing the loss on logits instead of softmax probabilities, and a change-of-variable for the box constraint) so that its gradient does not vanish when the softmax saturates, making it a reliable tool for evaluating whether a defense is truly robust or merely obfuscatory.
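The effect of working on logits rather than softmax probabilities can be seen in a small numerical sketch. The logits and the scale factor of 100 are made-up values; the scale stands in for the very large logits that a saturated, distillation-style network can produce at test time.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def ce_grad_wrt_logits(z, label):
    """Gradient of cross-entropy on softmax(z) with respect to z: softmax(z) - one_hot(label)."""
    g = softmax(z)
    g[label] -= 1.0
    return g

z = np.array([3.0, 1.0, 0.5])        # hypothetical logits, true class 0
scale = 100.0                         # assumed factor that saturates the softmax

normal = np.abs(ce_grad_wrt_logits(z, 0)).max()
masked = np.abs(ce_grad_wrt_logits(scale * z, 0)).max()
print(normal, masked)

# A logit-margin loss such as Z_0 - max_{i != 0} Z_i has gradient +1 / -1 on the
# two logits involved, regardless of how large the logits are.
```

Once the softmax saturates, the cross-entropy gradient is numerically negligible, so attacks that rely on it stall even though the decision boundary has not moved. A loss defined on the logit margin keeps a usable gradient.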

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.