
Carlini & Wagner Attack (C&W)

The Carlini & Wagner (C&W) attack is a powerful, optimization-based white-box adversarial attack designed to generate adversarial examples with minimal perturbation, primarily used to rigorously evaluate model defenses like defensive distillation.


ADVERSARIAL TESTING

What is the Carlini & Wagner Attack (C&W)?

A definitive technical overview of the Carlini & Wagner (C&W) attack, a seminal optimization-based method for generating minimal adversarial perturbations.

The Carlini & Wagner (C&W) attack is a powerful, optimization-based white-box attack formulated as a constrained minimization problem to find the smallest possible perturbation that causes a target machine learning model to misclassify an input. Introduced by Nicholas Carlini and David Wagner in 2017, it directly optimizes a custom loss function that balances the adversarial objective—such as causing a targeted or untargeted misclassification—against the L_p norm (typically L0, L2, or L∞) of the perturbation. This method is considered one of the strongest attacks for evaluating adversarial robustness and is particularly known for its effectiveness against early defensive techniques like defensive distillation.

The attack's core innovation is a change-of-variables technique that enforces the box constraint keeping adversarial examples within valid input bounds (e.g., [0, 1] for normalized pixel values), so that a standard unconstrained optimizer can be used. It employs an iterative optimizer, such as Adam, to solve the resulting non-convex problem, often producing adversarial examples with imperceptibly small perturbations. In adversarial testing, the C&W attack serves as a benchmark for stress-testing model defenses. Every adversarial example it finds is an upper bound on the smallest perturbation that fools the model, and therefore a lower bound on the model's vulnerability to gradient-based attacks.


Key Characteristics of the C&W Attack

The Carlini & Wagner (C&W) attack is a seminal optimization-based white-box attack, renowned for its effectiveness in generating minimal adversarial perturbations. It is a standard benchmark for evaluating the robustness of neural networks, and it first became prominent by defeating defensive distillation.

Optimization-Based Formulation

The C&W attack frames adversarial example generation as a constrained optimization problem. Instead of a simple gradient step, it directly minimizes a custom objective function:

Objective: Minimize the perturbation magnitude (e.g., L2 norm) while ensuring the input is misclassified.
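Formalization: It solves minimize ||δ||_p + c · f(x + δ) subject to x + δ ∈ [0, 1]^n. Here c > 0 is a constant that trades off distortion against attack success; the paper chooses it by binary search. The function f is a specially designed loss with f(x + δ) ≤ 0 if and only if the attack succeeds. For the L₂ attack the distance term is the squared norm ||δ||₂².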
Advantage: This formulation allows for precise control over the perturbation's size, often finding smaller, more imperceptible adversarial examples than fast, single-step methods like FGSM.
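The sketch below makes the trade-off controlled by c concrete. It uses an assumed toy problem: a two-class model whose logits are simply Z(x) = x, a clean input classified as class 0, target class 1, and κ = 0. The function name `cw_objective` and both candidate inputs are hypothetical. The code evaluates the objective for two cases:

- an unperturbed input, where the attack fails;
- a slightly perturbed input that crosses the decision boundary, where the attack succeeds.

```python
def cw_objective(x, x_adv, c):
    """||x' - x||_2^2 + c * f(x') for a toy 2-class model Z(x) = x, target class 1.

    f(x') = max(Z_0 - Z_1, 0) is the kappa = 0 hinge; it is 0 once the attack succeeds.
    """
    dist_sq = sum((a - o) ** 2 for a, o in zip(x_adv, x))
    f = max(x_adv[0] - x_adv[1], 0.0)
    return dist_sq + c * f


x = [0.8, 0.2]          # clean input, classified as class 0
stay = [0.8, 0.2]       # no perturbation: attack fails, f = 0.6
cross = [0.49, 0.51]    # small move across the boundary: attack succeeds, f = 0
for c in (0.1, 10.0):
    better = "cross" if cw_objective(x, cross, c) < cw_objective(x, stay, c) else "stay"
    print(c, better)
# 0.1 stay
# 10.0 cross
```

With a small c the distortion term dominates, so the optimizer would rather not attack at all. With a large c, crossing the boundary wins. This is why c has to be tuned rather than fixed.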

Custom Loss Functions (f₆, f₇)

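A core innovation is the design of hinge-like objective functions that proved better suited to this optimization than standard cross-entropy loss. The paper compared seven candidates (f₁ through f₇). The logit-based f₆ performed best and is the standard choice, and f₇ is a smoothed variant of it: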

f₆(x') = max(max_{i ≠ t}(Z(x')_i) - Z(x')_t, -κ): Encourages the target class t (for targeted attacks) to have a logit Z(x')_t that is at least κ higher than the next highest logit.
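f₇(x') = softplus(max_{i ≠ t}(Z(x')_i) - Z(x')_t) - log(2): A smooth version of the same logit margin that replaces the hinge with softplus. Like f₆, it operates on the logits Z rather than on softmax probabilities.
Purpose: These functions are (piecewise) differentiable and give a clear, non-vanishing gradient while the attack has not yet succeeded, enabling efficient gradient-based optimization. The κ parameter controls the confidence of the misclassification. Once the target logit leads by κ, f₆ is clamped at -κ, and the optimizer spends its remaining effort shrinking the perturbation.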

Box-Constrained Optimization

The attack must ensure the adversarial example x' = x + δ remains a valid input (e.g., pixel values between 0 and 1). The C&W attack uses a change-of-variables technique to handle this box constraint inherently.

Method: Instead of optimizing δ directly, it optimizes an unconstrained variable w (one per input coordinate) and sets x' = ½(tanh(w) + 1), so that δ = ½(tanh(w) + 1) − x. The tanh function bounds its outputs to [-1, 1], and these are then scaled to [0, 1].
Benefit: This eliminates the need for clipping or projection steps after each gradient update, which leads to more stable and effective optimization. It guarantees that the adversarial example is always within the valid input space.
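The change of variables and its inverse can be sketched as follows, assuming inputs normalized to [0, 1]. The helper names are hypothetical:

- `to_w` clips inputs slightly away from 0 and 1, because atanh diverges at ±1.
- `dinput_dw` is the chain-rule factor an optimizer needs when it takes gradients with respect to w.

```python
import math


def to_input(w):
    """Map unconstrained w to x' = 0.5 * (tanh(w) + 1), which always lies in [0, 1]."""
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]


def to_w(x, eps=1e-6):
    """Inverse map w = atanh(2x - 1); clip x away from 0 and 1, where atanh diverges."""
    return [math.atanh(2.0 * min(max(xi, eps), 1.0 - eps) - 1.0) for xi in x]


def dinput_dw(w):
    """Derivative dx'/dw = 0.5 * (1 - tanh(w)^2), used in the chain rule."""
    return [0.5 * (1.0 - math.tanh(wi) ** 2) for wi in w]


x = [0.0, 0.25, 0.5, 1.0]
w = to_w(x)
w_far = [wi + 100.0 for wi in w]   # even a huge step in w keeps x' valid
print(to_input(w_far))             # [1.0, 1.0, 1.0, 1.0]
```

The optimizer can take arbitrarily large steps in w and never leave the valid box. The price is a vanishing derivative near 0 and 1, where the factor 1 - tanh(w)² approaches zero.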

Benchmark Against Defensive Distillation

The C&W attack was specifically designed to break defensive distillation, a then-popular defense technique. Distillation trains a second model on soft labels that the first model produces at a high softmax temperature, which was believed to smooth gradients and make attacks harder.

Key Finding: The paper demonstrated that defensive distillation mainly caused gradient masking rather than true robustness. At test time the distilled network's logits are very large, so the softmax saturates. Gradients of softmax-based losses then become vanishingly small, which stalls simple attacks like FGSM.
Result: The C&W objective works on logits rather than softmax outputs, so it avoided this saturation and reliably generated adversarial examples against distilled networks. This showed that distillation was not a robust defense and shifted the field's focus toward adversarial training.
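The saturation effect can be reproduced with a few lines of arithmetic. This is an idealized sketch that assumes the distilled network's test-time logits are the undistilled logits multiplied by a hypothetical temperature T = 100; the helper names are illustrative. It compares two gradients:

- the gradient of softmax cross-entropy;
- the gradient of a logit margin of the kind f₆ uses.

```python
import math


def softmax(z):
    m = max(z)                              # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]


def ce_grad(z, label):
    """Gradient of cross-entropy -log softmax(z)[label] w.r.t. the logits: p - onehot."""
    p = softmax(z)
    return [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]


def margin_grad(num_classes, label, other):
    """Gradient of the logit margin Z_other - Z_label (the quantity f6 is built on)."""
    g = [0.0] * num_classes
    g[other], g[label] = 1.0, -1.0
    return g


T = 100.0                     # hypothetical distillation temperature
z = [T * 0.5, T * 0.0]        # logits scaled up by T at test time
print(ce_grad(z, label=0))    # first entry is exactly 0.0, second is about 1.9e-22
print(margin_grad(2, 0, 1))   # [-1.0, 1.0], independent of T
```

In 64-bit floats the cross-entropy gradient for the true class rounds to exactly zero and the other entry is around 1e-22. A gradient-based attack on that loss therefore stalls, while the margin gradient stays at ±1 whatever the value of T.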

L₂, L₀, and L∞ Attack Variants

The framework is flexible and can generate perturbations measured under different distance metrics, each posing a different threat model:

L₂ Attack: The primary variant, minimizing the Euclidean distance. Produces small, diffuse changes across many pixels.
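L₀ Attack: An iterative, greedy variant that minimizes the number of altered pixels. It repeatedly runs the L₂ attack on a shrinking set of pixels that are allowed to change. After each run it uses the gradient to find the pixels with the least effect on the output and freezes them at their original values, stopping when the L₂ attack can no longer succeed.
L∞ Attack: Minimizes the maximum change to any single pixel. The gradient of ||δ||∞ touches only the single largest coordinate, which makes plain gradient descent oscillate. The attack therefore replaces the norm with a penalty on every coordinate whose magnitude exceeds a threshold τ, and lowers τ over successive iterations. It is generally less efficient than dedicated L∞ attacks like Projected Gradient Descent (PGD).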
Significance: This demonstrated that a single, well-formulated optimization approach could threaten models under multiple perceptual and security metrics.
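The three distance metrics, and the kind of surrogate the L∞ variant optimizes, can be computed as below. This is an illustrative sketch and the function names are hypothetical. `linf_penalty` follows the form Σᵢ max(|δᵢ| − τ, 0) described for the paper's L∞ attack, where τ is a threshold lowered over iterations. The gradient of ||δ||∞ itself touches only the single largest coordinate. This penalty instead pushes on every coordinate above τ.

```python
import math


def perturbation_norms(delta):
    """Return the L0, L2 and L-infinity sizes of a perturbation."""
    l0 = sum(1 for d in delta if d != 0.0)       # number of changed features
    l2 = math.sqrt(sum(d * d for d in delta))    # Euclidean size
    linf = max(abs(d) for d in delta)            # largest single change
    return l0, l2, linf


def linf_penalty(delta, tau):
    """Surrogate for ||delta||_inf: sum_i max(|delta_i| - tau, 0)."""
    return sum(max(abs(d) - tau, 0.0) for d in delta)


delta = [0.0, 0.3, -0.4, 0.0]
print(perturbation_norms(delta))   # L0 = 2, L2 ≈ 0.5, Linf = 0.4
print(linf_penalty(delta, 0.35))   # ≈ 0.05: only the -0.4 entry exceeds tau
```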

Strong White-Box Benchmark

Due to its effectiveness and optimization foundation, the C&W L₂ attack is considered a strong, standard benchmark for evaluating adversarial robustness in academic research.

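Role in Evaluation: C&W searches for the minimal perturbation rather than working within a fixed budget. Results are therefore typically reported as the mean or median distortion needed to fool the model. They can also be converted to robust accuracy at a given budget by counting the examples whose minimal perturbation exceeds it.
Limitations: Its main drawback is computational cost. It requires hundreds to thousands of gradient steps per example, repeated for each step of the binary search over c. This makes it much slower than fixed-budget iterative methods like PGD.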
Legacy: It established that evaluating defenses requires attacks powerful enough to overcome gradient masking, fundamentally raising the bar for proving robustness in machine learning.
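C&W returns a minimal perturbation per example rather than a pass/fail verdict at a fixed budget, so its results are usually summarized afterwards. The sketch below works on a hypothetical list of per-example minimal L₂ norms, with None where the attack failed. It shows two common summaries:

- the median distortion over successful attacks;
- robust accuracy at a budget ε, where an example counts as robust if the attack failed or needed more than ε.

Clean inputs that the model already misclassifies count as non-robust through the hypothetical `num_clean_errors` parameter.

```python
import statistics


def summarize_minimal_perturbations(norms, epsilon, num_clean_errors=0):
    """Summarize per-example minimal L2 norms from a minimal-perturbation attack.

    norms: one entry per correctly classified clean example; None means the attack failed.
    Returns (median norm over successful attacks, robust accuracy at epsilon).
    """
    found = [n for n in norms if n is not None]
    median = statistics.median(found) if found else float("inf")
    robust = sum(1 for n in norms if n is None or n > epsilon)
    return median, robust / (len(norms) + num_clean_errors)


norms = [0.5, 1.2, None, 2.0]   # hypothetical attack results
print(summarize_minimal_perturbations(norms, epsilon=1.0))   # (1.2, 0.75)
```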


How the Carlini & Wagner Attack Works

The Carlini & Wagner (C&W) attack is a seminal, optimization-based white-box method for generating minimal adversarial perturbations, primarily used to stress-test model defenses.

The Carlini & Wagner attack is a powerful, optimization-based white-box adversarial attack designed to find the smallest possible perturbation that causes a target model to misclassify an input. It formulates the search for an adversarial example as a constrained optimization problem, minimizing a custom loss function that balances perturbation size (measured by L0, L2, or L∞ norms) with the attack's success. This method is considered a benchmark for evaluating adversarial robustness, as it often defeats gradient-masking defenses like defensive distillation.

The attack uses an optimization solver, typically Adam, to iteratively adjust a candidate adversarial input. Two design choices make this work. A change of variables keeps every candidate inside the valid input range (e.g., [0, 1]) without clipping, and a specially designed objective function directly encourages misclassification while penalizing distortion. An outer binary search over the trade-off constant c then keeps the smallest successful perturbation. Because of its effectiveness and its precise measurement of perturbation magnitude, the C&W attack is a standard tool in red-teaming and security audits for establishing a lower bound on a model's vulnerability to evasion attacks.
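The loop described above can be sketched end to end on a toy problem. This is a minimal pure-Python illustration under several assumptions:

- A hypothetical linear model (`logits`) stands in for a neural network, so gradients can be written by hand.
- The attack is targeted and uses f₆ as the objective.
- The hyperparameters (`c_init`, `search_steps`, `steps`, `lr`) are illustrative values, not the paper's settings.

It combines the tanh change of variables, a hand-written Adam update, and a binary search over c that keeps the smallest successful perturbation.

```python
import math


def logits(W, b, x):
    """Hypothetical linear model Z(x) = W x + b, standing in for a neural network."""
    return [sum(wk * xk for wk, xk in zip(row, x)) + bi for row, bi in zip(W, b)]


def cw_l2_attack(W, b, x, target, kappa=0.0, c_init=1.0, search_steps=6,
                 steps=300, lr=0.05):
    """Targeted C&W-style L2 attack on the linear model above (illustrative sketch)."""
    n, eps = len(x), 1e-6
    # Change of variables x' = 0.5 * (tanh(w) + 1), starting at the clean input.
    w_init = [math.atanh(2 * min(max(xi, eps), 1 - eps) - 1) for xi in x]
    lower, upper, c = 0.0, 1e10, c_init
    best_dist, best_adv = float("inf"), None

    for _ in range(search_steps):
        w = list(w_init)
        m, v = [0.0] * n, [0.0] * n  # Adam first and second moment estimates
        success = False
        for step in range(1, steps + 1):
            x_adv = [0.5 * (math.tanh(wk) + 1) for wk in w]
            z = logits(W, b, x_adv)
            # Strongest competing class j and the f6 margin Z_j - Z_t.
            j = max((i for i in range(len(z)) if i != target), key=lambda i: z[i])
            margin = z[j] - z[target]
            dist = sum((a - o) ** 2 for a, o in zip(x_adv, x))
            if margin < 0:  # target logit is strictly largest: attack succeeded
                success = True
                if dist < best_dist:
                    best_dist, best_adv = dist, x_adv
            # Gradient of ||x' - x||^2 + c * max(margin, -kappa) w.r.t. x'.
            active = margin > -kappa  # f6 is flat once the margin reaches -kappa
            g_x = [2 * (x_adv[k] - x[k]) + (c * (W[j][k] - W[target][k]) if active else 0.0)
                   for k in range(n)]
            # Chain rule through the change of variables: dx'/dw = 0.5 * (1 - tanh(w)^2).
            g_w = [g_x[k] * 0.5 * (1 - math.tanh(w[k]) ** 2) for k in range(n)]
            for k in range(n):  # Adam update with the usual default betas
                m[k] = 0.9 * m[k] + 0.1 * g_w[k]
                v[k] = 0.999 * v[k] + 0.001 * g_w[k] ** 2
                m_hat = m[k] / (1 - 0.9 ** step)
                v_hat = v[k] / (1 - 0.999 ** step)
                w[k] -= lr * m_hat / (math.sqrt(v_hat) + 1e-8)
        # Binary search on c: try a smaller c after a success, a larger one after a failure.
        if success:
            upper = min(upper, c)
            c = (lower + upper) / 2
        else:
            lower = max(lower, c)
            c = (lower + upper) / 2 if upper < 1e9 else c * 10
    return best_adv, best_dist


W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy 3-class model on 2 features
b = [0.0, 0.0, 0.0]
adv, dist_sq = cw_l2_attack(W, b, [0.8, 0.2], target=1)
print(adv, math.sqrt(dist_sq))
```

In this toy setup, class 1 wins exactly when x1 > x0. No successful point can therefore be closer to (0.8, 0.2) than 0.6/√2 ≈ 0.424, and the attack should land just past the boundary near (0.5, 0.5). A real implementation would compute gradients with an autodiff framework, batch many examples, and stop early once the loss stops improving.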

OPTIMIZATION-BASED ATTACK COMPARISON

C&W Attack vs. Other White-Box Attacks

A technical comparison of the Carlini & Wagner (C&W) attack's characteristics against other prominent white-box adversarial attack methods, highlighting key differentiators in optimization strategy, perturbation quality, and defensive bypass capability.

Feature / Metric	Carlini & Wagner (C&W)	Fast Gradient Sign Method (FGSM)	Projected Gradient Descent (PGD)
Primary Optimization Method	Custom loss function (e.g., f6) with Lp norm penalty, solved via gradient descent	Single-step gradient ascent using the sign of the gradient	Multi-step iterative gradient ascent with projection
Perturbation Goal	Minimal L0, L2, or L∞ distortion that still causes misclassification	Maximal loss increase within a fixed L∞ epsilon budget	Find worst-case adversarial example within a fixed Lp norm ball
Attack Iteration Type	Iterative (typically 100-1000s of steps)	Single-step (non-iterative)	Iterative (typically 10-100 steps)
Typical Use Case	Evaluating robustness of defenses (e.g., defensive distillation), benchmark for minimal perturbation	Fast, inexpensive robustness check and basis for adversarial training	Strong, standard benchmark for adversarial training and robustness evaluation
Strength Against Gradient Masking
Computational Cost	High (requires many optimization steps)	Very Low (single backward pass)	Medium (multiple forward/backward passes)
Primary Norm Constraint	L0, L2, or L∞ (configurable objective)	L∞ (hard constraint)	L∞ or L2 (hard constraint via projection)
Likelihood of Finding an Adversarial Example	High (optimizes directly for misclassification)	Low (a single step may not cross the decision boundary)	High within the budget (iterative search inside the constraint)
Role Against Adversarially Trained Models	Common complementary evaluation, testing robustness beyond the attack used in training	Weak (models trained against FGSM can resist it yet remain vulnerable to iterative attacks)	Standard attack for both training and evaluating such models


Frequently Asked Questions

The Carlini & Wagner (C&W) attack is a seminal, optimization-based white-box attack method designed to generate highly effective adversarial examples with minimal perturbation. It is a cornerstone for rigorously evaluating model robustness, particularly against defenses like defensive distillation.

The Carlini & Wagner (C&W) attack is a powerful, optimization-based white-box attack method designed to generate adversarial examples with minimal perturbation, and it was specifically crafted to defeat gradient-obfuscating defenses like defensive distillation. It formulates the search for an adversarial example as a constrained optimization problem. The custom loss function it minimizes balances the perturbation magnitude (measured by an Lp norm) against the success of causing a target misclassification. Unlike single-step attacks such as FGSM, C&W directly optimizes for a small, near-minimal perturbation that fools the model, which has made it a gold standard for evaluating adversarial robustness.


Related Terms

The Carlini & Wagner attack is a foundational method in adversarial machine learning. Understanding its relationship to other key concepts is critical for a comprehensive security evaluation.

White-Box Attack

A white-box attack is executed with full knowledge of and access to the target model's internal architecture, parameters, and gradients. This is the primary threat model for the C&W attack, which directly utilizes the model's gradient information to craft optimal perturbations.

Contrast with Black-Box: White-box attacks are generally more powerful and efficient but require significant insider access.
C&W Context: The attack minimizes the perturbation size plus a weighted misclassification term computed from the model's logits. Solving this requires the model's gradients, so the attack depends on white-box access.

Adversarial Robustness

Adversarial robustness is the property of a machine learning model that measures its ability to maintain correct predictions when subjected to adversarial attacks. The C&W attack is a primary benchmark for measuring this property due to its effectiveness.

Evaluation Standard: A model's resistance to the C&W attack is a useful indicator of its robustness, although surviving one attack does not prove robustness against all attacks.
Robust Accuracy: This is the accuracy measured on adversarially perturbed versions of the test inputs (often generated by C&W). It provides a more realistic performance metric than standard accuracy.

Projected Gradient Descent (PGD)

Projected Gradient Descent is a powerful, iterative white-box attack and the cornerstone of modern adversarial training. Like C&W, it is an optimization-based attack but uses a different methodological approach.

Methodology: PGD performs multiple, small-step FGSM iterations, projecting the perturbation back into a valid norm ball (e.g., L∞) after each step.
Comparison to C&W: While PGD is highly effective for L∞ constraints, the C&W attack is often more effective at finding minimal L2 or L0 norm perturbations and is specifically designed to defeat gradient-masking defenses like defensive distillation.
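For contrast with C&W's minimal-perturbation search, PGD fixes a budget ε and maximizes the loss inside it. This is a minimal L∞ sketch under stated assumptions:

- `grad_fn` is a hypothetical callback that returns the attacker's loss gradient.
- Inputs live in [0, 1].
- The random start used by Madry et al. is omitted for determinism.

```python
def pgd_linf(grad_fn, x, epsilon, alpha, steps):
    """Untargeted L-infinity PGD: ascend the loss with signed steps, then project.

    grad_fn(x) returns the gradient of the attacker's loss at x (hypothetical callback).
    Starts from the clean input instead of a random point in the epsilon-ball.
    """
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        # FGSM-style step: move each feature by alpha in the direction of sign(g).
        x_adv = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Project back into the epsilon-ball around x, then into the valid box [0, 1].
        x_adv = [min(max(xa, xo - epsilon), xo + epsilon) for xa, xo in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Toy 2-class model Z(x) = x with true class 0. The attacker maximizes Z_1 - Z_0,
# whose gradient is the constant [-1, +1].
x = [0.8, 0.2]
adv = pgd_linf(lambda _: [-1.0, 1.0], x, epsilon=0.1, alpha=0.03, steps=10)
print(adv)   # each feature has moved by at most epsilon = 0.1
```

PGD never reports how small a perturbation could have been; it only answers whether one exists within ε. C&W answers the complementary question.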

Defensive Distillation

Defensive distillation is a training technique designed to improve model robustness. It trains a second model (the distilled model) using the softmax probabilities (soft labels) of a first model as training labels. This was intended to smooth the model's decision surface and make gradient-based attacks harder.

Primary Target: The C&W attack was famously developed to break defensive distillation, demonstrating that the technique provided a false sense of security through gradient masking.
Historical Significance: The success of C&W against distillation was a pivotal moment, shifting the field's focus towards provable robustness and attacks that circumvent gradient obfuscation.

Adversarial Training

Adversarial training is a defensive technique that improves a model's robustness by including adversarial examples in its training dataset. The strength of the adversary used during training directly impacts the resulting robustness.

Training Adversary: PGD is the most common adversary for adversarial training due to its iterative strength. However, models are also evaluated against C&W to test for robustness beyond the training threat model.
Benchmarking: A robust model trained with PGD adversaries should also demonstrate high robust accuracy against C&W attacks, indicating generalized resilience.

Gradient Masking

Gradient masking (or gradient obfuscation) is a phenomenon where a defense technique causes a model's gradients to become uninformative, sparse, or random, giving a false sense of security against gradient-based white-box attacks.

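C&W's Role: The C&W attack was instrumental in exposing defenses that relied on gradient masking, most notably defensive distillation. Its objective later became a standard component in evaluations of defenses based on shattered gradients and stochastic components.
Attack Adaptation: C&W's logit-based objective avoids the vanishing gradients of a saturated softmax, which is how it bypassed distillation. Shattered or randomized gradients are not bypassed by the objective alone. Against such defenses, C&W is typically combined with techniques such as BPDA (Backward Pass Differentiable Approximation) or EOT (Expectation over Transformation). Used this way, it is a reliable tool for evaluating whether a defense is truly robust or merely obfuscatory.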

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
