Infidelity is an explanation metric that quantifies how poorly an explanation's feature attributions predict the change in a model's output when the input is perturbed. It is a perturbation-based validation technique that directly tests whether the feature attributions provided by an explanation method (such as SHAP or Integrated Gradients) correspond to the model's actual sensitivity to those features. A low infidelity score indicates a faithful explanation: the attributed importance aligns with how the model's prediction actually responds to changes in the input.
What is Infidelity?
Infidelity is a quantitative metric for validating the faithfulness of post-hoc model explanations.
The metric is computed by sampling perturbations of the input, such as removing or altering features, and measuring how far the model's actual change in prediction deviates from the change predicted by the explanation's importance scores. It is a core component of post-hoc explanation validation, complementing metrics like completeness and stability. High infidelity signals that an explanation may be misleading, which is critical for algorithmic explainability audits in regulated domains where understanding model decisions is mandatory.
Key Characteristics of Infidelity
Each characteristic below follows from the metric's definition: infidelity measures the discrepancy between the actual change in a model's output and the change predicted by the explanation when the input is perturbed.
Core Definition & Formula
Infidelity quantifies the expected squared error between the model's true output change and the change predicted by a linear approximation based on the explanation's importance scores. Formally, for a model f, input x, explanation vector Φ(x), and a random perturbation I drawn from a chosen perturbation distribution, it is defined as:
$$\mathrm{Infidelity}(\Phi, f, x) = \mathbb{E}_{I}\left[\left(I^{\top}\Phi(x) - \big(f(x) - f(x - I)\big)\right)^{2}\right]$$
- I: a random perturbation vector (e.g., masking or noise).
- Φ(x): the feature attribution vector (e.g., from SHAP or Integrated Gradients).
- f(x) - f(x - I): the model's actual output difference due to the perturbation.
- I^T Φ(x): the explanation's linear prediction of that output difference.
A lower infidelity score indicates a more faithful explanation.
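The definition can be estimated by Monte Carlo sampling: draw perturbations I, compare the explanation's linear prediction I^T Φ(x) with the model's actual change f(x) - f(x - I), and average the squared error. The sketch below is a minimal pure-Python illustration, assuming Gaussian-noise perturbations and a model exposed as a plain callable; `infidelity` and `gaussian_noise` are hypothetical helper names, not a library API.

```python
import random


def gaussian_noise(sigma):
    """Perturbation distribution (assumed): I ~ N(0, sigma^2), independently per feature."""
    def sample(x, rng):
        return [rng.gauss(0.0, sigma) for _ in x]
    return sample


def infidelity(model, x, phi, sample_perturbation, n_samples=1000, seed=0):
    """Monte Carlo estimate of E_I[(I^T phi - (f(x) - f(x - I)))^2]."""
    rng = random.Random(seed)
    fx = model(x)
    total = 0.0
    for _ in range(n_samples):
        I = sample_perturbation(x, rng)
        predicted = sum(i * p for i, p in zip(I, phi))          # I^T Φ(x)
        actual = fx - model([xi - ii for xi, ii in zip(x, I)])  # f(x) - f(x - I)
        total += (predicted - actual) ** 2
    return total / n_samples


# Toy linear model f(x) = w·x; its gradient w predicts every output change exactly.
w = [2.0, -1.0, 0.5]


def linear_model(x):
    return sum(wi * xi for wi, xi in zip(w, x))


x = [1.0, 3.0, -2.0]
faithful = infidelity(linear_model, x, phi=w, sample_perturbation=gaussian_noise(0.1))
unfaithful = infidelity(linear_model, x, phi=[0.0, 0.0, 0.0],
                        sample_perturbation=gaussian_noise(0.1))
print(faithful, unfaithful)
```

For the linear model, the gradient w reproduces every output change, so the first value should be essentially 0. The all-zero explanation predicts no change at all, so its expected infidelity is σ²‖w‖² = 0.0525 and the estimate should land near that.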
Perturbation-Based Validation
The metric is fundamentally grounded in perturbation analysis. It validates an explanation by testing its predictive power under input changes.
- Method: Systematically apply random perturbations I to the input x and measure two things: 1) the actual change in the model's prediction, and 2) the change predicted by a linear model that uses the explanation's importance scores as coefficients.
- Key Insight: A faithful explanation should serve as a good local linear approximation of the complex model. If the importance scores in Φ(x) are correct, the dot product I^T Φ(x) should closely match the true output difference f(x) - f(x - I) across many perturbations.
- Contrast with Completeness: Completeness checks whether attributions sum to the total output difference relative to a baseline; infidelity checks whether they correctly predict output changes under perturbation.
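The local-linear intuition can be checked numerically. This sketch assumes a toy model f(x) = x0^2 + 3*x1, its analytic gradient as the explanation, and Gaussian-noise perturbations of scale sigma (all chosen for illustration). The gradient is exact to first order, so the only error is the curvature term the linear prediction ignores: predicted minus actual equals I0^2, giving an expected infidelity of 3*sigma^4 that grows quickly as perturbations move farther from x.

```python
import random


def model(x):
    # Toy nonlinear model: f(x) = x0^2 + 3*x1
    return x[0] ** 2 + 3.0 * x[1]


def gradient(x):
    # Analytic gradient, used here as the explanation Φ(x)
    return [2.0 * x[0], 3.0]


def infidelity_gaussian(f, x, phi, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[(I^T phi - (f(x) - f(x - I)))^2] with I ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_samples):
        I = [rng.gauss(0.0, sigma) for _ in x]
        predicted = sum(i * p for i, p in zip(I, phi))
        actual = fx - f([xi - ii for xi, ii in zip(x, I)])
        total += (predicted - actual) ** 2
    return total / n_samples


x = [1.5, -0.5]
for sigma in (0.1, 0.5, 1.0):
    est = infidelity_gaussian(model, x, gradient(x), sigma)
    print(f"sigma={sigma}: estimate={est:.5f}, analytic 3*sigma^4={3 * sigma ** 4:.5f}")
```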
Relation to Faithfulness & Local Fidelity
Infidelity is a direct, quantitative measure of local fidelity and one common way to compute a faithfulness score.
- Local Fidelity: This property requires an explanation to accurately reflect the model's behavior in the neighborhood of a specific input. Infidelity operationalizes this by averaging over many local perturbations I.
- Faithfulness Score: Sometimes defined as 1 - Infidelity (after normalization), or as the correlation between the explanation's predicted output changes and the model's actual output changes. A high-faithfulness explanation has low infidelity.
- Distinction from Plausibility: An explanation can be plausible to a human (make sense) but have high infidelity if it does not match how the model actually behaves. Infidelity targets faithfulness, not human agreement.
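To contrast a correlation-style faithfulness score with infidelity, the sketch below computes the Pearson correlation between the explanation's predicted change I^T Φ(x) and the actual change f(x) - f(x - I) over Gaussian-noise perturbations. This is one hypothetical variant chosen for illustration; published faithfulness scores differ in how they perturb inputs and aggregate results, and the toy linear model is an assumption.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def faithfulness_correlation(f, x, phi, sigma=0.1, n_samples=1000, seed=0):
    """Correlation between predicted (I^T phi) and actual (f(x) - f(x - I)) output changes."""
    rng = random.Random(seed)
    fx = f(x)
    predicted, actual = [], []
    for _ in range(n_samples):
        I = [rng.gauss(0.0, sigma) for _ in x]
        predicted.append(sum(i * p for i, p in zip(I, phi)))
        actual.append(fx - f([xi - ii for xi, ii in zip(x, I)]))
    return pearson(predicted, actual)


w = [2.0, -1.0, 0.5]


def model(x):
    return sum(wi * xi for wi, xi in zip(w, x))


x = [1.0, 3.0, -2.0]
print(faithfulness_correlation(model, x, phi=w))                 # close to 1.0
print(faithfulness_correlation(model, x, phi=[0.5, -1.0, 2.0]))  # noticeably lower
```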
Model-Agnostic Property
Infidelity is a model-agnostic evaluation metric. It can be applied to any explanation method (Φ) and any model (f), provided you can query the model's output.
- Black-Box Evaluation: It only requires the ability to perturb inputs and observe outputs. The metric itself needs no internal model weights, gradients, or architecture knowledge, although the explanation method being evaluated may.
- Broad Applicability: It can evaluate explanations for deep neural networks, tree-based models (like XGBoost), and ensemble methods alike.
- Explanation Method Agnostic: It tests the output of any attribution method—including SHAP, LIME, Integrated Gradients, or simple gradient-based saliency—against the same ground truth: the model's perturbed behavior.
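Because the metric needs only input-output queries, it can be computed for a model whose internals are hidden. The sketch below is an illustration under assumptions: a hypothetical one-split decision stump stands in for a tree-based model, a `CountingModel` wrapper (also hypothetical) counts queries to show that nothing besides predictions is accessed, and an all-zero attribution (what a plain gradient gives almost everywhere on a step function) is scored under Gaussian noise.

```python
import random


class CountingModel:
    """Wraps a black-box predict function and counts how often it is queried."""

    def __init__(self, predict):
        self.predict = predict
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.predict(x)


def stump(x):
    # Hypothetical one-split tree: outputs 1.0 when feature 0 exceeds 0.5
    return 1.0 if x[0] > 0.5 else 0.0


def infidelity(f, x, phi, sigma=0.1, n_samples=1000, seed=0):
    """Monte Carlo infidelity estimate that only calls f (no gradients or weights)."""
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_samples):
        I = [rng.gauss(0.0, sigma) for _ in x]
        predicted = sum(i * p for i, p in zip(I, phi))
        actual = fx - f([xi - ii for xi, ii in zip(x, I)])
        total += (predicted - actual) ** 2
    return total / n_samples


black_box = CountingModel(stump)
x = [0.55, 0.2]
score = infidelity(black_box, x, phi=[0.0, 0.0])
print(score, black_box.calls)  # calls == 1 + n_samples
```

Here the output flips whenever I0 >= 0.05, so the expected score is P(I0 >= 0.05), about 0.31: the zero attribution misses the model's sensitivity to feature 0 at this scale of perturbation.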
Sensitivity to Perturbation Design
The computed infidelity score is highly sensitive to the choice of perturbation distribution I. This is both a key characteristic and a critical consideration.
- Distribution Choice: Common choices include:
- Masking: Setting features to a baseline value (zero, mean, random).
- Gaussian Noise: Adding small, normally distributed noise.
- Blurring/Occlusion: For image data, applying Gaussian blur or gray patches.
- Semantic Meaning: Perturbations should be meaningful in the input domain. Adding Gaussian noise to a continuous tabular feature might be valid, but adding it to a one-hot encoded category is not, because the perturbed value no longer corresponds to any category.
- Interpretation: Scores are only comparable when using the same perturbation family. A method may have low infidelity under one perturbation (e.g., masking) but high under another (e.g., noise), revealing aspects of its robustness.
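The effect of perturbation design can be reproduced with a toy interaction model f(x) = x0 * x1 and its gradient as the explanation. In this sketch (the helper names and the 50% masking probability are assumptions), masking draws I_i = x_i - b_i for a random subset of features so that x - I sets them to the baseline b, while the noise family adds small Gaussian perturbations. The same explanation scores very differently: for x = (1, 2) with a zero baseline, the expected infidelity is sigma^4 under noise but (x0 * x1)^2 / 4 = 1 under masking.

```python
import random


def model(x):
    # Toy model with a feature interaction: f(x) = x0 * x1
    return x[0] * x[1]


def gradient(x):
    return [x[1], x[0]]


def gaussian_noise(sigma):
    def sample(x, rng):
        return [rng.gauss(0.0, sigma) for _ in x]
    return sample


def random_masking(baseline, mask_prob=0.5):
    def sample(x, rng):
        # I_i = x_i - b_i for masked features, so x - I puts them at the baseline
        return [(xi - bi) if rng.random() < mask_prob else 0.0
                for xi, bi in zip(x, baseline)]
    return sample


def infidelity(f, x, phi, sample_perturbation, n_samples=1000, seed=0):
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_samples):
        I = sample_perturbation(x, rng)
        predicted = sum(i * p for i, p in zip(I, phi))
        actual = fx - f([xi - ii for xi, ii in zip(x, I)])
        total += (predicted - actual) ** 2
    return total / n_samples


x = [1.0, 2.0]
phi = gradient(x)
noise_score = infidelity(model, x, phi, gaussian_noise(0.1))
mask_score = infidelity(model, x, phi, random_masking(baseline=[0.0, 0.0]))
print(f"noise: {noise_score:.6f}  masking: {mask_score:.3f}")
```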
Use in Benchmarking & Tooling
Infidelity is a standard metric in explainability benchmarking suites and research libraries for quantitatively comparing explanation methods.
- Quantitative Benchmarking: In papers and toolkits, explanation methods are often ranked on datasets by their average infidelity scores across many inputs.
- Implementation Libraries: It is available in, or can be built on top of, major XAI libraries:
  - Captum (PyTorch): the `infidelity` function in `captum.metrics`.
  - SHAP: attributions from its `Explainer` objects can be scored by applying perturbations and computing infidelity.
  - TensorFlow: can be implemented via custom loops.
- Production Monitoring: In MLOps pipelines, tracking infidelity scores over time can detect explanation drift, where an explanation method becomes less faithful as the underlying model or data distribution evolves.
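A benchmark of this kind reduces to averaging per-input infidelity for each method and sorting. The sketch below is a hypothetical, self-contained illustration rather than the Captum `infidelity` API (which operates on PyTorch tensors): three stand-in "methods" (the analytic gradient, a uniform all-ones attribution, and an all-zero attribution) are scored on a toy model over 20 random inputs with Gaussian-noise perturbations.

```python
import random


def model(x):
    # Toy model: f(x) = x0 * x1 + 2 * x2
    return x[0] * x[1] + 2.0 * x[2]


# Hypothetical explanation methods; each maps an input to an attribution vector.
METHODS = {
    "gradient": lambda x: [x[1], x[0], 2.0],
    "uniform": lambda x: [1.0, 1.0, 1.0],
    "zero": lambda x: [0.0, 0.0, 0.0],
}


def infidelity(f, x, phi, sigma=0.1, n_samples=500, seed=0):
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_samples):
        I = [rng.gauss(0.0, sigma) for _ in x]
        predicted = sum(i * p for i, p in zip(I, phi))
        actual = fx - f([xi - ii for xi, ii in zip(x, I)])
        total += (predicted - actual) ** 2
    return total / n_samples


def rank_methods(f, inputs, methods):
    """Return (method name, mean infidelity) pairs, most faithful first."""
    scores = {
        name: sum(infidelity(f, x, explain(x)) for x in inputs) / len(inputs)
        for name, explain in methods.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


data_rng = random.Random(1)
inputs = [[data_rng.random() for _ in range(3)] for _ in range(20)]
for name, score in rank_methods(model, inputs, METHODS):
    print(f"{name:>8}: {score:.5f}")
```

Because every call reuses the same seed, all methods are scored on identical perturbations, so the ranking reflects the explanations rather than sampling noise.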
Infidelity vs. Faithfulness: Core Differences
A comparison of two core quantitative metrics used to evaluate the quality of post-hoc explanations for model predictions, focusing on their opposing objectives and validation methodologies.
| Evaluation Dimension | Infidelity Score | Faithfulness Score | Interpretation |
|---|---|---|---|
| Primary Objective | Measures explanation error | Measures explanation accuracy | Infidelity quantifies failure; faithfulness quantifies success. |
| Core Validation Method | Random input perturbations, with the attributions used to predict the output change | Perturbation of features the explanation ranks as important | Both rely on systematic input perturbation to test the explanation. |
| Ideal Score | 0.0 (lower is better) | 1.0 (higher is better) | A perfect explanation has Infidelity = 0 and Faithfulness = 1. |
| Mathematical Relationship | An unbounded expected squared error | Usually bounded (e.g., a correlation) | Faithfulness is sometimes defined as 1 - Infidelity after normalization, but raw scores do not generally sum to 1. |
| What a High Score Indicates | The explanation poorly reflects model logic. | The explanation accurately reflects model logic. | High infidelity is bad; high faithfulness is good. |
| Perturbation Direction | Random noise or masking toward a baseline | Removal or ablation of important features | Infidelity samples perturbations from a distribution; faithfulness tests typically remove the top-ranked features. |
| Sensitivity to Explanation Sparsity | High | High | Both scores depend heavily on which features the explanation deems important. |
| Use in Model Debugging | Identifies misleading explanations. | Validates trustworthy explanations. | Infidelity flags explanations for review; faithfulness confirms reliable ones. |
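One practical consequence of these differences is that infidelity is sensitive to the scale of the attributions, while a correlation-based faithfulness score is not. The sketch below (a toy linear model, Gaussian-noise perturbations, and a Pearson-correlation faithfulness variant, all assumed for illustration) doubles a perfectly faithful attribution vector. The correlation stays at 1 while infidelity rises from about 0 to roughly sigma^2 * ||w||^2, so one score cannot in general be recovered from the other as 1 - x.

```python
import math
import random

w = [2.0, -1.0, 0.5]


def model(x):
    return sum(wi * xi for wi, xi in zip(w, x))


def scores(f, x, phi, sigma=0.1, n_samples=1000, seed=0):
    """Return (infidelity, correlation-based faithfulness) from the same perturbations."""
    rng = random.Random(seed)
    fx = f(x)
    predicted, actual = [], []
    for _ in range(n_samples):
        I = [rng.gauss(0.0, sigma) for _ in x]
        predicted.append(sum(i * p for i, p in zip(I, phi)))
        actual.append(fx - f([xi - ii for xi, ii in zip(x, I)]))
    infd = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n_samples
    mp, ma = sum(predicted) / n_samples, sum(actual) / n_samples
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_a = sum((a - ma) ** 2 for a in actual)
    return infd, cov / math.sqrt(var_p * var_a)


x = [1.0, 3.0, -2.0]
exact = scores(model, x, phi=w)
doubled = scores(model, x, phi=[2.0 * wi for wi in w])
print(exact, doubled)
```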
Practical Applications of Infidelity
Infidelity is a core metric for validating post-hoc explanations. These applications demonstrate how it is used to ensure explanations are robust, reliable, and useful for debugging and compliance.
Debugging Feature Attribution Methods
Infidelity is used to benchmark and compare different explanation techniques like SHAP, LIME, and Integrated Gradients. By measuring how well each method's importance scores predict the model's output changes under perturbation, engineers can identify which method produces the most faithful local approximations of the model. This is critical for selecting the right tool for model auditing.
- A low infidelity score indicates the explanation's feature weights accurately predict model output changes.
- High infidelity flags an unreliable explanation method that should not be trusted for critical decisions.
Validating Explanations for Regulatory Compliance
In regulated industries (finance, healthcare), demonstrating that an AI's decision-making process is understandable is often a legal requirement. Infidelity provides a quantitative, auditable metric that supports the claim that explanations provided to regulators or customers reflect the model's behavior rather than being arbitrary. A model card can report infidelity scores to show that its feature attributions have been tested for faithfulness, which can support transparency claims under frameworks like the EU AI Act.
Improving Model Robustness via Explanation Analysis
Systematically high infidelity for certain input types can reveal model weaknesses. If an explanation claims certain features are important, but perturbing them doesn't change the output, it may indicate the model is relying on spurious correlations or is unstable in that region of the input space. This insight directs engineers to collect more representative training data or apply regularization techniques to improve model generalization and robustness.
Guiding Human-in-the-Loop Refinement
In active learning or human-AI collaboration systems, infidelity helps prioritize which model predictions require expert review. A high infidelity score signals that the explanation does not match how the model actually behaves around that input, making the prediction harder to trust. This allows human reviewers to focus their attention on the most uncertain or poorly explained cases, making the refinement process more efficient and targeted.
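A review queue of this kind can be sketched by scoring each prediction's explanation and sending the highest-infidelity cases to a human first. The model f(x) = x0^3 + x1, its gradient explanation, the noise scale, and the helper name `review_queue` are all hypothetical. For this model the gradient is least faithful where the curvature is largest, that is, for large |x0|.

```python
import random


def model(x):
    # Hypothetical model whose curvature grows with |x0|: f(x) = x0^3 + x1
    return x[0] ** 3 + x[1]


def gradient(x):
    return [3.0 * x[0] ** 2, 1.0]


def infidelity(f, x, phi, sigma=0.1, n_samples=1000, seed=0):
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_samples):
        I = [rng.gauss(0.0, sigma) for _ in x]
        predicted = sum(i * p for i, p in zip(I, phi))
        actual = fx - f([xi - ii for xi, ii in zip(x, I)])
        total += (predicted - actual) ** 2
    return total / n_samples


def review_queue(cases, budget):
    """Indices of the `budget` cases whose explanations have the highest infidelity."""
    scored = [(infidelity(model, x, gradient(x)), i) for i, x in enumerate(cases)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:budget]]


cases = [[0.1, 1.0], [2.0, 0.0], [0.5, -1.0], [3.0, 2.0]]
print(review_queue(cases, budget=2))  # → [3, 1]
```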
Assessing Explanation Stability and Trust
Infidelity is closely related to explanation robustness. A good explanation should be stable—similar inputs should yield similar explanations. By measuring infidelity across a set of semantically similar perturbations, practitioners can assess if an explanation method produces consistent results. Low and consistent infidelity builds user trust, as the explanation appears reliable and non-random, which is essential for user adoption of AI-assisted tools.
Integration into Automated MLOps Pipelines
Infidelity can be calculated as a continuous validation check in production MLOps pipelines. As new model versions are deployed, their explanation infidelity scores can be monitored alongside traditional performance metrics. A significant drift in the average infidelity score can trigger alerts, indicating that the new model's decision logic may have changed in a way that makes existing explanation methods less faithful, prompting a re-evaluation of the interpretability stack.
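A minimal version of this check compares the mean infidelity of the deployed and candidate model versions on a fixed evaluation set and raises an alert when it grows by more than a tolerance. Everything here is an assumption for illustration: the two toy model versions, the gradient explainer, the noise scale, and the `tolerance` value. A real pipeline would derive its thresholds from its own historical scores.

```python
import random


def model_v1(x):
    return 2.0 * x[0] + x[1]


def grad_v1(x):
    return [2.0, 1.0]


def model_v2(x):
    # Hypothetical new version that adds a nonlinear term
    return 2.0 * x[0] + x[1] + x[0] ** 2


def grad_v2(x):
    return [2.0 + 2.0 * x[0], 1.0]


def infidelity(f, x, phi, sigma=0.3, n_samples=1000, seed=0):
    rng = random.Random(seed)
    fx = f(x)
    total = 0.0
    for _ in range(n_samples):
        I = [rng.gauss(0.0, sigma) for _ in x]
        predicted = sum(i * p for i, p in zip(I, phi))
        actual = fx - f([xi - ii for xi, ii in zip(x, I)])
        total += (predicted - actual) ** 2
    return total / n_samples


def mean_infidelity(f, explain, eval_set):
    return sum(infidelity(f, x, explain(x)) for x in eval_set) / len(eval_set)


def infidelity_drift_alert(reference, candidate, tolerance=0.005):
    """True when the candidate's mean infidelity exceeds the reference's by more than tolerance."""
    return candidate - reference > tolerance


eval_rng = random.Random(7)
eval_set = [[eval_rng.uniform(-1, 1), eval_rng.uniform(-1, 1)] for _ in range(10)]
ref = mean_infidelity(model_v1, grad_v1, eval_set)
cand = mean_infidelity(model_v2, grad_v2, eval_set)
print(f"v1={ref:.5f} v2={cand:.5f} alert={infidelity_drift_alert(ref, cand)}")
```

For the linear v1 the gradient is exact, so its score is essentially 0; v2's extra curvature gives an expected infidelity of 3 * 0.3^4, about 0.024, which trips the alert.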
Frequently Asked Questions
Infidelity is a core metric for validating the quality of post-hoc explanations in machine learning. These questions address its definition, calculation, and role in ensuring explanations are faithful to the underlying model.
Infidelity is a quantitative metric that measures the degree to which a feature attribution explanation fails to predict how a model's output changes when the input is perturbed. It directly tests an explanation's local fidelity by comparing the output change predicted from the explanation's importance scores with the change the model actually produces; higher infidelity scores indicate a less faithful explanation.
The core intuition is that a high-quality explanation should predict how the model behaves. If an explanation claims a feature is very important, then perturbing that feature should cause a large change in the model's output. Infidelity formalizes this by comparing the output change predicted from the explanation's importance scores with the actual change in model output under a set of meaningful perturbations, such as blurring an image region or masking a text token.
Related Terms
Infidelity is evaluated alongside other core metrics that assess the quality, robustness, and utility of explanations for AI model predictions.
Faithfulness Score
A faithfulness score is a quantitative metric that measures how accurately an explanation reflects the true reasoning process or causal factors of the underlying model for a given prediction. It is a broader category under which infidelity falls.
- Core Concept: Evaluates if the explanation's highlighted features are genuinely important to the model's internal computation.
- Relationship to Infidelity: Infidelity is a specific, perturbation-based method for measuring faithfulness. A high infidelity score indicates low faithfulness.
- Example: If an explanation highlights a tumor in a medical scan, the explanation is faithful only if the model's 'malignant' prediction changes significantly when that tumor region is altered.
Perturbation Analysis
Perturbation analysis is an explanation validation technique that systematically modifies or removes input features to observe the resulting changes in the model's output. Infidelity is a formal metric derived from this family of techniques.
- Methodology: Involves creating perturbed versions of an input (e.g., masking pixels, zeroing out tokens) based on an explanation's importance scores.
- Purpose: To empirically test the causal relationship between explained features and the model's prediction.
- Key Distinction: While perturbation analysis is the general approach, infidelity provides a specific, quantitative score summarizing the result of these perturbations.
Explanation Robustness
Explanation robustness refers to the property of an explanation method to produce consistent and stable attributions for a given prediction when the input or model is subjected to minor, semantically-preserving perturbations.
- Focus: Assesses the reliability of the explanation method itself, not its faithfulness to the model.
- Contrast with Infidelity: Infidelity measures if an explanation is faithful to one specific model prediction. Robustness measures if the explanation method yields similar explanations for similar inputs.
- Example: A robust explanation method would highlight the same object in two slightly different photos of a scene. A method with low robustness might highlight different objects.
Sensitivity Analysis
Sensitivity analysis in explainability evaluates how small changes in the input features affect both the model's prediction and the generated explanation. It encompasses the evaluation of both prediction stability and explanation stability.
- Dual Measurement: Tracks two outputs: 1) change in model prediction (related to infidelity), and 2) change in the explanation map.
- Broader Scope: Infidelity is specifically concerned with the first output—the change in prediction. Sensitivity analysis often studies the correlation or divergence between these two outputs.
- Use Case: Helps determine if an explanation method is sensitive to noise, which can indicate instability or overfitting to insignificant features.
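The dual measurement can be sketched directly: perturb the input slightly, then record both how much the prediction moves and how much the explanation vector moves. The toy model f(x) = x0 * x1, its gradient as the explanation, the noise scale, and the helper name `sensitivity_report` are assumptions for illustration.

```python
import math
import random


def model(x):
    return x[0] * x[1]


def gradient(x):
    return [x[1], x[0]]


def sensitivity_report(f, explain, x, sigma=0.01, n_samples=500, seed=0):
    """Mean |Δprediction| and mean ||Δexplanation||_2 under small Gaussian input noise."""
    rng = random.Random(seed)
    fx, phi = f(x), explain(x)
    pred_change = expl_change = 0.0
    for _ in range(n_samples):
        x_new = [xi + rng.gauss(0.0, sigma) for xi in x]
        pred_change += abs(f(x_new) - fx)
        expl_change += math.dist(explain(x_new), phi)
    return pred_change / n_samples, expl_change / n_samples


print(sensitivity_report(model, gradient, [1.0, 2.0]))
```

For this model the explanation shift is exactly the norm of the input noise, so its mean is about sigma * sqrt(pi/2), roughly 0.0125; a large explanation change relative to the prediction change would point to an unstable explanation method.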
Sufficiency
Sufficiency is an explanation metric that measures whether the subset of features identified as most important by an explanation is, by itself, sufficient for the model to make its original prediction.
- Test Method: The top-k most important features (per the explanation) are isolated, and the model is run on this ablated input.
- Interpretation: A high sufficiency score means the model's prediction remains unchanged using only the explained features, suggesting the explanation captured the critical factors.
- Complement to Infidelity: Infidelity measures how well attributions predict output changes under perturbation, while sufficiency measures the predictive power of keeping only the top-attributed features. A good explanation should score well on both.
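A sufficiency check can be sketched as follows: keep the top-k features by absolute attribution, reset the rest to a baseline, and test whether the predicted label survives. The linear scoring model, the zero baseline, k = 1, and the helper name `is_sufficient` are assumptions for illustration.

```python
def score(x):
    # Hypothetical linear classifier score; the label is 1 when score > 0
    return 3.0 * x[0] + 0.1 * x[1] - 0.2 * x[2] - 1.0


def label(x):
    return int(score(x) > 0)


def is_sufficient(x, attributions, k, baseline):
    """True if keeping only the top-k attributed features preserves the predicted label."""
    top_k = sorted(range(len(x)), key=lambda i: abs(attributions[i]), reverse=True)[:k]
    kept = [x[i] if i in top_k else baseline[i] for i in range(len(x))]
    return label(kept) == label(x)


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
good = [3.0, 0.1, -0.2]       # gradient-times-input for this model (bias excluded)
misleading = [0.0, 1.0, 0.0]  # points at a feature the model barely uses
print(is_sufficient(x, good, k=1, baseline=baseline))        # True
print(is_sufficient(x, misleading, k=1, baseline=baseline))  # False
```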
Completeness Score
A completeness score is a metric that evaluates whether an explanation accounts for all features or factors that contributed significantly to a model's prediction. It ensures the explanation is not missing critical components.
- Mathematical Basis: Usually checked by comparing the sum of the attribution scores for all features against the difference between the model's prediction and a baseline output. For methods like SHAP, this sum equals the difference between the model's prediction and its baseline expectation.
- Relationship to Infidelity: An explanation can be complete (account for 100% of the prediction) but still unfaithful (high infidelity) if it misallocates importance among the features. Completeness is a necessary but not sufficient condition for a high-quality explanation.
- Practical Implication: Catches explanations that highlight only a dramatic but incomplete subset of reasons for a prediction.
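The gap between completeness and faithfulness shows up on a three-feature linear model. Both attribution vectors below sum to f(x) - f(b), so both pass a completeness check, but only one assigns credit correctly. Infidelity is computed exactly over all masks, with each feature masked to the baseline with probability 1/2. Because these attributions already include the (x - b) factor, the predicted change is the sum of attributions over the masked features; this equals I^T Φ(x) when each attribution is Φ_i(x) * (x_i - b_i). The model, baseline, and helper names are assumptions for illustration.

```python
from itertools import product

w = [2.0, -1.0, 0.5]


def model(x):
    return sum(wi * xi for wi, xi in zip(w, x))


def is_complete(f, x, baseline, attributions, tol=1e-9):
    """Completeness: attributions sum to f(x) - f(baseline)."""
    return abs(sum(attributions) - (f(x) - f(baseline))) < tol


def masking_infidelity(f, x, baseline, attributions):
    """Exact infidelity over all 2^d masks, each feature masked with probability 1/2."""
    fx = f(x)
    masks = list(product([0, 1], repeat=len(x)))
    total = 0.0
    for m in masks:
        masked = [b if mi else xi for xi, b, mi in zip(x, baseline, m)]
        predicted = sum(a for a, mi in zip(attributions, m) if mi)
        actual = fx - f(masked)
        total += (predicted - actual) ** 2
    return total / len(masks)


x, baseline = [1.0, 3.0, -2.0], [0.0, 0.0, 0.0]
faithful = [2.0, -3.0, -1.0]      # w_i * x_i, the exact contributions
misallocated = [-3.0, 2.0, -1.0]  # same total, credit swapped between features 0 and 1
for name, attr in [("faithful", faithful), ("misallocated", misallocated)]:
    print(name, is_complete(model, x, baseline, attr),
          masking_infidelity(model, x, baseline, attr))
```

This prints `faithful True 0.0` and `misallocated True 12.5`: both explanations are complete, but only the first is faithful.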

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.