Explainability (XAI) Techniques for Deep Learning and Their Limitations
10 Min Read
November 26, 2024

Explainable AI (XAI) is critical for bridging the gap between complex, black-box models and human understanding, establishing trust and facilitating successful AI deployment. This blog discusses the salient explainability methods for deep learning models, the limitations of current methods, and the data-specific challenges that must be overcome to develop more robust and understandable AI systems.
1. Visualization-Based Methods
These methods try to visually illustrate how model inputs influence outputs, helping users understand the inner decision-making process of complex models.
A. Backpropagation-Based Methods
Backpropagation-based methods trace input influences back through the network layers to highlight data components most relevant to the output. They can leverage the network’s structure and specific layers to generate explanations. Common techniques include:
- Activation Maximization: Generates images that maximize neuron activations, exposing patterns in the data that trigger certain model responses.
- Class Activation Maps (CAM) and Grad-CAM: Show regions of input data (often images) that strongly contribute to the output. Grad-CAM builds heatmaps, highlighting important areas of an image for specific predictions.
- Layer-Wise Relevance Propagation (LRP): Distributes relevance scores from the output layer back through the network to the input, pinpointing the features most responsible for a prediction; it is applied to deep architectures such as CNNs and RNNs.
- DeepLIFT and Integrated Gradients: Compare the sample to a reference and calculate the impact of the changes on the output. DeepLIFT computes differences in activation between components, while Integrated Gradients integrates gradients along a path from the reference to the sample (a minimal sketch follows this list).
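To make these attribution methods concrete, here is a minimal Integrated Gradients sketch, assuming a differentiable PyTorch classifier; `model`, `baseline`, and the step count are illustrative placeholders rather than a prescribed setup.

```python
# A minimal Integrated Gradients sketch (PyTorch assumed). `model` is any
# differentiable classifier, `x` an unbatched input tensor, and `baseline`
# a reference input such as an all-zeros tensor of the same shape.
import torch

def integrated_gradients(model, x, baseline, target_class, steps=50):
    # Interpolate between the reference (baseline) and the actual sample.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)

    # Gradient of the target-class score at every point along the path.
    scores = model(path)[:, target_class]
    grads = torch.autograd.grad(scores.sum(), path)[0]

    # Riemann-sum approximation of the path integral, scaled by (x - baseline).
    return (x - baseline) * grads.mean(dim=0)
```

Averaging gradients along the whole path, rather than taking a single gradient at the sample, is what makes these attributions less sensitive to saturated activations.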
B. Perturbation-Based Methods
These methods treat the model as a "black box," changing or eliminating portions of the input to see how the output shifts. Perturbation-based methods include:
- Occlusion Sensitivity: Perturbs (or deletes) certain components of the input to check how sensitive the model's output is to them (see the sketch after this list).
- Representation Erasure: Performs data-agnostic changes, deleting features such as words (in text data) or certain image pixels, to test their effect on the output.
- SHAP (SHapley Additive exPlanations): A game-theoretic method that estimates the marginal contribution of every feature to the output by permuting feature values across samples.
- RISE (Randomized Input Sampling for Explanations): Applies random masks to the input and aggregates the resulting output changes into saliency maps of feature importance.
- Heatmaps and Saliency Maps (e.g., SmoothGrad, Extremal Perturbations): Merge visual and numerical evidence to point towards the regions driving a decision.
- Temporal Masks for Video: Applies masks over frame sequences to analyze saliency in temporal models.
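As a concrete illustration of the perturbation idea, the sketch below computes a simple occlusion-sensitivity map, assuming a PyTorch image classifier and an unbatched image tensor; the patch size and zero baseline are illustrative choices, not fixed requirements.

```python
# A minimal occlusion-sensitivity sketch (PyTorch assumed). `model` maps a
# batch of images to class logits; `image` has shape (C, H, W).
import torch

@torch.no_grad()
def occlusion_sensitivity(model, image, target_class, patch=16, baseline=0.0):
    _, H, W = image.shape
    base_prob = torch.softmax(model(image.unsqueeze(0)), dim=1)[0, target_class]
    heatmap = torch.zeros(H // patch, W // patch)

    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            occluded = image.clone()
            occluded[:, i:i + patch, j:j + patch] = baseline   # erase one patch
            prob = torch.softmax(model(occluded.unsqueeze(0)), dim=1)[0, target_class]
            # A large drop in confidence marks the patch as important.
            heatmap[i // patch, j // patch] = base_prob - prob
    return heatmap
```

Each patch requires its own forward pass, which is exactly the computational cost discussed in the limitations below.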
2. Distillation-Based Methods
Distillation techniques create simpler, interpretable models that approximate the behavior of complex neural networks, using either local or global approaches.
A. Local Approximation
Local approximation techniques explain model behavior for individual instances, fitting a simpler surrogate that mimics the model within a small but representative neighborhood of the data:
- LIME (Local Interpretable Model-Agnostic Explanations): Constructs a surrogate model (usually linear) around a particular prediction to identify the features that matter most in that local neighborhood (a from-scratch sketch follows).
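The sketch below illustrates the LIME idea from scratch rather than via the official `lime` package: perturb a single instance, weight the perturbed samples by proximity, and fit a weighted linear surrogate; the noise scale and kernel width are illustrative assumptions.

```python
# A from-scratch sketch of the LIME idea for tabular data. `predict_fn` returns
# the black-box model's probability for the class of interest; `x` is one
# instance as a 1-D NumPy array.
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(predict_fn, x, num_samples=1000, kernel_width=0.75, noise=0.1):
    rng = np.random.default_rng(0)
    # Sample the local neighborhood by adding Gaussian noise to the instance.
    samples = x + rng.normal(0.0, noise, size=(num_samples, x.shape[0]))
    preds = predict_fn(samples)

    # Weight each sample by its proximity to the original instance (RBF kernel).
    dists = np.linalg.norm(samples - x, axis=1)
    weights = np.exp(-(dists ** 2) / (kernel_width ** 2))

    # The weighted linear surrogate's coefficients are the local feature importances.
    surrogate = Ridge(alpha=1.0).fit(samples, preds, sample_weight=weights)
    return surrogate.coef_
```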
B. Model Translation
Model translation methods aim to approximate the whole model with a simpler model, mapping input features to predictions:
- Tree-Based Models, Graphs, and Rule-Based Systems: Translate complex neural networks into interpretable formats like decision trees or rule-based representations, allowing a more transparent view of the decision process. The process, however, often oversimplifies continuous model behavior: the discrete rules that approximate it may not capture every nuance (a surrogate-tree sketch follows this list).
- Methods like SARFA (Specific and Relevant Feature Attribution) extend these ideas to reinforcement learning agents.
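As a minimal sketch of the model-translation idea, assuming a scikit-learn-style black box with a `predict` method and a feature matrix `X_train` (both placeholders), a shallow decision tree can be fitted to mimic the black box's predictions and its rules printed:

```python
# A minimal global surrogate sketch: a shallow decision tree is trained on the
# black box's own predictions, then printed as human-readable if/else rules.
from sklearn.tree import DecisionTreeClassifier, export_text

def distill_to_tree(black_box, X_train, feature_names, max_depth=3):
    y_mimic = black_box.predict(X_train)          # labels produced by the black box
    surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X_train, y_mimic)
    print(export_text(surrogate, feature_names=feature_names))
    return surrogate
```

The depth cap is the accuracy-versus-interpretability knob discussed later: a deeper tree mimics the black box more faithfully but becomes harder to read.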
Current Explainable AI (XAI) methods fail to capture the complexities of deep learning models
Several methods have been developed to make complex AI systems more interpretable. Despite these advances, XAI still faces critical challenges in achieving full transparency and reliability:
1. Black-Box Nature of DNNs
Deep neural networks (DNNs) often function as opaque systems, making it difficult to understand their decision-making processes. This lack of interpretability can allow issues such as biased outputs in image classification, or unexpected behavior under adversarial inputs, to go undetected.
2. Scalability and Computational Complexity
- Perturbation-Based Methods: Computationally expensive, particularly for high-dimensional inputs like video or sequence data.
- Global Methods (e.g., SHAP): Struggle to scale to large, high-dimensional models due to intensive resource requirements. SHAP estimates the contribution of each feature to a prediction, and the cost of exact computation grows exponentially with the number of features (see the brute-force sketch after this list). Similarly, LIME generates local approximations by perturbing input data, but its reliance on random feature perturbations can result in inconsistent explanations.
- Occlusion and RISE: Require a forward pass for every masked or occluded version of the input, which is time-intensive for large images.
- Temporal Masks: Increase computational burden with longer sequence lengths.
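To see where the exponential cost comes from, here is a brute-force Shapley-value sketch; `value_fn` is a hypothetical callable that returns the model's output when only the features in a given subset are "present" (for example, with the rest replaced by baseline values).

```python
# Brute-force Shapley value for feature i: enumerates every subset of the other
# features, i.e. 2^(n-1) evaluations, which quickly becomes intractable.
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n_features, i):
    others = [j for j in range(n_features) if j != i]
    phi = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            # Marginal contribution of feature i to this coalition.
            phi += weight * (value_fn(set(subset) | {i}) - value_fn(set(subset)))
    return phi
```

With just 20 features this loop already visits over half a million subsets per feature, which is why practical tools approximate Shapley values by sampling.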
3. Architecture Dependency
Some methods, such as Gradient-weighted Class Activation Mapping (Grad-CAM) and Layer-Wise Relevance Propagation (LRP), are limited to specific architectures like CNNs, restricting their applicability to diverse models.
4. Input Modality Challenges
- Text and Discrete Data: Perturbation-based methods often struggle to handle discrete data such as text.
- Video and Temporal Dependencies: Methods like Temporal Masks face difficulties in interpreting long video sequences and require task-specific adaptations.
5. Evaluation Challenges
- Subjective Metrics: Reliance on human annotation (e.g., pointing games) introduces inconsistencies.
- Lack of Standards: Heatmaps and saliency maps lack universally accepted evaluation criteria, making interpretation subjective and inconsistent.
6. Generalizability
- Limited Scope: Many techniques are designed for specific tasks (e.g., RL-focused methods) or data types (e.g., images), reducing their broader applicability.
- Sub-optimal Granularity: High-granularity methods may accurately highlight regions but lose semantic meaning, while patch-based methods smooth over fine details.
7. Vulnerability to Adversarial Noise
Perturbation granularity can expose methods to adversarial attacks, undermining their robustness and reliability.
8. Interpretation Complexity
- Post-Hoc Methods (e.g., SHAP, Grad-CAM): May fail to provide intuitive explanations, particularly for non-technical users.
- LIME: Highly dependent on sampling strategies, which can lead to oversimplification and misrepresentation of global behaviors.
9. Accuracy vs. Interpretability Trade-offs
- Model Translation Techniques: Translating black-box models into rule-based or tree-based systems often sacrifices accuracy for interpretability.
- Distillation Methods: They may simplify explanations but fail to capture complex global behaviors.
10. Data Dependence
Most global interpretability methods require extensive data and computational resources, limiting their practicality in data-scarce scenarios.
Data-Specific Challenges in Explainability Methods
Apart from the challenges mentioned above, explainability methods also face unique challenges across various data types.
In large language models (LLMs) and text data, attention weights, commonly used for interpretation, often fail to represent true reasoning paths, making their explanations unreliable. Attribution is further complicated by long-range dependencies, and perturbation-based methods risk disrupting semantic or syntactic meaning, leading to misleading outcomes.
In computer vision, saliency maps and Grad-CAM tend to be imprecise, highlighting large areas instead of pointing towards causal features. Tasks such as object detection and medical imaging require a high-resolution granularity that existing methods often cannot deliver. Additionally, adversarial sensitivity, where minor input variations greatly impact predictions, undermines the reliability of explanations.

For sequential and temporal data, long-range dependencies between time steps make it difficult to attribute predictions to particular moments in the sequence, while the computational requirements of models that handle video or time-series data exacerbate these challenges.
Finally, in multi-modal data, current approaches struggle to capture interactions between modalities (e.g., text and images), leaving explanations incomplete. Modality-specific biases can dominate predictions, further obscuring the contributions of other data types.
Conclusion
Though several techniques, including visualization-, distillation-, and perturbation-based methods, have advanced our understanding of complex models, enterprises still face significant adoption challenges. The opaque behavior of deep learning systems, the computational overhead of many explanation methods, and limited support for diverse data modalities such as text, images, and multi-modal inputs remain obstacles to the adoption of trustworthy AI systems.
To address these gaps, future research must prioritize methods that balance interpretability and accuracy, scale efficiently for large models, and provide robust explanations across data types.
This is Part I of our blog series on AI explainability. In the next post, we will explore the importance of explainability and the opportunities it can unlock.