DLBacktrace: A Model Agnostic Explainability For Any Deep Learning Models

DLBacktrace is an innovative interpretability technique designed to enhance transparency in deep learning models

Vinay Kumar

November 20, 2024

Abstract:

The rapid advancement of artificial intelligence has led to increasingly sophisticated deep learning models, which frequently operate as opaque “black boxes” with limited transparency in their decision-making processes. This lack of interpretability presents considerable challenges, especially in high-stakes applications where understanding the rationale behind a model’s outputs is as essential as the outputs themselves. This study addresses the pressing need for interpretability in AI systems, emphasizing its role in fostering trust, ensuring accountability, and promoting responsible deployment in mission-critical fields. To address the interpretability challenge in deep learning, we introduce DLBacktrace, an innovative technique developed by the AryaXAI team to illuminate model decisions across a wide array of domains, including simple Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Large Language Models (LLMs), computer vision models, and more. We provide a comprehensive overview of the DLBacktrace algorithm and present benchmarking results comparing its performance against established interpretability methods, such as SHAP, LIME, GradCAM, Integrated Gradients, SmoothGrad, and Attention Rollout, using diverse task-based metrics. DLBacktrace is compatible with model architectures built in PyTorch and TensorFlow, supporting models like Llama 3.2, other NLP architectures such as BERT and LSTMs, computer vision models like ResNet and U-Net, as well as custom deep neural network (DNN) models for tabular data. This flexibility underscores DLBacktrace’s adaptability and effectiveness in enhancing model transparency across a broad spectrum of applications. The library is open-sourced and available at https://github.com/AryaXAI/DLBacktrace.

Introduction

Despite significant advancements in artificial intelligence, particularly with the evolution of deep learning architectures, even the most sophisticated models face a persistent challenge: they often function as "black boxes," with internal processes that are opaque and difficult to interpret. While these models produce highly accurate predictions, their decision-making processes remain unclear, which raises concerns, particularly in high-stakes fields like healthcare and finance. In healthcare, for instance, AI-driven diagnostics must be interpretable to ensure trust and enable effective decision-making. Additionally, regulations like the EU's GDPR demand explainability for automated decisions, making it both an ethical and regulatory imperative.

Despite a growing demand for transparency, the focus has primarily been on maximizing performance, particularly with large language models (LLMs) like OpenAI's ChatGPT and Meta's LLaMA. These models, although highly accurate, often lack transparency, with their proprietary architectures exacerbating the issue. This has led to efforts to enhance model interpretability through methods such as LIME and SHAP, which offer feature importance scores. However, these methods struggle with complex data types (e.g., images and text) and are computationally expensive.

For deep learning models handling complex data, techniques like Grad-CAM, Integrated Gradients, and SmoothGrad provide some interpretability but have limitations. Attention-based methods like Attention Rollout and BertViz offer insights into model decisions but can be difficult to analyze and do not always correlate with feature importance.

In response to the pressing challenges of interpretability in deep learning, we introduce DLBacktrace, a model-agnostic method that traces relevance from output to input. By assigning relevance scores across layers, DLBacktrace reveals feature importance, information flow, and potential biases in predictions. Operating independently of auxiliary models or baselines, DLBacktrace ensures consistent, deterministic interpretations across diverse architectures and data types, including images, text, and tabular data. This approach supports both local (instance-specific) and global (aggregate) analysis, enhancing transparency and reliability, and providing a robust solution for detailed model interpretation and validation.

In this work, we make the following contributions:

  • Introduction of DLBacktrace: A detailed methodology outlining the model-agnostic and deterministic approach of DLBacktrace for achieving enhanced interpretability in AI systems.
  • Comprehensive Benchmarking: We benchmark DLBacktrace against widely used interpretability methods (e.g., LIME, SHAP, Grad-CAM, Integrated Gradients and more) across different tasks.
  • Cross-Modality Applications: DLBacktrace’s adaptability is illustrated across various data types, including tabular, image, and text, addressing limitations in current interpretability methods within these domains.
  • Framework for Reliable Interpretability: By providing consistent relevance scores, DLBacktrace contributes to more reliable, regulatory-compliant AI systems, supporting ethical and responsible AI deployment.

Importance of eXplainable AI (XAI)

  • XAI for Responsible and Trustworthy AI: Responsible AI ensures that AI systems are aligned with ethical standards, particularly in high-impact sectors like healthcare, finance, and law enforcement. It emphasizes fairness, transparency, accountability, privacy, and ethical alignment to mitigate bias, protect rights, and foster trust. Regulatory frameworks such as the EU's GDPR enforce transparency in decision-making. While tools like SHAP, LIME, and Grad-CAM help improve interpretability, challenges persist in providing meaningful insights for complex models, especially with rapidly evolving technologies like large language models (LLMs).
  • XAI for Safe AI: AI safety focuses on ensuring systems are predictable, controllable, and aligned with human values. Explainability is crucial in this context, as it helps developers understand model behavior, detect risks, and implement safeguards. Explainable AI (XAI) supports safety by identifying potential issues like reward hacking or catastrophic forgetting and improving model robustness through clear decision processes, enhancing risk mitigation and safety in dynamic environments.
  • XAI for Regulatory AI: XAI is essential for regulatory compliance, especially in sectors like finance, healthcare, and law. Transparent, interpretable models help ensure fairness, accountability, and ethical standards, crucial for protecting user rights and maintaining trust. XAI methods like SHAP and LIME are used in finance to clarify decisions on credit scoring and fraud detection, while in healthcare, they help interpret AI-driven diagnoses and treatments, ensuring ethical and regulatory adherence.

Explainability Methods

Tabular Data

In high-stakes fields like finance and healthcare, simpler machine learning models such as regression and decision trees are preferred over deep learning models due to their interpretability. Methods like LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive Explanations) are used to enhance transparency. LIME creates interpretable models around data points, while SHAP assigns importance scores to features. However, both methods face challenges: LIME can provide inconsistent explanations, and SHAP is computationally expensive and less accurate for complex models like deep neural networks.
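
As a concrete, simplified illustration of how these two methods are typically applied to a tabular classifier, the sketch below uses the public shap and lime packages; the model, features, and data are placeholders rather than the Lending Club setup benchmarked later in this article.

```python
# Hedged sketch: attributing a tabular classifier's prediction with SHAP and LIME.
# The model, feature names, and data are illustrative placeholders.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 3] > 0).astype(int)
X_test = rng.normal(size=(20, 8))
feature_names = [f"f{i}" for i in range(8)]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# SHAP: model-agnostic Shapley-value estimates against a small background sample
# (KernelExplainer is flexible but slow, which is the cost discussed above).
background = shap.sample(X_train, 100)
shap_values = shap.KernelExplainer(model.predict_proba, background).shap_values(X_test[:1])

# LIME: fit an interpretable local surrogate around a single instance.
lime_explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names, class_names=["reject", "approve"],
    mode="classification")
lime_exp = lime_explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(lime_exp.as_list())   # top local feature contributions for this instance
```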

Image Data

For image-based tasks, gradient-based methods like GradCAM, Vanilla Gradient, SmoothGrad, and Integrated Gradients are commonly used. GradCAM produces heatmaps based on gradients but can miss fine details. Vanilla Gradient faces the "saturation problem" with small gradients, while SmoothGrad improves clarity at a computational cost. Integrated Gradients avoid saturation but also require significant computation. Vision Transformers (ViTs), which rely on attention mechanisms, require specialized interpretability methods like TokenTM, which aggregates token transformations and attention weights to provide clearer explanations.
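
To make the gradient-based family concrete, the following is a minimal sketch of Integrated Gradients on a pretrained image classifier using Captum; the checkpoint, preprocessing, and baseline choice are illustrative assumptions, not the exact configuration used in the benchmarks below.

```python
# Hedged sketch: Integrated Gradients attributions for an image classifier with Captum.
# The ResNet checkpoint, input, and black-image baseline are illustrative placeholders.
import torch
from torchvision.models import resnet34, ResNet34_Weights
from captum.attr import IntegratedGradients

weights = ResNet34_Weights.DEFAULT
model = resnet34(weights=weights).eval()
preprocess = weights.transforms()

image = torch.rand(3, 224, 224)            # stand-in for a real image tensor
x = preprocess(image).unsqueeze(0)
target = model(x).argmax(dim=1).item()     # explain the predicted class

ig = IntegratedGradients(model)
# Path integral of gradients from a black baseline to the input, approximated in 50 steps.
attributions = ig.attribute(x, baselines=torch.zeros_like(x), target=target, n_steps=50)
heatmap = attributions.squeeze(0).abs().sum(dim=0)   # aggregate channels into a 2-D map
print(heatmap.shape)                                  # torch.Size([224, 224])
```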

Textual Data

For text-based models, LIME and SHAP are foundational, and gradient-based methods like GradCAM and Integrated Gradients are also used. In text generation, challenges such as tokenization effects and randomness are addressed with probabilistic explanations. LACOAT clusters word representations to provide context-aware explanations. Mechanistic interpretability is also key in understanding large language models (LLMs). Research has identified the roles of neurons and attention heads in model behavior, highlighting the need for deeper insights into model operations. These advancements in mechanistic interpretability are crucial for improving transparency and trust, especially as LLMs are deployed in critical applications.
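
For transformer text classifiers such as the BERT model benchmarked later, gradient-based attributions are usually computed at the embedding layer. Below is a hedged sketch using Captum's LayerIntegratedGradients; the generic bert-base-uncased checkpoint (whose classification head is untrained) and the [PAD]-token baseline are assumptions for illustration only.

```python
# Hedged sketch: token-level Integrated Gradients for a BERT classifier via Captum.
# In practice an SST-2 fine-tuned checkpoint would replace "bert-base-uncased".
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from captum.attr import LayerIntegratedGradients

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").eval()

enc = tokenizer("the film is a charming, quietly moving surprise", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
# Baseline: [PAD] everywhere except the special [CLS]/[SEP] tokens.
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
baseline_ids[0, 0], baseline_ids[0, -1] = tokenizer.cls_token_id, tokenizer.sep_token_id

def forward(ids, mask):
    return model(input_ids=ids, attention_mask=mask).logits

# Attribute the positive-class logit to the embedding layer, then pool per token.
lig = LayerIntegratedGradients(forward, model.bert.embeddings)
attrs = lig.attribute(input_ids, baselines=baseline_ids,
                      additional_forward_args=(attention_mask,), target=1)
token_scores = attrs.sum(dim=-1).squeeze(0)          # one relevance score per token
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
print(sorted(zip(tokens, token_scores.tolist()), key=lambda t: -abs(t[1]))[:5])
```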

Backtrace

Backtrace is a method for analyzing neural networks by tracing the relevance of each component from the output back to the input, helping to understand how each element contributes to the final prediction. It provides insights into feature importance, information flow, and potential biases, aiding model interpretation and validation; a toy sketch of this backward relevance flow is shown after Figure 1 below.

Key advantages of Backtrace include:

  • No reliance on sample-selection algorithms: Relevance is calculated using only the sample under analysis, avoiding variations caused by different datasets.
  • No dependency on secondary white-box algorithms: Relevance is determined directly from the network, avoiding variations due to external algorithm assumptions and hyperparameters.
  • Deterministic nature: Relevance scores remain consistent across repeated calculations for the same sample, making it suitable for live environments or training workflows.
Figure 1: Illustration Depicting Backtrace Calculation for a Sample Network
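
To convey the intuition behind the figure, the toy sketch below redistributes an output's relevance back through a two-layer ReLU network, splitting each unit's relevance among its inputs in proportion to their contributions. It is a simplified illustration of output-to-input relevance flow, not the actual DLBacktrace rules, which are defined in the paper and the open-source library.

```python
# Toy sketch (not the actual DLBacktrace rules): redistribute an output's relevance
# back to the inputs of a tiny two-layer ReLU network, layer by layer.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # 4 hidden -> 2 outputs
x = rng.normal(size=3)

h = np.maximum(W1 @ x + b1, 0.0)                       # forward pass
y = W2 @ h + b2

def redistribute(relevance, W, activations, eps=1e-9):
    """Split each unit's relevance among its inputs, proportional to w_ij * a_j.

    Bias terms are ignored in this toy version; eps guards against division by zero.
    """
    contrib = W * activations                          # contribution of input j to unit i
    totals = contrib.sum(axis=1, keepdims=True)
    return (relevance[:, None] * contrib / (totals + eps)).sum(axis=0)

out_relevance = np.zeros_like(y)
out_relevance[y.argmax()] = y.max()                    # start from the winning output
hidden_relevance = redistribute(out_relevance, W2, h)
input_relevance = redistribute(hidden_relevance, W1, x)
print("relevance per input feature:", input_relevance)
```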

Benchmarking:

We present a comparative study to benchmark our proposed Backtrace algorithm, evaluating its performance against various existing explainability methods across three data modalities: tabular, image, and text.

1. Experimental Setup:

  • Tabular Modality: A binary classification task on the Lending Club dataset with a four-layer multilayer perceptron (MLP); a minimal sketch of this setup follows this list.
  • Image Modality: Multi-class classification with the CIFAR-10 dataset, using a fine-tuned ResNet-34 model.
  • Text Modality: Binary sentiment classification with the SST-2 dataset using a pre-trained BERT model.
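
For concreteness, a minimal PyTorch version of the tabular setup might look like the sketch below; the input width and hidden-layer sizes are assumptions rather than the paper's exact configuration, and the fc1 to fc4 names simply echo the layers referenced in Table 1.

```python
# Hedged sketch of the tabular setup: a four-layer MLP for binary classification.
# Input width and hidden sizes are assumed, not the paper's exact configuration.
import torch
import torch.nn as nn

class LoanMLP(nn.Module):
    def __init__(self, num_features: int = 32):
        super().__init__()
        self.fc1 = nn.Linear(num_features, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 2)          # two logits: default vs. fully paid

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        return self.fc4(x)

model = LoanMLP()
logits = model(torch.randn(8, 32))           # a batch of 8 preprocessed loan records
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
loss.backward()                              # one illustrative training step
```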

2. Evaluation Metrics:

  • Tabular Modality: Model Parameter Randomization Test (MPRT) and the Complexity metric.
  • Image Modality: Faithfulness Correlation, Max Sensitivity, and Pixel Flipping.
  • Text Modality: Token Perturbation for Explanation Quality (ToPEQ), measured via LeRF AUC, MoRF AUC, and Delta AUC; a generic sketch of this deletion-curve recipe follows this list.
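
The perturbation-style metrics above share a common recipe: rank features (pixels or tokens) by attributed relevance, remove them most-relevant-first (MoRF) or least-relevant-first (LeRF), and integrate the resulting prediction curve. The self-contained sketch below illustrates that recipe with a toy linear model; it is not the benchmark implementation used for the tables that follow.

```python
# Hedged sketch of a MoRF/LeRF-style deletion curve: mask features in relevance order
# and track how the model's confidence in the explained class changes.
import numpy as np

def deletion_auc(predict, x, relevance, baseline_value=0.0, most_relevant_first=True):
    """Trapezoidal area under the confidence curve as features are progressively masked."""
    order = np.argsort(-relevance if most_relevant_first else relevance)
    scores = [predict(x)]
    x_masked = x.copy()
    for idx in order:
        x_masked[idx] = baseline_value
        scores.append(predict(x_masked))
    scores = np.array(scores)
    return float(((scores[:-1] + scores[1:]) / 2).mean())   # x-axis normalized to [0, 1]

# Illustrative model and attribution (placeholders): a logistic scorer whose own
# weighted inputs serve as a "perfect" explanation.
def predict(v):
    w = np.array([2.0, -1.0, 0.5, 0.0])
    return 1.0 / (1.0 + np.exp(-(w @ v)))

x = np.array([1.0, 1.0, 1.0, 1.0])
relevance = np.array([2.0, -1.0, 0.5, 0.0]) * x

morf = deletion_auc(predict, x, relevance, most_relevant_first=True)
lerf = deletion_auc(predict, x, relevance, most_relevant_first=False)
print(f"MoRF AUC={morf:.3f}  LeRF AUC={lerf:.3f}  Delta={lerf - morf:.3f}")
```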

3. Results:

  • Tabular Modality: Backtrace outperforms LIME and SHAP with better interpretability and robustness but has higher computational complexity.

Table 1: Explanation performance metrics for LIME, SHAP, and Backtrace, including mean values and feature contributions across different layers (fc1 to fc4). Lower values of MPRT and Complexity indicate better performance.

  • Image Modality: Backtrace achieves superior performance across key metrics (e.g., Faithfulness Correlation and Max Sensitivity), outperforming traditional methods like Grad-CAM and Integrated Gradients.

Table 2: Performance metrics of various explanation methods on a subset of CIFAR-10 test samples. Higher values (↑) of Faithfulness Correlation indicate better performance, while lower values (↓) of Max Sensitivity and Pixel Flipping indicate improved robustness. (*) Indicates the presence of infinite values in some batches, for which a non-infinite mean was used to compute the final value.

  • Text Modality: Integrated Gradients (IG) performed best, but Backtrace showed balanced results, with potential for improvement in differentiating relevant features.

Table 3: Token Perturbation for Explanation Quality metrics for various explanation methods. Lower MoRF AUC values indicate better performance, while higher LeRF AUC and Delta AUC values suggest greater robustness and better differentiation between relevant and irrelevant features.

4. Observations:

Model performance significantly influences the quality of explanations. Well-performing models lead to more stable and sparse explanations, while lower performance causes higher entropy and instability. Additionally, inference time depends on model size and computational infrastructure.

Advantages of DLBacktrace:

  • Network Analysis: Current solutions like distribution graphs and heatmaps focus on node activations but fail to capture the broader impact of individual nodes on final predictions. Existing methods also struggle to differentiate the effect of input data versus internal biases.
  • Feature Importance: Backtrace allows for a precise quantification of the contribution of each input source to the final prediction, offering a more granular understanding than traditional methods like Integrated Gradients or Shapley Values, which have limitations (e.g., reliance on baselines or dataset subsets).
  • Uncertainty: Instead of relying on final predictions alone, Backtrace enables decisions to be made based on how the relevance distribution across nodes compares with that observed for prior prediction outcomes, enhancing confidence in the validity of each decision.

Applicability:

  • Interpreting Model Outcomes: Local feature importance is inferred from input data, while global importance is calculated by normalizing and averaging local importance across all samples (see the aggregation sketch after this list).
  • Network Analysis: Backtrace provides insights into each network layer, including activation saturation, bias to input ratio, and positive/negative relevance, which can guide modifications to the network architecture to reduce biases and variability.
  • Fairness and Bias Analysis: The global importance of sensitive features (e.g., gender, age) helps assess potential bias in the model or data.
  • Process Compliance: Ranking features based on local and global importance aids in verifying that the model's behavior aligns with business and regulatory requirements.
  • Validating Model Outcomes: Backtrace supports outcome validation by analyzing layer-wise relevance, which can help ensure the accuracy and reliability of predictions, especially in real-world deployments.
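
The global importance mentioned in the first item above is a straightforward aggregation of per-sample relevance. A minimal sketch of that normalize-then-average step, assuming local relevance scores have already been computed for every sample, is shown below.

```python
# Hedged sketch: aggregate per-sample (local) relevance into global feature importance
# by normalizing each sample's scores and averaging across the dataset.
import numpy as np

def global_importance(local_relevance: np.ndarray) -> np.ndarray:
    """local_relevance: (num_samples, num_features) array of per-sample attributions."""
    magnitude = np.abs(local_relevance)
    row_sums = magnitude.sum(axis=1, keepdims=True)
    normalized = magnitude / np.where(row_sums == 0, 1.0, row_sums)  # each row sums to 1
    return normalized.mean(axis=0)                                   # average over samples

rng = np.random.default_rng(0)
local = rng.normal(size=(1000, 5))          # placeholder local relevance scores
print(global_importance(local))             # one importance value per feature
```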

Conclusion:

DLBacktrace is a robust and reliable method for improving the interpretability of deep learning models. It traces relevance from output back to input, providing clearer insights into feature importance and model behavior. Unlike many existing methods, its deterministic relevance scores remain stable across runs, making it applicable in critical sectors such as finance, healthcare, and regulatory compliance. Benchmarking results show that DLBacktrace matches or outperforms established methods in robustness and faithfulness across modalities, promoting transparency and trust in AI models, particularly in high-stakes applications.
