
Vision Transformer vs. ResNet-101: An Explainable Deep Learning Approach for Breast Cancer Detection in Ultrasound Images

Speakers: Lipismita Panigrahi

Track: Track 7: Pattern Recognition, Computer Vision and Image Processing


Abstract

Breast cancer remains a significant global health concern, and early, accurate diagnosis is paramount for improving patient survival rates. This paper presents a comparative analysis of two deep learning architectures, the Convolutional Neural Network (CNN) based ResNet-101 and the Vision Transformer (ViT), for classifying breast ultrasound images into benign, malignant, and normal categories. To address the common challenge of limited data, we employed a data augmentation strategy that expanded a benchmark dataset of 780 images to over 10,000 images, creating a robust training set. Both models were trained on this augmented dataset, achieving test accuracies of 98.64% for the ViT model and 97.57% for the ResNet-101 model; the ViT thus outperformed ResNet-101. Furthermore, deep learning models such as these are typically black boxes. To enhance model transparency and build clinical trust, Gradient-weighted Class Activation Mapping (Grad-CAM), an Explainable AI (XAI) technique, is used to generate visual heatmaps highlighting the regions of the ultrasound images that most influenced the models’ diagnostic decisions. The proposed approach harnesses GPU-based parallel infrastructure for training.
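The core Grad-CAM step the abstract refers to can be summarized as: global-average-pool the gradients of the class score with respect to the last convolutional layer's feature maps to get per-channel weights, take the weighted sum of the feature maps, and apply a ReLU. The sketch below is a minimal, framework-agnostic illustration of that combination step using synthetic NumPy arrays; it is not the authors' pipeline, and the array shapes and function name are illustrative assumptions.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine conv feature maps into a Grad-CAM heatmap.

    activations: (K, H, W) feature maps A^k from the last conv layer
    gradients:   (K, H, W) gradients of the class score w.r.t. A^k
    (Both would come from a forward/backward pass in practice.)
    """
    # alpha_k: global-average-pool each gradient map over its spatial dims
    alphas = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum of the feature maps, one weight per channel
    cam = np.tensordot(alphas, activations, axes=1)  # shape (H, W)
    # ReLU: keep only features with a positive influence on the class
    cam = np.maximum(cam, 0.0)
    # Normalize to [0, 1] so it can be overlaid as a heatmap
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example with random activations and gradients
rng = np.random.default_rng(0)
A = rng.random((8, 7, 7))    # 8 channels of 7x7 feature maps
dA = rng.random((8, 7, 7))   # matching gradient maps
heatmap = grad_cam(A, dA)
print(heatmap.shape)
```

In a real setting the heatmap would be upsampled to the input ultrasound image's resolution and blended over it, which is how the highlighted diagnostic regions described in the abstract are produced.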

Speakers

Lipismita Panigrahi
Assistant Professor
SRM University-Amaravati

Details

Type
Online
Model
OFFLINE
Language
EN
Timezone
UTC+8