Deep Learning-Based Skin Cancer Detection with Multi-method Explainability: Grad-CAM, LIME, and Occlusion Sensitivity

Authors

Khan, T., Ul Haque, M. Z., Gul Munir, & Usmani, I. A.

DOI:

https://doi.org/10.31436/iiumej.v27i1.4049

Keywords:

Deep Learning, CNN, Explainable AI, Transfer Learning, Skin Cancer Detection

Abstract

Skin cancer is one of the most common malignancies worldwide, and early detection significantly improves treatment outcomes. While deep learning models show promise for automated skin lesion classification, their lack of interpretability limits clinical adoption. This study presents a comprehensive comparative analysis of three convolutional neural networks (ResNet-50, GoogLeNet, and SqueezeNet) for binary skin lesion classification (benign vs. malignant), integrating three explainable AI (XAI) methods (Grad-CAM, LIME, and Occlusion Sensitivity) to enhance clinical interpretability. We trained and evaluated these architectures on the Kaggle Skin Cancer dataset, which contains 2,637 dermoscopic images (1,440 benign, 1,197 malignant). Transfer learning employed ImageNet pre-trained weights with two-stage fine-tuning. Performance was assessed using accuracy, precision, recall, F1-score, specificity, and AUC-ROC. ResNet-50 achieved the highest accuracy of 91.36% with an excellent AUC of 0.9721, demonstrating superior balanced performance. GoogLeNet achieved 88.94% accuracy with 73% fewer parameters, offering an optimal accuracy-efficiency trade-off. SqueezeNet, despite having the fewest parameters (1.2M), achieved 85.45% accuracy and a malignancy detection sensitivity of 92.7%, making it well-suited for screening applications. Training times ranged from 1.5 minutes (SqueezeNet) to 3 minutes 39 seconds (ResNet-50), demonstrating feasibility for resource-constrained settings. All XAI methods generated clinically meaningful explanations, with models consistently focusing on lesion centers, color variations, and irregular borders. This study demonstrates that combining deep learning with XAI enables accurate and interpretable skin cancer detection. ResNet-50 is well-suited to well-resourced clinical settings, GoogLeNet offers balanced performance for resource-constrained deployments, and SqueezeNet enables mobile telemedicine applications with superior sensitivity.
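The evaluation metrics reported above follow the standard binary-classification definitions. As a minimal sketch, the hypothetical helper below computes them from confusion-matrix counts; the counts in the example are illustrative placeholders, not the study's actual results:

```python
# Standard binary-classification metrics from confusion-matrix counts.
# tp/fp/tn/fn here are illustrative, not the study's data.

def binary_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity: malignant cases caught
    specificity = tn / (tn + fp)   # benign cases correctly cleared
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Example: a hypothetical 200-image test split.
print(binary_metrics(90, 10, 85, 15))
```

Note that recall (sensitivity) is the metric emphasized for screening use cases such as SqueezeNet's 92.7% malignancy detection, since a false negative is clinically costlier than a false positive.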

ABSTRAK (translated from Malay): Skin cancer is among the most common malignancies worldwide, and early detection has been shown to significantly improve treatment effectiveness. Although deep learning models show high potential for automated skin lesion classification, their lack of interpretability has limited clinical acceptance. This study presents a comprehensive comparative analysis of three convolutional neural networks, namely ResNet-50, GoogLeNet, and SqueezeNet, for binary skin lesion classification (benign vs. malignant), combined with three explainable AI (XAI) methods, namely Grad-CAM, LIME, and Occlusion Sensitivity, to support clinical interpretation. The models were trained and evaluated on the Kaggle Skin Cancer dataset of 2,637 dermoscopy images, using transfer learning based on ImageNet pre-trained weights with two-stage fine-tuning. Performance evaluation using accuracy, precision, recall, F1-score, specificity, and AUC-ROC metrics shows that ResNet-50 achieved the highest performance with 91.36% accuracy and an AUC of 0.9721, while GoogLeNet offered an optimal balance between accuracy and efficiency with 73% fewer parameters. SqueezeNet, although the lightest model, achieved the highest malignancy detection sensitivity of 92.7%, making it suitable for screening and mobile telemedicine applications. All XAI methods successfully produced clinically meaningful explanations, focusing consistently on lesion centers, color variations, and irregular borders. Overall, this study demonstrates that combining deep learning and XAI enables skin cancer detection that is accurate, interpretable, and deployable across a range of clinical resource constraints.



Published

2026-01-12

How to Cite

Khan, T., Ul Haque, M. Z., Gul Munir, & Usmani, I. A. (2026). Deep Learning-Based Skin Cancer Detection with Multi-method Explainability: Grad-CAM, LIME, and Occlusion Sensitivity. IIUM Engineering Journal, 27(1), 160–174. https://doi.org/10.31436/iiumej.v27i1.4049

Issue

Section

Electrical, Computer and Communications Engineering