Abstract
In recent years, widely deployed deep learning technologies have faced persistent questions about their reliability and credibility. The Class Activation Map (CAM) was proposed to explain the decisions of deep learning models. Existing CAM-based algorithms highlight the critical regions of the input image, but they go no further in tracing the basis of the network's decision. This work proposes Cross-CAM, a visual interpretation method that traces prediction-basis samples back to the training set and focuses on the regions of the predicted category that the input image shares with those samples. Cross-CAM extracts deep discriminative feature vectors and uses them to screen prediction-basis samples out of the training set. The similarity-weight and the grad-weight are then combined into a cross-weight, which highlights the shared regions that support the classification decision. Cross-CAM is evaluated on the ILSVRC-15 dataset, and a new weakly-supervised localization metric, IoS (Intersection over Self), is proposed to measure how tightly the explanation focuses. Using Cross-CAM's highlighted regions, the top-1 localization error for weakly-supervised localization reaches 44.95% on the ILSVRC-15 validation set, 16.25% lower than that of Grad-CAM. The visualization results show that, unlike Grad-CAM, Cross-CAM exploits the similarity between the test image and the prediction-basis samples to focus on the key regions.
Supported by the National Natural Science Foundation of China (32071775) and the 2020 Industrial Internet Innovation and Development Project (Malicious Code Analysis Equipment Project of Security and Controlled System, No. TC200H02X).
The first and second authors contributed equally to this work.
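The abstract outlines the mechanism clearly enough to sketch: Grad-CAM-style channel weights are fused with a similarity term derived from prediction-basis samples retrieved from the training set, and localization quality is scored with IoS. The PyTorch sketch below illustrates one plausible reading of that pipeline; the per-channel cosine similarity, the element-wise fusion of the two weight vectors, and the IoS formula (intersection area divided by the area of the predicted region itself) are assumptions inferred from the abstract, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_cam(activations, class_score, basis_features):
    """Illustrative cross-weight CAM (assumed formulation, not the paper's code).

    activations:    (1, C, H, W) feature maps of the test image; must be part of
                    the autograd graph of class_score (requires_grad)
    class_score:    scalar logit of the predicted class
    basis_features: (K, C) pooled features of K prediction-basis samples, e.g.
                    the training images most cosine-similar to the test image
    """
    # Grad-weight: Grad-CAM's channel weights, i.e. spatially averaged
    # gradients of the class score w.r.t. the feature maps.
    grads = torch.autograd.grad(class_score, activations, retain_graph=True)[0]
    grad_w = grads.mean(dim=(2, 3)).squeeze(0)                  # (C,)

    # Similarity-weight (assumed form): per-channel agreement between the
    # test image's pooled features and the basis samples' features.
    test_f = activations.mean(dim=(2, 3)).squeeze(0)            # (C,)
    sim_w = (F.normalize(test_f, dim=0)
             * F.normalize(basis_features, dim=1)).mean(dim=0)  # (C,)

    # Cross-weight: fuse the two; an element-wise product is one guess.
    cross_w = grad_w * sim_w

    # Weighted sum of feature maps, ReLU, then normalize to [0, 1].
    cam = F.relu((cross_w.view(1, -1, 1, 1) * activations).sum(dim=1))
    return cam / (cam.max() + 1e-8)                             # (1, H, W)

def intersection_over_self(pred_box, gt_box):
    """IoS = area(pred ∩ gt) / area(pred); this reading is inferred from the name.

    Boxes are (x1, y1, x2, y2) tuples; pred_box is the box fitted around the
    highlighted region, gt_box is the ground-truth object box.
    """
    ix1, iy1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    ix2, iy2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    pred_area = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    return inter / pred_area if pred_area > 0 else 0.0
```

Under this reading, IoS differs from IoU by penalizing only spill-over outside the ground-truth box, which matches the abstract's framing of IoS as a measure of how tightly an explanation focuses.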
References
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)
Bhattacharya, G., Alam, M.J., Kenny, P.: Deep speaker recognition: modular or monolithic? In: INTERSPEECH, pp. 1143–1147 (2019)
Qiu, M., Xue, C., Shao, Z., Sha, E.: Energy minimization with soft real-time and DVS for uniprocessor and multiprocessor embedded systems. In: IEEE DATE Conference, pp. 1–6 (2007)
Qiu, M., Liu, J., Li, J., et al.: A novel energy-aware fault tolerance mechanism for wireless sensor networks. In: IEEE/ACM International Conference on GCC (2011)
Qiu, M., Li, H., Sha, E.: Heterogeneous real-time embedded software optimization considering hardware platform. In: Proceedings of the 2009 ACM Symposium on Applied Computing, pp. 1637–1641 (2009)
Wu, G., Zhang, H., et al.: A decentralized approach for mining event correlations in distributed system monitoring. J. Parallel Distrib. Comput. 73(3), 330–340 (2013)
Niu, J., Gao, Y., et al.: Selecting proper wireless network interfaces for user experience enhancement with guaranteed probability. J. Parallel Distrib. Comput. 72(12), 1565–1575 (2012)
Liu, M., Zhang, S., et al.: H∞ state estimation for discrete-time chaotic systems based on a unified model. IEEE Trans. Syst. Man Cybern. 42(4), 1053–1063 (2012)
Li, Y., Song, Y., Jia, L., et al.: Intelligent fault diagnosis by fusing domain adversarial training and maximum mean discrepancy via ensemble learning. IEEE Trans. Ind. Inform. 17(4), 2833–2841 (2020)
Shao, Z., Xue, C., Zhuge, Q., et al.: Security protection and checking for embedded system integration against buffer overflow attacks via hardware/software. IEEE Trans. Comput. 55(4), 443–453 (2006)
Qiu, H., Qiu, M., Liu, M., Memmi, G.: Secure health data sharing for medical cyber-physical systems for the healthcare 4.0. IEEE J. Biomed. Health Inform. 24(9), 2499–2505 (2020)
Qiu, H., Zheng, Q., et al.: Topological graph convolutional network-based urban traffic flow and density prediction. IEEE Trans. Intell. Transp. Syst. (2020)
Hua, Y., Zhang, D., Ge, S.: Research progress in the interpretability of deep learning models. J. Cyber Secur. 5(3), 1–12 (2020)
Cui, X., Wang, D., Wang, Z.J.: CHIP: channel-wise disentangled interpretation of deep convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 31(10), 4143–4156 (2019)
Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437 (2017)
Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Kong, X., Tang, X., Wang, Z.: A survey of explainable artificial intelligence decision. Syst. Eng. Theory Pract. 41(2), 524–536 (2021)
Parafita, A., Vitria, J.: Explaining visual models by causal attribution. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 4167–4175. IEEE (2019)
Zhao, Q., Hastie, T.: Causal interpretations of black-box models. J. Bus. Econ. Stat. 39(1), 272–281 (2021)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning, pp. 3319–3328. PMLR (2017)
Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 499–515. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_31
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Wang, H., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 24–25 (2020)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847. IEEE (2018)
Omeiza, D., Speakman, S., Cintas, C., Weldermariam, K.: Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models. arXiv preprint arXiv:1908.01224 (2019)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
Wu, B., Wu, H.: Angular discriminative deep feature learning for face verification. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2133–2137. IEEE (2020)
Chen, T., et al.: ABD-Net: attentive but diverse person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8351–8361 (2019)
Dai, Z., Chen, M., Gu, X., Zhu, S., Tan, P.: Batch DropBlock network for person re-identification and beyond. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3691–3701 (2019)
Zheng, Z., Yang, X., Yu, Z., Zheng, L., Yang, Y., Kautz, J.: Joint discriminative and generative learning for person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2138–2147 (2019)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53