Abstract
In recent years, widely deployed deep learning technologies have faced persistent questions about their reliability and credibility. The Class Activation Map (CAM) was proposed to explain the decisions of deep learning models. Existing CAM-based algorithms highlight the critical regions of the input image, but they go no further in tracing the basis of the network's decision. This work proposes Cross-CAM, a visual interpretation method that traces prediction-basis samples back to the training set and focuses on the regions of the predicted category that the input image shares with those samples. Cross-CAM extracts deep discriminative feature vectors and uses them to screen prediction-basis samples out of the training set. The similarity-weight and the grad-weight are then combined into a cross-weight, which highlights the shared regions that support the classification decision. Cross-CAM is evaluated on the ILSVRC-15 dataset, and a new weakly-supervised localization metric, IoS (Intersection over Self), is proposed to measure how tightly the explanation focuses. Using Cross-CAM's highlighted regions, the top-1 localization error for weakly-supervised localization reaches 44.95% on the ILSVRC-15 validation set, 16.25% lower than that of Grad-CAM. The visualization results show that, unlike Grad-CAM, Cross-CAM exploits the similarity between the test image and the prediction-basis samples to focus on the key regions.
Supported by the National Natural Science Foundation of China (32071775) and the 2020 Industrial Internet Innovation and Development Project (Malicious Code Analysis Equipment Project of Security and Controlled System, No. TC200H02X).
The first and second authors contributed equally to this work.
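The abstract outlines the mechanism clearly enough to sketch: Grad-CAM-style channel weights are fused with a similarity term derived from prediction-basis samples retrieved from the training set, and localization quality is scored with IoS. The PyTorch sketch below illustrates one plausible reading of that pipeline; the per-channel cosine similarity, the element-wise fusion of the two weight vectors, and the IoS formula (intersection area divided by the area of the predicted region itself) are assumptions inferred from the abstract, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_cam(activations, class_score, basis_features):
    """Illustrative cross-weight CAM (assumed formulation, not the paper's code).

    activations:    (1, C, H, W) feature maps of the test image; must be part of
                    the autograd graph of class_score (requires_grad)
    class_score:    scalar logit of the predicted class
    basis_features: (K, C) pooled features of K prediction-basis samples, e.g.
                    the training images most cosine-similar to the test image
    """
    # Grad-weight: Grad-CAM's channel weights, i.e. spatially averaged
    # gradients of the class score w.r.t. the feature maps.
    grads = torch.autograd.grad(class_score, activations, retain_graph=True)[0]
    grad_w = grads.mean(dim=(2, 3)).squeeze(0)                  # (C,)

    # Similarity-weight (assumed form): per-channel agreement between the
    # test image's pooled features and the basis samples' features.
    test_f = activations.mean(dim=(2, 3)).squeeze(0)            # (C,)
    sim_w = (F.normalize(test_f, dim=0)
             * F.normalize(basis_features, dim=1)).mean(dim=0)  # (C,)

    # Cross-weight: fuse the two; an element-wise product is one guess.
    cross_w = grad_w * sim_w

    # Weighted sum of feature maps, ReLU, then normalize to [0, 1].
    cam = F.relu((cross_w.view(1, -1, 1, 1) * activations).sum(dim=1))
    return cam / (cam.max() + 1e-8)                             # (1, H, W)

def intersection_over_self(pred_box, gt_box):
    """IoS = area(pred ∩ gt) / area(pred); this reading is inferred from the name.

    Boxes are (x1, y1, x2, y2) tuples; pred_box is the box fitted around the
    highlighted region, gt_box is the ground-truth object box.
    """
    ix1, iy1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    ix2, iy2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    pred_area = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    return inter / pred_area if pred_area > 0 else 0.0
```

Under this reading, IoS differs from IoU by penalizing only spill-over outside the ground-truth box, which matches the abstract's framing of IoS as a measure of how tightly an explanation focuses.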
References
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)
Bhattacharya, G., Alam, M.J., Kenny, P.: Deep speaker recognition: modular or monolithic? In: INTERSPEECH, pp. 1143–1147 (2019)
Qiu, M., Xue, C., Shao, Z., Sha, E.: Energy minimization with soft real-time and DVS for uniprocessor and multiprocessor embedded systems. In: IEEE DATE Conference, pp. 1–6 (2007)
Qiu, M., Liu, J., Li, J., et al.: A novel energy-aware fault tolerance mechanism for wireless sensor networks. In: IEEE/ACM International Conference on GCC (2011)
Qiu, M., Li, H., Sha, E.: Heterogeneous real-time embedded software optimization considering hardware platform. In: Proceedings of the 2009 ACM Symposium on Applied Computing, pp. 1637–1641 (2009)
Wu, G., Zhang, H., et al.: A decentralized approach for mining event correlations in distributed system monitoring. J. Parallel Distrib. Comput. 73(3), 330–340 (2013)
Niu, J., Gao, Y., et al.: Selecting proper wireless network interfaces for user experience enhancement with guaranteed probability. J. Parallel Distrib. Comput. 72(12), 1565–1575 (2012)
Liu, M., Zhang, S., et al.: H∞ state estimation for discrete-time chaotic systems based on a unified model. IEEE Trans. Syst. Man Cybern. 42(4), 1053–1063 (2012)
Li, Y., Song, Y., Jia, L., et al.: Intelligent fault diagnosis by fusing domain adversarial training and maximum mean discrepancy via ensemble learning. IEEE Trans. Ind. Inform. 17(4), 2833–2841 (2020)
Shao, Z., Xue, C., Zhuge, Q., et al.: Security protection and checking for embedded system integration against buffer overflow attacks via hardware/software. IEEE Trans. Comput. 55(4), 443–453 (2006)
Qiu, H., Qiu, M., Liu, M., Memmi, G.: Secure health data sharing for medical cyber-physical systems for the healthcare 4.0. IEEE J. Biomed. Health Inform. 24(9), 2499–2505 (2020)
Qiu, H., Zheng, Q., et al.: Topological graph convolutional network-based urban traffic flow and density prediction. IEEE Trans. Intell. Transp. Syst. (2020)
Hua, Y., Zhang, D., Ge, S.: Research progress in the interpretability of deep learning models. J. Cyber Secur. 5(3), 1–12 (2020)
Cui, X., Wang, D., Wang, Z.J.: CHIP: channel-wise disentangled interpretation of deep convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 31(10), 4143–4156 (2019)
Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437 (2017)
Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Kong, X., Tang, X., Wang, Z.: A survey of explainable artificial intelligence decision. Syst. Eng. Theory Pract. 41(2), 524–536 (2021)
Parafita, A., Vitria, J.: Explaining visual models by causal attribution. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 4167–4175. IEEE (2019)
Zhao, Q., Hastie, T.: Causal interpretations of black-box models. J. Bus. Econ. Stat. 39(1), 272–281 (2021)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning, pp. 3319–3328. PMLR (2017)
Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 499–515. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_31
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Wang, H., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 24–25 (2020)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847. IEEE (2018)
Omeiza, D., Speakman, S., Cintas, C., Weldermariam, K.: Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models. arXiv preprint arXiv:1908.01224 (2019)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
Wu, B., Wu, H.: Angular discriminative deep feature learning for face verification. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2133–2137. IEEE (2020)
Chen, T., et al.: ABD-Net: attentive but diverse person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8351–8361 (2019)
Dai, Z., Chen, M., Gu, X., Zhu, S., Tan, P.: Batch DropBlock network for person re-identification and beyond. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3691–3701 (2019)
Zheng, Z., Yang, X., Yu, Z., Zheng, L., Yang, Y., Kautz, J.: Joint discriminative and generative learning for person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2138–2147 (2019)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53