
Cross-CAM: Focused Visual Explanations for Deep Convolutional Networks via Training-Set Tracing

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13368)


Abstract

In recent years, widely deployed deep learning technologies have remained controversial with respect to their reliability and credibility. The Class Activation Map (CAM) was proposed to explain the decisions of deep learning models. Existing CAM-based algorithms highlight critical portions of the input image, but they do not go further and trace the basis of the network's decision back to the training data. This work proposes Cross-CAM, a visual interpretation method that traces prediction-basis samples in the training set and focuses on the regions of the input image that the predicted category shares with those samples. Cross-CAM extracts deep discriminative feature vectors and screens prediction-basis samples out of the training set. The similarity-weight and the grad-weight are then combined into the cross-weight, which highlights similar regions and aids classification decisions. Cross-CAM is evaluated on the ILSVRC-15 dataset. A new weakly-supervised localization evaluation metric, IoS (Intersection over Self), is proposed to evaluate the focusing effect. Using the regions highlighted by Cross-CAM, the top-1 localization error for weakly-supervised localization reaches 44.95% on the ILSVRC-15 validation set, 16.25% lower than Grad-CAM. Visualization results show that, in comparison to Grad-CAM, Cross-CAM exploits the similarity between the test image and the prediction-basis samples to focus on the key regions.
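The pipeline described in the abstract (retrieving prediction-basis samples by deep-feature similarity, fusing similarity and gradient weights into a cross-weight, and scoring the focus of a highlighted region with IoS) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the element-wise fusion of the two weight vectors, and the box-based IoS definition (overlap area divided by the area of the highlighted box itself) are all assumptions made for the sketch.

```python
import numpy as np

def prediction_basis(query_vec, train_vecs, k=5):
    """Screen the training set for the k samples whose deep feature
    vectors are most similar (cosine similarity) to the query's."""
    q = query_vec / np.linalg.norm(query_vec)
    t = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = t @ q
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]

def cross_weight(grad_w, sim_w):
    """Fuse per-channel gradient weights with similarity weights into a
    cross-weight; the element-wise product and renormalisation here are
    illustrative choices."""
    w = grad_w * sim_w
    return w / (np.abs(w).sum() + 1e-8)

def ios(pred_box, gt_box):
    """Intersection over Self: overlap area divided by the area of the
    predicted (highlighted) box, so a highlight that lies entirely
    inside the object scores 1.0 regardless of the object's size."""
    x1, y1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    x2, y2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    self_area = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    return inter / self_area if self_area > 0 else 0.0
```

Under this box-based reading, IoS (unlike IoU) does not penalise a highlight for being smaller than the object, which matches the stated goal of rewarding a tightly focused explanation.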

Supported by the National Natural Science Foundation of China (32071775) and 2020 Industrial Internet Innovation and Development Project - Malicious Code Analysis Equipment Project of Security and Controlled System, No.: TC200H02X.

The first and second authors contributed equally to this work.


References

  1. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  2. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  3. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  4. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)

  5. Bhattacharya, G., Alam, M.J., Kenny, P.: Deep speaker recognition: modular or monolithic? In: INTERSPEECH, pp. 1143–1147 (2019)

  6. Qiu, M., Xue, C., Shao, Z., Sha, E.: Energy minimization with soft real-time and DVS for uniprocessor and multiprocessor embedded systems. In: IEEE DATE Conference, pp. 1–6 (2007)

  7. Qiu, M., Liu, J., Li, J., et al.: A novel energy-aware fault tolerance mechanism for wireless sensor networks. In: IEEE/ACM International Conference on GCC (2011)

  8. Qiu, M., Li, H., Sha, E.: Heterogeneous real-time embedded software optimization considering hardware platform. In: Proceedings of the 2009 ACM Symposium on Applied Computing, pp. 1637–1641 (2009)

  9. Wu, G., Zhang, H., et al.: A decentralized approach for mining event correlations in distributed system monitoring. JPDC 73(3), 330–340 (2013)

  10. Niu, J., Gao, Y., et al.: Selecting proper wireless network interfaces for user experience enhancement with guaranteed probability. JPDC 72(12), 1565–1575 (2012)

  11. Liu, M., Zhang, S., et al.: H∞ state estimation for discrete-time chaotic systems based on a unified model. IEEE Trans. Syst. Man Cybern. 42(4), 1053–1063 (2012)

  12. Li, Y., Song, Y., Jia, L., et al.: Intelligent fault diagnosis by fusing domain adversarial training and maximum mean discrepancy via ensemble learning. IEEE Trans. Ind. Inform. 17(4), 2833–2841 (2020)

  13. Shao, Z., Xue, C., Zhuge, Q., et al.: Security protection and checking for embedded system integration against buffer overflow attacks via hardware/software. IEEE Trans. Comput. 55(4), 443–453 (2006)

  14. Qiu, H., Qiu, M., Liu, M., Memmi, G.: Secure health data sharing for medical cyber-physical systems for the healthcare 4.0. IEEE J. Biomed. Health Inform. 24(9), 2499–2505 (2020)

  15. Qiu, H., Zheng, Q., et al.: Topological graph convolutional network-based urban traffic flow and density prediction. IEEE Trans. Intell. Transp. Syst. (2020)

  16. Hua, Y., Zhang, D., Ge, S.: Research progress in the interpretability of deep learning models. J. Cyber Secur. 5(3), 1–12 (2020)

  17. Cui, X., Wang, D., Wang, Z.J.: CHIP: channel-wise disentangled interpretation of deep convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 31(10), 4143–4156 (2019)

  18. Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437 (2017)

  19. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)

  20. Kong, X., Tang, X., Wang, Z.: A survey of explainable artificial intelligence decision. Syst. Eng. Theory Pract. (Xitong Gongcheng Lilun yu Shijian) 41(2), 524–536 (2021)

  21. Parafita, A., Vitria, J.: Explaining visual models by causal attribution. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 4167–4175. IEEE (2019)

  22. Zhao, Q., Hastie, T.: Causal interpretations of black-box models. J. Bus. Econ. Stat. 39(1), 272–281 (2021)

  23. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)

  24. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)

  25. Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning, pp. 3319–3328. PMLR (2017)

  26. Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 499–515. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_31

  27. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)

  28. Wang, H., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 24–25 (2020)

  29. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)

  30. Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847. IEEE (2018)

  31. Omeiza, D., Speakman, S., Cintas, C., Weldermariam, K.: Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models. arXiv preprint arXiv:1908.01224 (2019)

  32. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)

  33. Wu, B., Wu, H.: Angular discriminative deep feature learning for face verification. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2133–2137. IEEE (2020)

  34. Chen, T., et al.: ABD-Net: attentive but diverse person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8351–8361 (2019)

  35. Dai, Z., Chen, M., Gu, X., Zhu, S., Tan, P.: Batch DropBlock network for person re-identification and beyond. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3691–3701 (2019)

  36. Zheng, Z., Yang, X., Yu, Z., Zheng, L., Yang, Y., Kautz, J.: Joint discriminative and generative learning for person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2138–2147 (2019)

  37. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53


Author information

Correspondence to Jian Cui.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sun, Y., Ma, K., Liu, X., Cui, J. (2022). Cross-CAM: Focused Visual Explanations for Deep Convolutional Networks via Training-Set Tracing. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science, vol. 13368. Springer, Cham. https://doi.org/10.1007/978-3-031-10983-6_56


  • DOI: https://doi.org/10.1007/978-3-031-10983-6_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10982-9

  • Online ISBN: 978-3-031-10983-6

