Abstract
Gradient-based visual explanation techniques, such as Grad-CAM and Grad-CAM++, have been used to interpret how convolutional neural networks make decisions. However, not all of these techniques work properly for remote sensing (RS) image classification. In this paper, after analyzing why Grad-CAM performs worse than Grad-CAM++ for RS image classification from the perspective of the weight matrix of gradients, we propose an efficient visual explanation approach dubbed median-pooling Grad-CAM. It uses median pooling to capture the main trend of the gradients and approximate the contribution of each feature map with respect to a specific class. We further propose a new evaluation index, confidence drop %, which expresses how much classification accuracy drops when the important regions captured by the visual saliency are occluded. Experiments on two RS image datasets with two CNN models, VGG and ResNet, show that the proposed method offers a good tradeoff between interpretability and efficiency of visual explanation for CNN-based models in RS image classification. With its low time complexity, median-pooling Grad-CAM could be a good complement to existing gradient-based visual explanation techniques in practice.
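To make the idea concrete, the following is a minimal sketch of how median pooling might replace the average pooling used by Grad-CAM when weighting feature maps. It is not the authors' implementation: the function names, the toy 2x2 feature maps, and the choice to take the median over all spatial positions are assumptions made for illustration. The abstract does not give the formula for confidence drop %, so `confidence_drop_percent` is a hypothetical definition (relative drop of the class confidence after occlusion).

```python
from statistics import median


def median_pool_weights(gradients):
    """One weight per feature map: the median of its spatial gradients.

    Assumption: the median is taken over every spatial position of the map.
    """
    return [median(v for row in grad_map for v in row) for grad_map in gradients]


def median_pool_grad_cam(activations, gradients):
    """Weighted sum of feature maps, then ReLU and max-normalization."""
    weights = median_pool_weights(gradients)
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * feature_map[i][j]
    # ReLU keeps only the regions that support the target class.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def confidence_drop_percent(conf_original, conf_occluded):
    """Hypothetical index: relative drop in class confidence, in percent."""
    if conf_original <= 0:
        return 0.0
    return max(conf_original - conf_occluded, 0.0) / conf_original * 100.0


# Toy data: two 2x2 feature maps and the gradients of a class score
# with respect to them. The first gradient map contains one outlier (8.0).
activations = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 3.0], [2.0, 1.0]],
]
gradients = [
    [[0.125, 0.25], [0.5, 8.0]],
    [[-0.5, -0.25], [-0.25, -0.5]],
]

weights = median_pool_weights(gradients)
cam = median_pool_grad_cam(activations, gradients)
drop = confidence_drop_percent(0.5, 0.125)
```

In this toy case the outlier gradient pulls the mean of the first map above 2, while the median stays close to the bulk of the values, which is the sense in which median pooling tracks the main trend of the gradients.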
Acknowledgement
This work was supported by NSFC (601702323) and Marine Science Research Special Project of Hebei Normal University of Sci & Tech (2018HY020).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, W., Dai, S., Huang, D., Song, J., Antonio, L. (2021). Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_12
DOI: https://doi.org/10.1007/978-3-030-67835-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science (R0)