Abstract
While various AI explanation methods have been developed over the last decade, the Class Activation Map (CAM) technique (and its derivations) remains a popular choice today. Highly versatile and sufficiently simple, CAM-based techniques have been consistently used to explain the predictions of Convolutional Neural Networks in classification problems. However, many studies that assess the precision of the produced explaining maps have done so on single-label multi-class classification problems, where the task is simplified by focusing on and framing the objects of interest in the foreground. In this work, we test and evaluate the efficacy of CAM-based techniques over distinct multi-label sets. We find that techniques created with single-label classification in mind (such as Grad-CAM, Grad-CAM++ and Score-CAM) often produce diffuse visualization maps in multi-label scenarios, overstepping the boundaries of the objects of interest they explain onto objects of different classes. We propose a generalization of the Grad-CAM technique for the multi-label scenario that produces more focused explaining maps by maximizing the activation of a class of interest while minimizing the activation of the remaining classes present in the sample. Our results indicate that MinMax-CAM produces more focused explaining maps across different network architectures and datasets. Finally, we present a regularization strategy that encourages sparse positive weights in the last classifying layer, while penalizing the association between the classification of a class and the occurrence of correlated patterns, resulting in cleaner activation maps.
Supported by CNPq (grants 140929/2021-5 and 309330/2018-1) and LNCC/MCTI.
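The abstract describes the core idea behind MinMax-CAM: weight the feature maps so as to maximize the activation of the class of interest while minimizing the activation of the other classes present in the sample. A minimal sketch of that contrastive weighting is shown below, assuming Grad-CAM-style weights (global average pooling of per-class gradients) and assuming the other-class contribution is subtracted as a mean; the function name `minmax_cam`, the input layout, and the exact combination rule are illustrative assumptions, not the paper's definitive formulation.

```python
import numpy as np

def minmax_cam(activations, grads, target, present):
    """Contrastive CAM sketch for the multi-label setting.

    activations: (K, H, W) feature maps from the last convolutional layer.
    grads: dict mapping class id -> (K, H, W) gradients of that class
           score with respect to the activations (precomputed elsewhere).
    target: class of interest; present: all classes present in the sample.
    """
    others = [c for c in present if c != target]
    # Grad-CAM weights: global average pooling of the target-class gradients.
    w = grads[target].mean(axis=(1, 2))
    if others:
        # Subtract the mean weight of the remaining present classes,
        # discouraging regions that also activate other labels.
        w = w - np.mean([grads[c].mean(axis=(1, 2)) for c in others], axis=0)
    # Weighted combination of feature maps, followed by ReLU.
    cam = np.maximum((w[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()  # normalize to [0, 1] for visualization
    return cam
```

With gradients taken from an actual CNN (e.g. via automatic differentiation), the resulting map should highlight regions that support the target class but not the co-occurring ones, which is the behavior the abstract attributes to MinMax-CAM.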
Notes
- 1.
Human Protein Atlas. Human Protein Atlas Image Classification. In: Kaggle. kaggle.com/competitions/human-protein-atlas-image-classification (Jan 2019). Accessed on Aug 2022.
References
Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps. In: 32nd International Conference on Neural Information Processing Systems (NIPS), pp. 9525–9536. Curran Associates Inc., Red Hook, NY, USA (2018)
Belharbi, S., Sarraf, A., Pedersoli, M., Ben Ayed, I., McCaffrey, L., Granger, E.: F-CAM: full resolution class activation maps via guided parametric upscaling. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3490–3499 (2022)
Chan, L., Hosseini, M.S., Plataniotis, K.N.: A comprehensive analysis of weakly-supervised semantic segmentation in different image domains. Int. J. Comput. Vision (IJCV) 129(2), 361–384 (2021)
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847. IEEE (2018)
David, L., Pedrini, H., Dias, Z.: MinMax-CAM: improving focus of CAM-based visualization techniques in multi-label problems. In: 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, pp. 106–117. INSTICC, SciTePress (2022). https://doi.org/10.5220/0010807800003124
Demir, I., et al.: DeepGlobe 2018: a challenge to parse the earth through satellite images. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 172–181 (2018)
Dhillon, A., Verma, G.K.: Convolutional neural network: a review of models, methodologies and applications to object detection. Progr. Artif. Intell. 9(2), 85–112 (2020)
Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision (IJCV) 88(2), 303–338 (2010)
Huff, D.T., Weisman, A.J., Jeraj, R.: Interpretation and visualization techniques for deep learning models in medical imaging. Phys. Med. Biol. 66(4), 04TR01 (2021)
Jiang, P.T., Zhang, C.B., Hou, Q., Cheng, M.M., Wei, Y.: LayerCAM: exploring hierarchical class activation maps for localization. IEEE Trans. Image Process. 30, 5875–5888 (2021). https://doi.org/10.1109/TIP.2021.3089943
Lee, J.R., Kim, S., Park, I., Eo, T., Hwang, D.: Relevance-CAM: your model already knows where to look. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14944–14953 (2021)
Li, Z., Liu, F., Yang, W., Peng, S., Zhou, J.: A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. (TNNLS) (2021)
Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) European Conference on Computer Vision (ECCV), pp. 740–755. Springer International Publishing, Cham (2014)
Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), p. 1 (2021). https://doi.org/10.1109/TPAMI.2021.3059968
Ouyang, W., et al.: Analysis of the human protein atlas image classification competition. Nat. Methods 16(12), 1254–1261 (2019). https://doi.org/10.1038/s41592-019-0658-6
Pons, J., Slizovskaia, O., Gong, R., Gómez, E., Serra, X.: Timbre analysis of music audio signals with convolutional neural networks. In: 25th European Signal Processing Conference (EUSIPCO), pp. 2744–2748. IEEE (2017)
Ramaswamy, H.G., et al.: Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 983–991 (2020)
Rawat, W., Wang, Z.: Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput. 29(9), 2352–2449 (2017)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017)
Shendryk, I., Rist, Y., Lucas, R., Thorburn, P., Ticehurst, C.: Deep learning - a new approach for multi-label scene classification in PlanetScope and sentinel-2 imagery. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 1116–1119 (2018)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. CoRR abs/1312.6034 (2014)
Smilkov, D., Thorat, N., Kim, B., Viégas, F.B., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv:1706.03825 (2017)
Springenberg, J., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: International Conference on Learning Representations (ICLR) - Workshop Track (2015)
Srinivas, S., Fleuret, F.: Full-gradient representation for neural network visualization. arXiv preprint arXiv:1905.00780 (2019)
Su, Y., Sun, R., Lin, G., Wu, Q.: Context decoupling augmentation for weakly supervised semantic segmentation. arXiv:2103.01795 (2021)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: International Conference on Machine Learning (ICML), pp. 1139–1147. PMLR (2013)
Tachibana, H., Uenoyama, K., Aihara, S.: Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4784–4788 (2018). https://doi.org/10.1109/ICASSP.2018.8461829
Tarekegn, A.N., Giacobini, M., Michalak, K.: A review of methods for imbalanced multi-label classification. Pattern Recogn. 118, 107965 (2021). https://doi.org/10.1016/j.patcog.2021.107965, https://www.sciencedirect.com/science/article/pii/S0031320321001527
Vilone, G., Longo, L.: Explainable artificial intelligence: a systematic review. arXiv preprint arXiv:2006.00093 (2020)
Vilone, G., Longo, L.: Notions of explainability and evaluation approaches for explainable artificial intelligence. Inf. Fusion 76, 89–106 (2021). https://doi.org/10.1016/j.inffus.2021.05.009
Wang, H., Naidu, R., Michael, J., Kundu, S.S.: SS-CAM: smoothed score-cam for sharper visual feature localization. arXiv preprint arXiv:2006.14255 (2020)
Wang, H., et al.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 111–119 (2020)
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: IEEE conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724–4732 (2016)
Xu, F., et al.: Explainable AI: a brief survey on history, research areas, approaches and challenges. In: Tang, J., Kan, M.Y., Zhao, D., Li, S., Zan, H. (eds.) Natural Language Processing and Chinese Computing, pp. 563–574. Springer International Publishing, Cham (2019)
Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. In: AAAI Conference on Artificial Intelligence, vol. 33, pp. 7370–7377 (2019)
Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: CutMix: regularization strategy to train strong classifiers with localizable features. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6023–6032 (2019)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_53
Zhang, D., Han, J., Cheng, G., Yang, M.H.: Weakly supervised object localization and detection: a survey. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), p. 1 (2021). https://doi.org/10.1109/TPAMI.2021.3074313
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: MixUp: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2929 (2016)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
David, L., Pedrini, H., Dias, Z. (2023). MinMax-CAM: Increasing Precision of Explaining Maps by Contrasting Gradient Signals and Regularizing Kernel Usage. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2022. Communications in Computer and Information Science, vol 1815. Springer, Cham. https://doi.org/10.1007/978-3-031-45725-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45724-1
Online ISBN: 978-3-031-45725-8