
MinMax-CAM: Increasing Precision of Explaining Maps by Contrasting Gradient Signals and Regularizing Kernel Usage

  • Conference paper
  • First Online:
Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022)

Abstract

While various AI explaining methods have been developed over the last decade, the Class Activation Map (CAM) technique (and its derivations) remains a popular alternative today. Highly versatile and sufficiently simple, CAM-based techniques have been consistently used to explain the predictions of Convolutional Neural Networks in classification problems. However, many studies that assess the precision of the produced explaining maps have done so on single-label multi-class classification problems, where the task is simplified by focusing on and framing the objects of interest in the foreground. In this work, we test and evaluate the efficacy of CAM-based techniques over distinct multi-label sets. We find that techniques created with single-label classification in mind (such as Grad-CAM, Grad-CAM++ and Score-CAM) often produce diffuse visualization maps in multi-label scenarios, overstepping the boundaries of their objects of interest onto objects of different classes. We propose a generalization of the Grad-CAM technique for the multi-label scenario that produces more focused explaining maps by maximizing the activation of the class of interest while minimizing the activations of the remaining classes present in the sample. Our results indicate that MinMax-CAM produces more focused explaining maps across different network architectures and datasets. Finally, we present a regularization strategy that encourages sparse positive weights in the last classifying layer, while penalizing the association between the classification of a class and the occurrence of correlated patterns, resulting in cleaner activation maps.
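The core idea described above — weighting each kernel's activation map by the gradient signal of the target class while contrasting it against the signals of the other classes present in the sample — can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: it assumes the final convolutional activations and the per-class gradients of each logit with respect to them have already been extracted, and the function names and the simple positive-weight penalty are illustrative.

```python
import numpy as np

def grad_cam(acts, grads, target):
    """Vanilla Grad-CAM: weight each kernel's activation map (acts: K x H x W)
    by the spatially averaged gradient of the target logit, then apply ReLU."""
    w = grads[target].mean(axis=(1, 2))                       # (K,)
    return np.maximum((w[:, None, None] * acts).sum(axis=0), 0)

def minmax_cam(acts, grads, target, present):
    """MinMax-style weighting: keep the target class's gradient signal while
    subtracting the average signal of the other classes present in the sample."""
    w = grads[target].mean(axis=(1, 2))
    others = [c for c in present if c != target]
    if others:
        w = w - np.mean([grads[c].mean(axis=(1, 2)) for c in others], axis=0)
    return np.maximum((w[:, None, None] * acts).sum(axis=0), 0)

def positive_weight_penalty(W, lam=1e-4):
    """Illustrative regularizer encouraging sparse positive weights in the
    last classifying layer (one plausible form; the paper's exact term may differ)."""
    return lam * np.clip(W, 0.0, None).sum()
```

In practice `acts` and `grads` would come from a forward and backward pass through the network; contrasting the weights before the weighted sum is what suppresses regions shared with co-occurring classes.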

Supported by CNPq (grants 140929/2021-5 and 309330/2018-1) and LNCC/MCTI.




Author information

Correspondence to Lucas David.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

David, L., Pedrini, H., Dias, Z. (2023). MinMax-CAM: Increasing Precision of Explaining Maps by Contrasting Gradient Signals and Regularizing Kernel Usage. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2022. Communications in Computer and Information Science, vol 1815. Springer, Cham. https://doi.org/10.1007/978-3-031-45725-8_11


  • DOI: https://doi.org/10.1007/978-3-031-45725-8_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45724-1

  • Online ISBN: 978-3-031-45725-8

  • eBook Packages: Computer Science, Computer Science (R0)
