Abstract
Class activation map (CAM) generation aims to highlight the regions of a class in an image using a classification model. However, the highlighted regions are usually small and local. Existing methods attribute this problem to an ineffective CAM extraction model and focus on enlarging the regions by developing new models for CAM generation, but with limited success. In contrast, this paper attributes the incomplete extraction to the finite discriminative cues within a single classification model, and improves CAM generation by providing more discriminative cues through training multiple classification models that account for class relationships. To this end, the similarities between classes are first measured, and hierarchical clustering then groups the classes into clusters at multiple semantic levels. Multiple classification models are trained on these clustering levels, yielding class activation maps with diverse and complementary discriminative cues. The final class activation map is obtained by combining these maps. In addition, a new orthogonal module and a two-branch network for CAM generation are proposed to further improve the maps by making the highlighted regions orthogonal and complementary. Experimental results on the PASCAL VOC 2012 dataset show the superior performance of the proposed CAM generation method.
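The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class similarity measure (cosine similarity over stand-in class features), the clustering cut levels, and pixel-wise max fusion of the CAMs are all assumptions made for the example, and the arrays are random placeholders for real classifier features and CAMs.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
num_classes = 20                                   # e.g. PASCAL VOC 2012
class_embed = rng.normal(size=(num_classes, 64))   # stand-in class features

# 1) Measure pairwise class similarity (here: cosine), convert to distance.
normed = class_embed / np.linalg.norm(class_embed, axis=1, keepdims=True)
distance = 1.0 - normed @ normed.T

# 2) Hierarchical clustering; cut the tree at several levels to obtain
#    coarse-to-fine class groupings (each level drives one classifier).
condensed = distance[np.triu_indices(num_classes, k=1)]
tree = linkage(condensed, method="average")
levels = [fcluster(tree, t=k, criterion="maxclust") for k in (2, 5, 10)]

# 3) Fuse the CAMs produced by the per-level models, e.g. pixel-wise max,
#    so that complementary discriminative regions are all retained.
cams = rng.random(size=(len(levels), 56, 56))      # stand-in CAMs
fused_cam = cams.max(axis=0)

# Orthogonality sketch: a penalty that discourages overlap between the
# CAMs of two branches, pushing them toward complementary regions.
cam_a, cam_b = rng.random((56, 56)), rng.random((56, 56))
ortho_loss = (cam_a * cam_b).sum() / (
    np.linalg.norm(cam_a) * np.linalg.norm(cam_b))
```

Pixel-wise max is one simple fusion rule; averaging or learned weighting would also fit the description in the abstract, which does not specify the combination operator.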
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants 61871087, 61502084, 61831005 and 61601102 and supported in part by Sichuan Science and Technology Program under Grant 2018JY0141.
Cite this article
Meng, F., Huang, K., Li, H. et al. Hierarchical class grouping with orthogonal constraint for class activation map generation. Neural Comput & Applic 33, 7371–7380 (2021). https://doi.org/10.1007/s00521-020-05416-2