Hierarchical saliency mapping for weakly supervised object localization based on class activation mapping

Multimedia Tools and Applications

Abstract

Weakly supervised object localization is a fundamental research problem in computer vision. In this paper, a hierarchical saliency mapping network for object localization is proposed, designed to avoid missing detailed information about potential objects. Starting from a classical convolutional network, we remove the fully connected part and add multiple information extraction branches. The network extracts information from convolution layers at different scales to generate Hierarchical Saliency Maps. These maps, which comprise the Hierarchical-Class Activation Map and the Hierarchical-Spatial Pyramid Saliency Map, fuse deep-level and low-level features to locate the object. The method is tested on the Caltech-UCSD Birds 200, Caltech101 and ImageNet datasets. Compared with the Class Activation Map and the Spatial Pyramid Saliency Map, localization accuracy is improved. The method can be applied to fine-grained classification, object tracking and other fields.
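The idea of fusing activation maps from layers at different scales can be illustrated with a minimal sketch. It uses the standard class activation map formulation (a class-weighted sum of feature maps) and then averages maps from two layers after upsampling. The averaging rule, the nearest-neighbour upsampling, the relative threshold, and all function names are assumptions made for illustration; the abstract does not specify the fusion used by the authors.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """Weighted sum of feature maps for one class, scaled to [0, 1].

    features: array of shape (K, H, W); class_weights: array of shape (K,).
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return np.kron(cam, np.ones((factor, factor)))


def fuse_maps(maps, out_size):
    """Average square maps after upsampling to out_size (assumed fusion rule)."""
    resized = [upsample_nearest(m, out_size // m.shape[0]) for m in maps]
    return np.mean(resized, axis=0)


def bounding_box(saliency, threshold=0.5):
    """Tightest box (x_min, y_min, x_max, y_max) around cells above a relative threshold."""
    ys, xs = np.nonzero(saliency >= threshold * saliency.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy example: a coarse deep layer (2x2) and a finer shallow layer (4x4).
deep = np.zeros((2, 2, 2))
deep[0, 0, 0] = 1.0
shallow = np.zeros((1, 4, 4))
shallow[0, 0, 0] = 1.0

deep_cam = class_activation_map(deep, np.array([1.0, 0.0]))
shallow_cam = class_activation_map(shallow, np.array([1.0]))
fused = fuse_maps([deep_cam, shallow_cam], out_size=4)
box = bounding_box(fused)
```

In this toy case the deep map marks a coarse region while the shallow map sharpens the response inside it, which is the intuition behind combining deep-level and low-level features.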

Acknowledgments

This work was supported by the Chongqing Science and Technology Commission Project (Grant Nos. cstc2017jcyj-AX0142 and cstc2018jcyjAX0525) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No. 2019YFG0107).

Author information

Corresponding author

Correspondence to Hongjian Li.

About this article

Cite this article

Cheng, Z., Li, H., Zeng, X. et al. Hierarchical saliency mapping for weakly supervised object localization based on class activation mapping. Multimed Tools Appl 79, 31283–31298 (2020). https://doi.org/10.1007/s11042-020-09556-4

  • DOI: https://doi.org/10.1007/s11042-020-09556-4
