Weakly-supervised object localization with gradient-pyramid feature

Applied Intelligence

Abstract

Object localization is a basic computer vision task and plays an important role in many vision-based applications. Supervised methods use manual location labels to learn to localize objects directly, but incomplete or incorrectly assigned labels reduce localization accuracy, and manual labelling is expensive. This paper proposes a weakly-supervised localization method based on a multi-scale gradient-pyramid feature, which combines weighted gradient features from multiple convolutional layers. First, pairs of gradients and features are extracted from different layers to compute the gradient features. Then the gradient features are fused through a pyramid model, where the larger value is selected as the fusion result rather than concatenating the features. Finally, the resulting multi-scale gradient-pyramid feature is used with a region scaling operation to localize the object more accurately. The proposed method can be integrated directly into a pre-trained classification model to perform object localization without additional training. Experimental results on the ILSVRC 2016 and CUB-200-2011 datasets show that the proposed method can achieve better object localization performance.
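
The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several specifics are assumptions, since the abstract does not give them: the Grad-CAM-style channel weighting (spatially averaged gradients), the nearest-neighbour upsampling, the fixed threshold used to extract a box, and all function names, which are hypothetical. The region scaling operation is not reproduced here.

```python
import numpy as np


def gradient_feature(features, grads):
    """Weighted gradient feature for one layer (assumed Grad-CAM-style weighting).

    features, grads: arrays of shape (C, H, W) taken from the same layer.
    """
    weights = grads.mean(axis=(1, 2))               # one weight per channel, shape (C,)
    cam = np.tensordot(weights, features, axes=1)   # weighted channel sum, shape (H, W)
    cam = np.maximum(cam, 0.0)                      # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam          # scale to [0, 1] when possible


def upsample_nearest(m, size):
    """Nearest-neighbour resize of a 2-D map to (size, size)."""
    rows = np.arange(size) * m.shape[0] // size
    cols = np.arange(size) * m.shape[1] // size
    return m[np.ix_(rows, cols)]


def fuse_pyramid(maps, size):
    """Fuse per-layer maps by element-wise maximum instead of concatenation."""
    return np.max([upsample_nearest(m, size) for m in maps], axis=0)


def bounding_box(heatmap, threshold=0.5):
    """Tight box (x0, y0, x1, y1) around values >= threshold (assumed rule)."""
    ys, xs = np.nonzero(heatmap >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

In use, `gradient_feature` would be called once per selected convolutional layer with that layer's activations and the gradients of the class score with respect to them. The resulting maps are passed to `fuse_pyramid`, and the fused map to `bounding_box`. Taking the element-wise maximum keeps the strongest response from any scale at each position, which matches the abstract's description of selecting the larger value during fusion.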

Data availability

The data that support the findings of this study are available in:

1) https://image-net.org/update-mar-11-2021.php.

2) http://www.vision.caltech.edu/datasets/cub_200_2011/.

3) http://host.robots.ox.ac.uk/pascal/VOC/voc2012/.

4) https://github.com/FiresWorker/WSOL.

Acknowledgements

The authors would like to thank the editor and the reviewers for their critical and constructive comments and suggestions. We would also like to thank Dr. Vasile Palade for proofreading the whole manuscript and providing valuable comments. This work was supported in part by the National Natural Science Foundation of China (Project Numbers: 61673194, 61672263, 61672265), and in part by the National First-Class Discipline Program of Light Industry Technology and Engineering (Project Number: LITE2018-25).

Author information

Corresponding author

Correspondence to Jun Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Cite this article

Mao, Z., Zhou, Y., Sun, J. et al. Weakly-supervised object localization with gradient-pyramid feature. Appl Intell 53, 2923–2935 (2023). https://doi.org/10.1007/s10489-022-03686-y

  • DOI: https://doi.org/10.1007/s10489-022-03686-y
