Recurrent self-optimizing proposals for weakly supervised object detection

  • Original Article
  • Neural Computing and Applications

Abstract

Weakly supervised object detection (WSOD) has attracted increasing attention in object detection because it requires only image-level annotations to train the detector. A typical WSOD paradigm first generates candidate region proposals for the training data and then treats each image as a bag of proposals, training the detector with multiple instance learning (MIL). Most methods focus on optimizing the training process but rarely consider the influence of the pre-generated proposals, which directly affect the learning of the detector: the proposal set is dominated by noisy proposals (e.g., negative or background proposals), and the positive proposals are often inaccurately located. In this paper, we focus on improving the quality of proposals and propose a recurrent self-optimizing proposal framework, a new paradigm for WSOD, that iteratively optimizes the pre-generated proposals. In each iteration, all detection results (i.e., the object-aware coordinate offsets and the confidence scores) are accumulated for proposal optimization. To localize objects accurately, we design a proposal self-transformation module that transforms the locations of the pre-generated proposals based on the coordinate offsets. To alleviate the impact of noisy proposals, we design a proposal self-sampling module that mines object instances using the confidence scores and filters out noisy proposals. The optimized proposals are then fed into a decoupled proposal learner, which contains two parallel proposal training branches: a MIL module supervised by the image label and an instance refinement module supervised by the mined object instances. In addition, the instance refinement module contains an instance regression refinement module, which generates the object-aware coordinate offsets. In turn, the decoupled proposal learner produces new detection results to optimize the proposals in the next iteration. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our method.
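
To make the two proposal-optimization steps described in the abstract more concrete, the following is a minimal sketch of one iteration, not the authors' implementation. It assumes the common R-CNN style offset parameterization (center shifts scaled by box size, log-space size changes) and a simple score threshold for sampling; the abstract specifies neither. The names `transform_box`, `self_optimize`, and `keep_threshold` are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2)
Offsets = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def transform_box(box: Box, offsets: Offsets) -> Box:
    """Move and rescale one proposal using predicted coordinate offsets.

    Assumption: dx and dy shift the box center in units of the box width
    and height, and dw and dh rescale the width and height in log space.
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offsets
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def self_optimize(
    proposals: List[Box],
    offsets: List[Offsets],
    scores: List[float],
    keep_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> Tuple[List[Box], List[Tuple[Box, float]]]:
    """One hypothetical optimization step over a bag of proposals.

    Step 1 (self-transformation): relocate every proposal with its offsets.
    Step 2 (self-sampling): keep only proposals whose confidence score
    reaches the threshold, treating them as mined object instances.
    """
    moved = [transform_box(b, o) for b, o in zip(proposals, offsets)]
    mined = [(b, s) for b, s in zip(moved, scores) if s >= keep_threshold]
    return moved, mined


if __name__ == "__main__":
    boxes = [(10.0, 10.0, 50.0, 50.0), (0.0, 0.0, 20.0, 20.0)]
    deltas = [(0.25, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
    confidences = [0.9, 0.1]
    moved, mined = self_optimize(boxes, deltas, confidences)
    print(moved)  # first box shifted right by a quarter of its width
    print(mined)  # only the high-confidence proposal survives
```

In a recurrent setting, the `moved` proposals and `mined` instances would be passed to the learner, whose new offsets and scores feed the next call to `self_optimize`.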

Notes

  1. https://github.com/pytorch/pytorch.

Acknowledgement

This work is supported in part by the National Natural Science Foundation of China under Grant 61720106004, and in part by the Overseas Expertise Introduction Project for Discipline Innovation (111 Projects) under Grant B17008.

Author information

Corresponding author

Correspondence to Ming Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, M., Zeng, B. Recurrent self-optimizing proposals for weakly supervised object detection. Neural Comput & Applic 35, 757–771 (2023). https://doi.org/10.1007/s00521-022-07818-w

  • DOI: https://doi.org/10.1007/s00521-022-07818-w
