Recurrent self-optimizing proposals for weakly supervised object detection

Zhang, Ming; Zeng, Bing

doi:10.1007/s00521-022-07818-w

Original Article
Published: 21 September 2022

Volume 35, pages 757–771, (2023)
Neural Computing and Applications

Abstract

Weakly supervised object detection (WSOD) has attracted increasing attention in object detection because it requires only image-level annotations to train the detector. A typical WSOD paradigm first generates candidate region proposals for the training data and then treats each image as a bag of proposals, training the detector with multiple instance learning (MIL). Most methods focus on optimizing the training process but rarely consider the influence of the pre-generated proposals, which directly affect the learning of the detector because they are dominated by noisy proposals (e.g., negative or background proposals) and include positive proposals with inaccurate locations. In this paper, we focus on improving the quality of proposals and propose a recurrent self-optimizing proposal framework, a new paradigm for WSOD, that iteratively optimizes the pre-generated proposals. In each iteration, all detection results (i.e., the object-aware coordinate offsets and the confidence scores) are accumulated for proposal optimization. To achieve accurate object localization, we design a proposal self-transformation module that transforms the locations of pre-generated proposals based on the coordinate offsets. To alleviate the impact of noisy proposals, we design a proposal self-sampling module that mines object instances using confidence scores and filters out noisy proposals. These optimized proposals are then fed into a decoupled proposal learner, which contains two parallel proposal training branches: a MIL module and an instance refinement module, supervised by the image label and the mined object instances, respectively. In addition, the instance refinement module contains an instance regression refinement module, which generates the object-aware coordinate offsets. In turn, the decoupled proposal learner produces new detection results to optimize the proposals in the next iteration. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our method.
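
One proposal-optimization step, as the abstract describes it, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the offset parameterization or the sampling rule, so the sketch assumes R-CNN-style offsets (dx, dy, dw, dh) and a plain score threshold. The names `transform_box`, `self_transform`, `self_sample`, `optimize_proposals`, and `min_score` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)
Offset = Tuple[float, float, float, float]  # (dx, dy, dw, dh); assumed R-CNN style


def transform_box(box: Box, offset: Offset) -> Box:
    """Shift and rescale one box by its predicted offset (assumed parameterization)."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offset
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Centre moves proportionally to box size; size scales exponentially.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def self_transform(boxes: Sequence[Box], offsets: Sequence[Offset]) -> List[Box]:
    """Relocate every proposal with its own coordinate offset."""
    return [transform_box(b, o) for b, o in zip(boxes, offsets)]


def self_sample(boxes: Sequence[Box], scores: Sequence[float],
                min_score: float = 0.5) -> List[Box]:
    """Keep proposals scoring at least min_score, plus the single best one.

    The threshold rule is an assumption made for this sketch only.
    """
    if not scores:
        return []
    best = max(range(len(scores)), key=scores.__getitem__)
    return [b for i, b in enumerate(boxes) if scores[i] >= min_score or i == best]


def optimize_proposals(boxes: Sequence[Box], offsets: Sequence[Offset],
                       scores: Sequence[float], min_score: float = 0.5) -> List[Box]:
    """One optimization step: relocate proposals, then drop low-confidence ones."""
    return self_sample(self_transform(boxes, offsets), scores, min_score)


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 40.0, 40.0)]
    offsets = [(0.1, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
    scores = [0.9, 0.2]
    print(optimize_proposals(boxes, offsets, scores))
```

In a recurrent setting, the surviving boxes would be passed back to the detector, whose new offsets and scores drive the next call to `optimize_proposals`.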

Notes

https://github.com/pytorch/pytorch.

Acknowledgement

This work is supported in part by the National Natural Science Foundation of China under Grant 61720106004, and in part by the Overseas Expertise Introduction Project for Discipline Innovation (111 Projects) under Grant B17008.

Author information

Authors and Affiliations

School of Information and Communication Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Avenue, West Hi-tech Zone, Chengdu, 610054, Sichuan, China
Ming Zhang & Bing Zeng

Corresponding author

Correspondence to Ming Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, M., Zeng, B. Recurrent self-optimizing proposals for weakly supervised object detection. Neural Comput & Applic 35, 757–771 (2023). https://doi.org/10.1007/s00521-022-07818-w

Received: 22 January 2022
Accepted: 06 September 2022
Published: 21 September 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s00521-022-07818-w
