Abstract
Weakly supervised object detection (WSOD) has attracted increasing attention because it requires only image-level annotations to train a detector. A typical WSOD paradigm first generates candidate region proposals for the training images. It then treats each image as a bag of proposals and trains the detector with multiple instance learning (MIL). Most methods focus on optimizing the training process. They rarely consider the pre-generated proposals themselves, even though these proposals directly affect detector learning. The proposals are dominated by noisy candidates (e.g., negative or background proposals), and the positive proposals among them often have inaccurate locations. In this paper, we focus on improving proposal quality. We propose a recurrent self-optimizing proposal framework, a new paradigm for WSOD that iteratively optimizes the pre-generated proposals. In each iteration, all detection results (i.e., the object-aware coordinate offsets and the confidence scores) are accumulated for proposal optimization. To locate objects accurately, we design a proposal self-transformation module that transforms the locations of the pre-generated proposals based on the coordinate offsets. To reduce the impact of noisy proposals, we design a proposal self-sampling module that uses the confidence scores to mine object instances and filter out noisy proposals. The optimized proposals are then fed into a decoupled proposal learner, which contains two parallel proposal training branches. The first is a MIL module supervised by the image labels, and the second is an instance refinement module supervised by the mined object instances. The instance refinement module also contains an instance regression refinement module that generates the object-aware coordinate offsets. In turn, the decoupled proposal learner produces new detection results, which are used to optimize the proposals in the next iteration.
Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our method.
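The abstract describes the recurrent proposal optimization only at a high level. The Python sketch below shows one plausible reading of the loop, and it rests on several assumptions. The offsets follow the common R-CNN `(dx, dy, dw, dh)` box parameterization and are class-agnostic. The self-sampling rule is an illustrative choice and not necessarily the paper's rule: it keeps the top-scoring proposal for each image-level class, plus every proposal whose score for a labelled class reaches a hypothetical threshold `thresh`. The `detector` callable is a hypothetical stand-in for the decoupled proposal learner. Finally, each iteration uses a single set of scores and offsets, whereas the paper accumulates results from several branches.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]    # (x1, y1, x2, y2)
Delta = Tuple[float, float, float, float]  # (dx, dy, dw, dh), assumed R-CNN style
Instance = Tuple[int, int]                 # (proposal index, pseudo class label)
# Hypothetical detector interface: per-proposal class scores and box offsets.
Detector = Callable[[List[Box]], Tuple[List[List[float]], List[Delta]]]


def transform_proposal(box: Box, delta: Delta) -> Box:
    """Self-transformation (sketch): shift and rescale one proposal by its offsets."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = delta
    cx, cy = cx + dx * w, cy + dy * h          # move the centre relative to box size
    w, h = w * math.exp(dw), h * math.exp(dh)  # scale width/height in log space
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def self_sample(proposals: List[Box], scores: Sequence[Sequence[float]],
                image_labels: Sequence[int],
                thresh: float) -> Tuple[List[Box], List[Instance]]:
    """Self-sampling (sketch): keep confident proposals, drop the rest as noise."""
    if not proposals:
        return [], []
    keep = set()
    for c in image_labels:
        col = [s[c] for s in scores]
        # Always keep the best proposal of each labelled class as a mined instance.
        keep.add(max(range(len(col)), key=col.__getitem__))
        # Also keep any proposal that is confident enough for this class.
        keep.update(i for i, v in enumerate(col) if v >= thresh)
    kept = sorted(keep)
    new_props = [proposals[i] for i in kept]
    # Pseudo label for each kept proposal: its highest-scoring labelled class.
    instances = [(j, max(image_labels, key=lambda c: scores[i][c]))
                 for j, i in enumerate(kept)]
    return new_props, instances


def optimize_proposals(proposals: List[Box], image_labels: Sequence[int],
                       detector: Detector, num_iters: int = 2,
                       thresh: float = 0.5) -> Tuple[List[Box], List[Instance]]:
    """Recurrent loop (sketch): detect, transform, sample, then repeat."""
    instances: List[Instance] = []
    for _ in range(num_iters):
        scores, deltas = detector(proposals)  # detection results of this iteration
        proposals = [transform_proposal(b, d) for b, d in zip(proposals, deltas)]
        proposals, instances = self_sample(proposals, scores, image_labels, thresh)
    return proposals, instances
```

Keeping at least the best-scoring proposal for every labelled class means that each class in the image label still yields a mined instance to supervise the refinement branch, even when no score crosses the threshold. Proposals that score low for every labelled class are dropped before the next iteration.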
Acknowledgement
This work is supported in part by the National Natural Science Foundation of China under Grant 61720106004, and in part by the Overseas Expertise Introduction Project for Discipline Innovation (111 Projects) under Grant B17008.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, M., Zeng, B. Recurrent self-optimizing proposals for weakly supervised object detection. Neural Comput & Applic 35, 757–771 (2023). https://doi.org/10.1007/s00521-022-07818-w