Abstract
Object detection and semantic segmentation are basic tasks in computer vision, and methods that combine the two have recently made great progress. Using box-level weakly supervised semantic segmentation (WSSS), we predict segmentation from feature maps extracted by an object detector. Existing methods require both box-level and pixel-level annotations to train a shared backbone network that outputs bounding boxes and segmentation simultaneously. Without pixel-level annotations, and without changing the parameters of the network, object detectors cannot predict semantic segmentation. We design a compact, plug-and-play object detection to semantic segmentation (O2S) module that enables object detectors to predict semantic masks, making full use of the training set and intermediate feature maps of object detection. We also propose a box-level weakly supervised probabilistic gap adaptive (PGA) method, which enables O2S to learn semantic masks from the object detection training set. We evaluate the proposed approach on Pascal VOC 2007 and Pascal VOC 2012 and show its feasibility. With only 3.5 million parameters, O2S trained with PGA achieves results very close to those of whole networks trained with WSSS methods. Our work has important implications for exploring the commonality of multiple visual tasks.
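To make the idea of box-level supervision concrete, the sketch below turns bounding-box annotations into a coarse per-pixel label map, the kind of rough target a box-supervised segmenter can start from. It is not the PGA method described above. The function name `boxes_to_coarse_mask`, the box layout `(x_min, y_min, x_max, y_max, class_id)` with exclusive maxima, and the rule that a smaller box overrides a larger one where they overlap are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max, class_id),
# with x_max and y_max exclusive. Class 0 is assumed to be background.
Box = Tuple[int, int, int, int, int]


def boxes_to_coarse_mask(
    height: int, width: int, boxes: List[Box], background: int = 0
) -> List[List[int]]:
    """Fill each box with its class id to obtain a coarse label map."""
    mask = [[background] * width for _ in range(height)]

    # Paint larger boxes first so that smaller boxes overwrite them
    # in overlapping regions (an assumed heuristic, not from the paper).
    ordered = sorted(
        boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True
    )
    for x0, y0, x1, y1, cls in ordered:
        # Clip the box to the image bounds before filling.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = cls
    return mask


if __name__ == "__main__":
    demo = boxes_to_coarse_mask(4, 5, [(0, 0, 4, 4, 1), (1, 1, 3, 3, 2)])
    for row in demo:
        print(row)
```

Every pixel inside a box is labeled with the box class, so background pixels that fall inside a box are mislabeled. That noise is the gap that box-level weakly supervised methods try to close.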





Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61972417 and 61902431) and the Natural Science Foundation of Shandong Province (No. ZR2020MF005).
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, S., Liu, Y., Zhang, Y. et al. Adaptive Generation of Weakly Supervised Semantic Segmentation for Object Detection. Neural Process Lett 55, 657–670 (2023). https://doi.org/10.1007/s11063-022-10902-w
DOI: https://doi.org/10.1007/s11063-022-10902-w