Abstract
Weakly supervised semantic segmentation under image-level label supervision has improved markedly in recent years. These approaches significantly reduce human annotation effort, although they remain inferior to fully supervised methods. In this paper, we propose a novel framework that iteratively refines pixel-level annotations and optimizes the segmentation network. We first produce initial deep cues by combining activation maps with a saliency map. To produce high-quality pixel-level annotations, a graphical model is constructed over the optimal segmentation of high-quality region hierarchies to propagate information from the deep cues to unmarked regions. During training, the initial pixel-level annotations serve as supervision to train the segmentation network, which then predicts segmentation masks. To correct inaccurate labels in these masks, we feed them back into the graphical model to produce more accurate pixel-level annotations, which are then used as supervision to retrain the segmentation network. Experimental results show that the proposed method significantly outperforms weakly supervised semantic segmentation methods that use static labels. The proposed method achieves state-of-the-art performance: \(66.7\%\) mIoU on the PASCAL VOC 2012 test set and \(27.0\%\) mIoU on the MS COCO validation set.
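For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection-over-union (mIoU) score can be computed from flattened label sequences. It is not the authors' evaluation code. The function name `mean_iou`, the `ignore_label` value of 255 (a common convention for void pixels), and the choice to average only over classes that actually occur are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    truth: Sequence[int],
    num_classes: int,
    ignore_label: int = 255,  # assumed void label; adjust to the dataset in use
) -> float:
    """Mean IoU over classes that appear in the prediction or the ground truth.

    Both inputs are flattened sequences of integer class labels of equal length.
    Predictions are assumed to lie in the range [0, num_classes).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    intersection = [0] * num_classes  # pixels where pred == truth == class
    union = [0] * num_classes         # pixels where pred == class or truth == class

    for p, t in zip(pred, truth):
        if t == ignore_label:
            continue  # void pixels do not count toward any class
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1
            union[t] += 1

    # Average per-class IoU, skipping classes that never occur.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 255]
    pred = [0, 1, 1, 1, 2, 2]
    print(f"mIoU = {mean_iou(pred, truth, num_classes=3):.4f}")
```

In the toy example, the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark protocols typically accumulate the intersection and union counts over the whole dataset before averaging, which this function also does if all images are concatenated into one pair of sequences.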
Acknowledgements
This work was supported by the National Science Foundation of China (Nos. 61772435, 61976247).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Al-Huda, Z., Peng, B., Yang, Y. et al. Weakly supervised semantic segmentation by iteratively refining optimal segmentation with deep cues guidance. Neural Comput & Applic 33, 9035–9060 (2021). https://doi.org/10.1007/s00521-020-05669-x
DOI: https://doi.org/10.1007/s00521-020-05669-x