Abstract
The major obstacle in semantic segmentation is that training an effective model requires a large amount of pixel-level labeled data. To reduce the cost of annotation, weakly-supervised methods replace per-pixel labels with weaker ones, while zero-shot methods transfer knowledge learned from seen classes to unseen classes, reducing the number of classes that must be labeled. To further alleviate the annotation burden, we introduce a more challenging task, Weakly-supervised Zero-shot Semantic Segmentation (WZSS): learning models that use only image-level annotations of seen classes to segment images containing unseen objects. To this end, we propose a Dual Semantic-Guided (DSG) model, in which the semantic embeddings of classes guide the computation of both classification scores and localization maps. By ignoring the localization maps of classes with low classification scores, the proposed framework generates the predicted segmentation masks. To improve performance, we propose a simple stochastic selection over semantic embeddings during inference, which explores the difference between image-level class embeddings and pixel-level class embeddings. This simple approach raises our model's hIoU from 25.9 to 31.8. In addition, compared with some zero-shot semantic segmentation methods, our method delivers better results in terms of hIoU (31.8) and \(\text{mIoU}_{u}\) (22.0) on the PASCAL VOC 2012 dataset while using less supervision.
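The mask-generation step described in the abstract (dropping localization maps whose class has a low classification score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the two thresholds, and the background rule are hypothetical choices made only for illustration. The `harmonic_iou` helper assumes hIoU is the harmonic mean of the seen-class and unseen-class mIoU, which is the usual convention in zero-shot segmentation but is not defined in the abstract itself.

```python
import numpy as np


def segment_from_maps(cls_scores, loc_maps, score_thresh=0.5, bg_thresh=0.3):
    """Turn per-class localization maps into a label mask.

    cls_scores: (C,) image-level classification scores, assumed in [0, 1].
    loc_maps:   (C, H, W) per-class localization maps, assumed in [0, 1].
    Both thresholds are hypothetical values, not taken from the paper.
    Returns an (H, W) integer mask: 0 = background, c + 1 = class c.
    """
    cls_scores = np.asarray(cls_scores, dtype=float)
    loc_maps = np.asarray(loc_maps, dtype=float)

    # Ignore the maps of classes whose image-level score is too low.
    keep = cls_scores >= score_thresh
    gated = np.where(keep[:, None, None], loc_maps, 0.0)

    # Each pixel takes the surviving class with the strongest response.
    best_class = gated.argmax(axis=0)
    best_value = gated.max(axis=0)

    # Assumed background rule: weak responses everywhere mean background.
    return np.where(best_value >= bg_thresh, best_class + 1, 0)


def harmonic_iou(miou_seen, miou_unseen):
    """Harmonic mean of seen and unseen mIoU (assumed definition of hIoU)."""
    total = miou_seen + miou_unseen
    if total == 0:
        return 0.0
    return 2.0 * miou_seen * miou_unseen / total


if __name__ == "__main__":
    scores = [0.9, 0.1]  # class 1 is rejected at image level
    maps = np.array([
        [[0.8, 0.1], [0.2, 0.6]],
        [[0.9, 0.9], [0.9, 0.9]],
    ])
    print(segment_from_maps(scores, maps))
```

Gating at the image level before the per-pixel argmax means a class that the classifier rejects cannot claim any pixel, however strong its localization response is.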
Acknowledgements
This research is supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0140004.
Cite this article
Shen, F., Lu, ZM., Lu, Z. et al. Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation. Multimed Tools Appl 81, 5443–5458 (2022). https://doi.org/10.1007/s11042-021-11792-1