
Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation

Published in: Multimedia Tools and Applications

Abstract

The major obstacle in semantic segmentation is that training an effective model requires a large amount of pixel-level labeled data. To reduce the cost of annotation, weakly-supervised methods use weaker labels to overcome the need for per-pixel labels, while zero-shot methods transfer knowledge learned from seen classes to unseen classes to reduce the number of classes that need to be labeled. To further alleviate the annotation burden, we introduce a more challenging task, Weakly-supervised Zero-shot Semantic Segmentation (WZSS): learning models that use only image-level annotations of seen classes to segment images containing unseen objects. To this end, we propose a Dual Semantic-Guided (DSG) model that is doubly guided by semantic embeddings of classes to obtain classification scores and localization maps. By ignoring the localization maps whose classification scores are low, the proposed framework generates predicted segmentation masks. To further improve performance, we propose a simple stochastic selection over semantic embeddings during inference, which explores the difference between image-level class embeddings and pixel-level class embeddings. This simple approach raises our model's hIoU from 25.9 to 31.8. In addition, compared with several zero-shot semantic segmentation methods, our method delivers better results in terms of hIoU (31.8) and \(\text{mIoU}_{u}\) (22.0) on the PASCAL VOC 2012 dataset while using less supervision.
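
The mask-generation step mentioned above can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the function name, thresholds, and array shapes are assumptions made only to show the idea of gating per-class localization maps by image-level classification scores, discarding classes with low scores, and taking a per-pixel argmax over the surviving maps, with weak responses assigned to background.

    # Hypothetical sketch, not the authors' code: gate per-class localization
    # maps by image-level classification scores, then take a per-pixel argmax
    # to form the predicted segmentation mask.
    import numpy as np

    def masks_from_localization(loc_maps, cls_scores,
                                score_thresh=0.5, bg_thresh=0.3):
        """loc_maps: (C, H, W) per-class localization maps in [0, 1].
        cls_scores: (C,) image-level classification scores in [0, 1].
        Returns an (H, W) integer mask, 0 = background, c + 1 = class c.
        Thresholds are illustrative hyperparameters, not values from the paper."""
        gated = loc_maps.copy()
        gated[cls_scores < score_thresh] = 0.0  # ignore maps with low classification scores
        best = gated.max(axis=0)                # strongest remaining response per pixel
        mask = gated.argmax(axis=0) + 1         # class indices start at 1
        mask[best < bg_thresh] = 0              # weak responses fall back to background
        return mask

    # Toy usage with random maps for three classes on a 4x4 image.
    rng = np.random.default_rng(0)
    loc = rng.random((3, 4, 4))
    scores = np.array([0.9, 0.2, 0.7])
    print(masks_from_localization(loc, scores))

In this sketch the class with index 1 is suppressed entirely because its image-level score falls below the threshold, which mirrors the abstract's description of ignoring localization maps with low classification scores.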


Acknowledgements

This research was supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0140004.

Author information

Corresponding authors

Correspondence to Zhe-Ming Lu or Zonghui Wang.


About this article

Cite this article

Shen, F., Lu, ZM., Lu, Z. et al. Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation. Multimed Tools Appl 81, 5443–5458 (2022). https://doi.org/10.1007/s11042-021-11792-1

