Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation

Shen, Fengli; Lu, Zhe-Ming; Lu, Ziqian; Wang, Zonghui

doi:10.1007/s11042-021-11792-1

Published: 22 December 2021

Volume 81, pages 5443–5458, (2022)

Multimedia Tools and Applications

Fengli Shen¹, Zhe-Ming Lu¹, Ziqian Lu¹ & Zonghui Wang²

Abstract

The major obstacle in semantic segmentation is that training an effective model requires a large amount of pixel-level labeled data. To reduce the cost of annotation, weakly-supervised methods replace per-pixel labels with weaker ones, while zero-shot methods transfer the knowledge learned from seen classes to unseen classes to reduce the number of classes that need to be labeled. To further alleviate the burden of annotation, we introduce a more challenging task, Weakly-supervised Zero-shot Semantic Segmentation (WZSS): learning models that use only image-level annotations of seen classes to segment images containing unseen objects. To this end, we propose a Dual Semantic-Guided (DSG) model, which is doubly guided by semantic embeddings of classes to obtain classification scores and localization maps. By ignoring the localization maps with low classification scores, our proposed framework can generate predicted segmentation masks. To improve our model’s performance, we propose a simple stochastic selection on semantic embeddings during inference, which explores the difference between image-level class embeddings and pixel-level class embeddings. This simple approach increases our model’s performance in terms of hIoU from 25.9 to 31.8. In addition, compared with some zero-shot semantic segmentation methods, our method delivers better results in terms of hIoU (31.8) and \(\text{mIoU}_{u}\) (22.0) on the PASCAL VOC 2012 dataset with less supervision information.
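
The mask-generation step described in the abstract (dropping localization maps whose class has a low classification score, then labeling pixels from the remaining maps) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `predict_mask`, the thresholds `score_thresh` and `bg_thresh`, the per-pixel argmax rule, and the toy scores and maps are all assumptions made for illustration.

```python
from typing import Dict, List

Map = List[List[float]]  # H x W localization map for one class


def predict_mask(
    scores: Dict[str, float],
    loc_maps: Dict[str, Map],
    score_thresh: float = 0.5,   # assumed cut-off on image-level scores
    bg_thresh: float = 0.3,      # assumed cut-off below which a pixel is background
    background: str = "background",
) -> List[List[str]]:
    """Label each pixel with the strongest kept class, or background."""
    # Ignore every class whose image-level classification score is low.
    kept = [c for c, s in scores.items() if s >= score_thresh]
    first = next(iter(loc_maps.values()))
    height, width = len(first), len(first[0])
    mask = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_cls, best_val = background, bg_thresh
            # Per-pixel argmax over the localization maps that survived.
            for c in kept:
                v = loc_maps[c][y][x]
                if v > best_val:
                    best_cls, best_val = c, v
            mask[y][x] = best_cls
    return mask


# Toy example (hypothetical values): "dog" has a strong response at one
# pixel, but its low image-level score removes its map entirely.
scores = {"cat": 0.9, "dog": 0.1}
loc_maps = {
    "cat": [[0.8, 0.2], [0.1, 0.6]],
    "dog": [[0.1, 0.9], [0.2, 0.1]],
}
mask = predict_mask(scores, loc_maps)
```

In this toy case the pixel where only the "dog" map responds strongly falls back to background, which is the intended effect of filtering by classification score before reading the localization maps.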

Acknowledgements

This research is supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0140004.

Author information

Authors and Affiliations

School of Aeronautics and Astronautics, Zhejiang University, 310027, Hangzhou, People’s Republic of China
Fengli Shen, Zhe-Ming Lu & Ziqian Lu
College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, People’s Republic of China
Zonghui Wang

Corresponding authors

Correspondence to Zhe-Ming Lu or Zonghui Wang.

About this article

Cite this article

Shen, F., Lu, ZM., Lu, Z. et al. Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation. Multimed Tools Appl 81, 5443–5458 (2022). https://doi.org/10.1007/s11042-021-11792-1

Received: 18 February 2021
Revised: 23 June 2021
Accepted: 14 December 2021
Published: 22 December 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11042-021-11792-1
