Abstract
In recent years, few-shot segmentation has been proposed to alleviate the scarcity of pixel-wise labels: it performs segmentation on new categories using only a few annotated samples. However, the category-agnostic setting and the low-data regime make few-shot segmentation very challenging. To address this task, we propose a new symmetric network that mines semantic information from intra-image and cross-image relations in a holistic view and guides the segmentation of the paired images (i.e., the support image and the query image). We emphasize the importance of self-correlations within an image and inter-correlations across images. Taking advantage of the provided labels, a self-attention relation module is proposed to transfer more category information for non-linear relation metrics by mining intra-image semantics. A co-attention module is designed to obtain common semantic information by exploring long-range dependencies across images in the spatial and channel dimensions, thus producing more precise segmentation results for the few-shot segmentation task. Experiments on two benchmark datasets (FSS-1000 and PASCAL-5i) show that the mean Intersection-over-Union scores of our method attain state-of-the-art performance.
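To make the cross-image idea concrete, the following is a minimal NumPy sketch of spatial co-attention between a query and a support feature map: each query position attends over all support positions, and the attended support semantics are fused back into the query features. This is an illustrative simplification, not the authors' exact module; the function names and the residual fusion are hypothetical choices for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(query_feat, support_feat):
    """Spatial co-attention between paired feature maps of shape (C, H, W).

    Each query position computes similarities to every support position
    (a long-range dependency), aggregates support features with those
    weights, and fuses the result back into the query map.
    """
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, H * W)      # (C, N) flattened query positions
    s = support_feat.reshape(C, H * W)    # (C, N) flattened support positions
    affinity = q.T @ s                    # (N, N) pairwise position similarity
    attn = softmax(affinity, axis=-1)     # each query row attends over support
    attended = s @ attn.T                 # (C, N) support semantics per query position
    return query_feat + attended.reshape(C, H, W)  # residual fusion

# Channel co-attention would follow the same pattern with an affinity of
# shape (C, C) computed as q @ s.T over the flattened spatial dimension.
```

Applying the same operation symmetrically in the other direction (support attending over query) yields the paired guidance described in the abstract.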
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grants 61806071 and 62102129, the Major Research Plan of the National Natural Science Foundation of China under Grant 91746207, the National Key R&D Program of China under Grant 2018YFC08, the Natural Science Foundation of Hebei Province under Grants F2019202381, F2019202464, F2020202025 and F2021202030, the Key Research and Development Program of Xinjiang Province under Grant 2020B03001, the Open Projects Program of the National Laboratory of Pattern Recognition under Grant 201900043, the Technical Expert Project of Tianjin under Grants 19JCTPJC55800 and 19JCTPJC57000, and the Sci-tech Research Project of Higher Education of Hebei Province under Grants QN2019207 and QN2020185.
Cite this article
Liu, Y., Guo, Y., Zhu, Y. et al. Mining semantic information from intra-image and cross-image for few-shot segmentation. Multimed Tools Appl 81, 18305–18326 (2022). https://doi.org/10.1007/s11042-022-12096-8