Abstract
In recent years, few-shot segmentation has been proposed to alleviate the scarcity of pixel-wise labels: it performs segmentation on new categories using only a few annotated samples. However, the category-agnostic setting and the low-data regime make few-shot segmentation very challenging. To address this task, we propose a new symmetric network that mines semantic information from intra-image and cross-image relations in a holistic view and guides the segmentation of the paired images (i.e., the support image and the query image). We emphasize the importance of self-correlations within an image and inter-correlations across images. Taking advantage of the provided labels, a self-attention relation module is proposed to transfer more category information for non-linear relation metrics by mining intra-image semantics. A co-attention module is designed to obtain common semantic information by exploring long-range dependencies across images in the spatial and channel dimensions, thus producing more precise segmentation results for the few-shot segmentation task. Experiments on two benchmark datasets (FSS-1000 and PASCAL-5i) show that the mean Intersection-over-Union scores of our method attain state-of-the-art performance.
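To make the cross-image idea concrete, the following is a minimal NumPy sketch of spatial co-attention between a query and a support feature map: each query position attends over all support positions, and the attended support semantics are fused back into the query features. This is an illustrative simplification, not the authors' exact module; the function names and the residual fusion are hypothetical choices for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(query_feat, support_feat):
    """Spatial co-attention between paired feature maps of shape (C, H, W).

    Each query position computes similarities to every support position
    (a long-range dependency), aggregates support features with those
    weights, and fuses the result back into the query map.
    """
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, H * W)      # (C, N) flattened query positions
    s = support_feat.reshape(C, H * W)    # (C, N) flattened support positions
    affinity = q.T @ s                    # (N, N) pairwise position similarity
    attn = softmax(affinity, axis=-1)     # each query row attends over support
    attended = s @ attn.T                 # (C, N) support semantics per query position
    return query_feat + attended.reshape(C, H, W)  # residual fusion

# Channel co-attention would follow the same pattern with an affinity of
# shape (C, C) computed as q @ s.T over the flattened spatial dimension.
```

Applying the same operation symmetrically in the other direction (support attending over query) yields the paired guidance described in the abstract.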
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grants 61806071 and 62102129, the Major Research Plan of the National Natural Science Foundation of China under Grant 91746207, the National Key R&D Program of China under Grant 2018YFC08, the Natural Science Foundation of Hebei Province under Grants F2019202381, F2019202464, F2020202025 and F2021202030, the Key Research and Development Program of Xinjiang Province under Grant 2020B03001, the Open Projects Program of the National Laboratory of Pattern Recognition under Grant 201900043, the Technical Expert Project of Tianjin under Grants 19JCTPJC55800 and 19JCTPJC57000, and the Sci-tech Research Project of Higher Education of Hebei Province under Grants QN2019207 and QN2020185.
Cite this article
Liu, Y., Guo, Y., Zhu, Y. et al. Mining semantic information from intra-image and cross-image for few-shot segmentation. Multimed Tools Appl 81, 18305–18326 (2022). https://doi.org/10.1007/s11042-022-12096-8