
Mining semantic information from intra-image and cross-image for few-shot segmentation

Multimedia Tools and Applications

Abstract

Few-shot segmentation has been proposed to alleviate the scarcity of pixel-wise labels: it segments objects of new categories using only a few annotated samples. However, the category-agnostic setting and the low-data regime make the task very challenging. To address it, we propose a new symmetric network that mines semantic information from intra-image and cross-image cues in a holistic view and guides the segmentation of the paired images (i.e., the support image and the query image). We emphasize the importance of self-correlations within an image and inter-correlations across images. Taking advantage of the provided labels, a self-attention relation module is proposed to transfer more category information to a non-linear relation metric by mining intra-image semantics. A co-attention module is designed to obtain common semantic information by exploring long-range cross-image dependencies in the spatial and channel dimensions, producing more precise segmentation results. Experiments on two benchmark datasets (FSS-1000 and PASCAL-5i) show that the mean Intersection-over-Union scores of our method attain state-of-the-art performance.
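To make the cross-image idea concrete, below is a minimal PyTorch sketch of a co-attention step of the kind the abstract describes: spatial affinities are computed between every support location and every query location through a learnable bilinear weight, each image is then attended by the other, and a simple sigmoid channel gate reweights channels. The class name, the bilinear affinity, and the channel gate are illustrative assumptions rather than the authors' released implementation; the self-attention relation module can be read as the same affinity computation applied within a single image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Hypothetical cross-image co-attention sketch (not the authors' code)."""

    def __init__(self, channels: int):
        super().__init__()
        # Learnable bilinear weight relating support and query channels.
        self.weight = nn.Parameter(torch.randn(channels, channels) * 0.01)

    def forward(self, feat_s, feat_q):
        # feat_s, feat_q: (B, C, H, W) support/query feature maps of equal size.
        b, c, h, w = feat_s.shape
        s = feat_s.flatten(2)  # (B, C, HW)
        q = feat_q.flatten(2)  # (B, C, HW)

        # Spatial affinity between every support and every query location.
        affinity = torch.einsum("bci,cd,bdj->bij", s, self.weight, q)  # (B, HW, HW)

        # Symmetric spatial attention: each image is attended by the other.
        att_s = torch.bmm(q, F.softmax(affinity, dim=2).transpose(1, 2))  # (B, C, HW)
        att_q = torch.bmm(s, F.softmax(affinity, dim=1))                  # (B, C, HW)

        # Channel attention: gate channels by their mean cross-image response.
        gate_s = torch.sigmoid(att_s.mean(dim=2, keepdim=True))
        gate_q = torch.sigmoid(att_q.mean(dim=2, keepdim=True))

        return (att_s * gate_s).view(b, c, h, w), (att_q * gate_q).view(b, c, h, w)

# Usage: co-attend 256-channel support/query features at 32x32 resolution.
coatt = CoAttention(256)
out_s, out_q = coatt(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
```

In a full pipeline the attended features would be fused with the original maps and, on the support branch, combined with the provided annotation mask before decoding; those steps are omitted here for brevity.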



Acknowledgments

This research was supported by the National Natural Science Foundation of China under Grants 61806071 and 62102129, the Major Research Plan of the National Natural Science Foundation of China under Grant 91746207, the National Key R&D Program of China under Grant 2018YFC08, the Natural Science Foundation of Hebei Province under Grants F2019202381, F2019202464, F2020202025 and F2021202030, the Key Research and Development Program of Xinjiang Province under Grant 2020B03001, the Open Projects Program of the National Laboratory of Pattern Recognition under Grant 201900043, the Technical Expert Project of Tianjin under Grants 19JCTPJC55800 and 19JCTPJC57000, and the Sci-tech Research Project of Higher Education of Hebei Province under Grants QN2019207 and QN2020185.

Author information

Corresponding author

Correspondence to Ming Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, Y., Guo, Y., Zhu, Y. et al. Mining semantic information from intra-image and cross-image for few-shot segmentation. Multimed Tools Appl 81, 18305–18326 (2022). https://doi.org/10.1007/s11042-022-12096-8

