Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

Liu, Yifu; Xu, Chenfeng; Chen, Zhihong; Chen, Chao; Zhao, Han; Jin, Xinyu

doi:10.1007/s11063-019-10148-z

Published: 24 January 2020

Volume 51, pages 2281–2299, (2020)

Neural Processing Letters

Abstract

Fusing multi-scale features is an effective way to reach state-of-the-art performance in semantic segmentation. In this work, we concentrate on two difficult problems, intra-class inconsistency and blurred localization of object boundaries, and tackle them by combining two separate multi-scale context features. Specifically, we propose a dual-stream structure with a scale context selection attention module to enhance multi-scale processing: one stream collects global-scale context and the other captures local-scale information. The scale context selection attention module embedded in each stream adaptively focuses on different scale context information to obtain optimal scale features. With this dual-stream structure and its attention modules, our network makes efficient use of multi-scale context to generate more comprehensive and powerful features. Experiments show that our dual-stream network with the scale context selection attention module achieves promising performance on the PASCAL VOC 2012 and PASCAL-Person-Part datasets.
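
The idea of adaptively weighting features from several scales within each stream can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not specify; it assumes a simple softmax attention over per-scale feature vectors at a single pixel, and the names `select_scale_context`, `dual_stream_fuse`, and the scoring vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_scale_context(
    features: Sequence[Sequence[float]],
    score_weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse per-scale feature vectors with softmax attention.

    features: one feature vector per scale, all of equal length.
    score_weights: hypothetical learned vector that scores each scale
    (an assumption made for illustration only).
    Returns the fused vector and the attention weight of each scale.
    """
    # One scalar score per scale: dot product with the scoring vector.
    scores = [sum(w * f for w, f in zip(score_weights, feat)) for feat in features]
    attn = softmax(scores)
    dim = len(features[0])
    # Attention-weighted sum across scales, channel by channel.
    fused = [sum(a * feat[i] for a, feat in zip(attn, features)) for i in range(dim)]
    return fused, attn


def dual_stream_fuse(
    global_feats: Sequence[Sequence[float]],
    local_feats: Sequence[Sequence[float]],
    w_global: Sequence[float],
    w_local: Sequence[float],
) -> List[float]:
    """Run scale selection in each stream, then concatenate the results.

    Concatenation is an assumed way of combining the two streams.
    """
    g, _ = select_scale_context(global_feats, w_global)
    l, _ = select_scale_context(local_feats, w_local)
    return g + l
```

With an all-zero scoring vector every scale receives the same weight, so the fused feature reduces to the plain average across scales; a learned scoring vector would shift weight toward the scales most useful at each location.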

Notes

https://github.com/YifuLiuL/DSCANet.

Acknowledgements

This work was supported by the Opening Foundation of the State Key Laboratory (No. 2014KF06), and the National Science and Technology Major Project (No. 2013ZX03005013).

Author information

Authors and Affiliations

Institution of Information Science and Electrical Engineering, Zhejiang University, Hangzhou, 310037, Zhejiang, China
Yifu Liu, Chenfeng Xu, Zhihong Chen, Chao Chen, Han Zhao & Xinyu Jin

Corresponding author

Correspondence to Xinyu Jin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Y., Xu, C., Chen, Z. et al. Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation. Neural Process Lett 51, 2281–2299 (2020). https://doi.org/10.1007/s11063-019-10148-z

Published: 24 January 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11063-019-10148-z
