A lightweight network with attention decoder for real-time semantic segmentation

Wang, Kang; Yang, Jinfu; Yuan, Shuai; Li, Mingai

doi:10.1007/s00371-021-02115-4

Original article
Published: 07 May 2021

The Visual Computer, volume 38, pages 2329–2339 (2022)

Abstract

Semantic segmentation is an important task in scene understanding, but achieving high accuracy typically requires a large amount of computation. With the rise of autonomous systems in recent years, balancing accuracy against speed has become crucial. In this paper, we propose a novel asymmetric encoder–decoder network to address this problem. In the encoder, we design a Separable Asymmetric Module, which combines depth-wise separable asymmetric convolution with dilated convolution to greatly reduce computation cost while maintaining accuracy. In the decoder, an attention mechanism is used to further improve segmentation performance. Experimental results on the CityScapes and CamVid datasets show that the proposed method achieves a better balance between segmentation precision and speed than state-of-the-art semantic segmentation methods. Specifically, our model obtains a mean IoU of 72.5% and 66.3% on the CityScapes and CamVid test sets, respectively, with fewer than 1M parameters.
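To give a rough sense of why depth-wise separable asymmetric convolution reduces cost, the sketch below counts weights for a dense 3 × 3 convolution and for one plausible factorization: a depth-wise 3 × 1 convolution, a depth-wise 1 × 3 convolution, and a point-wise 1 × 1 convolution. This layout, the function names, and the channel width of 128 are assumptions made for illustration; the abstract does not specify the exact structure of the Separable Asymmetric Module. The last helper uses the standard formula for the effective kernel size of a dilated convolution, which shows that dilation enlarges the receptive field without adding weights.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution (bias terms omitted)."""
    return c_in * c_out * k * k


def separable_asymmetric_params(c_in, c_out, k=3):
    """Weights in an ASSUMED factorization (not confirmed by the paper):
    depth-wise k x 1, depth-wise 1 x k, then point-wise 1 x 1."""
    depthwise_kx1 = c_in * k      # one k x 1 filter per input channel
    depthwise_1xk = c_in * k      # one 1 x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise_kx1 + depthwise_1xk + pointwise


def effective_kernel_size(k, dilation):
    """Span covered along one axis by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    channels = 128  # hypothetical layer width, chosen only for illustration
    dense = standard_conv_params(channels, channels)
    factored = separable_asymmetric_params(channels, channels)
    print("dense 3x3 weights:   ", dense)
    print("factorized weights:  ", factored)
    print("reduction factor:    ", round(dense / factored, 2))
    # Dilation widens the receptive field at no extra parameter cost.
    print("effective size (k=3, dilation=4):", effective_kernel_size(3, 4))
```

Under these assumptions the factorized layer needs roughly one ninth of the weights of the dense layer, since the point-wise term dominates; the actual savings in the proposed network depend on details given in the full paper.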

Acknowledgements

This work is partly supported by the National Natural Science Foundation of China under Grant no. 61973009 and the Beijing Natural Science Foundation under Grant no. 4182009.

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
Kang Wang, Jinfu Yang, Shuai Yuan & Mingai Li

Corresponding author

Correspondence to Kang Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, K., Yang, J., Yuan, S. et al. A lightweight network with attention decoder for real-time semantic segmentation. Vis Comput 38, 2329–2339 (2022). https://doi.org/10.1007/s00371-021-02115-4

Accepted: 22 March 2021
Published: 07 May 2021
Issue Date: July 2022
DOI: https://doi.org/10.1007/s00371-021-02115-4
