Lightweight and efficient asymmetric network design for real-time semantic segmentation

Applied Intelligence

Abstract

With the growing demand from applications such as autonomous driving and drone aerial photography, achieving the best trade-off between segmentation accuracy and inference speed while reducing the number of parameters has become a challenging problem. In this paper, a lightweight and efficient asymmetric network (LEANet) for real-time semantic segmentation is proposed to address this problem. Specifically, LEANet adopts an asymmetric encoder-decoder architecture. In the encoder, a depth-wise asymmetric bottleneck module with separation and shuffling operations (SS-DAB module) is proposed to jointly extract local and context information. In the decoder, a pyramid pooling module based on channel-wise attention (CA-PP module) is proposed to aggregate multi-scale context information and guide feature selection. Without any pre-training or post-processing, LEANet achieves 71.9% and 67.5% mean Intersection over Union (mIoU) at 77.3 and 98.6 frames per second (FPS) on the Cityscapes and CamVid test sets, respectively. These experimental results show that LEANet achieves an optimal trade-off between segmentation accuracy and inference speed with only 0.74 million parameters.
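For readers unfamiliar with the accuracy metric reported above, the following is a minimal sketch of how mean Intersection over Union is commonly computed. It is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the `ignore_index=255` default (a convention often used for unlabeled pixels in Cityscapes-style annotations) are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over two flat sequences of integer class labels.

    Illustrative sketch only; ignore_index=255 is an assumed convention
    for unlabeled pixels, not something stated in the abstract.
    """
    intersection = [0] * num_classes   # pixels where prediction and label agree
    pred_count = [0] * num_classes     # pixels predicted as each class
    target_count = [0] * num_classes   # pixels labeled as each class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue                   # unlabeled pixels do not count
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:                  # skip classes absent from both inputs
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark evaluations typically accumulate the per-class counts over the whole test set before dividing, rather than averaging per-image scores.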



Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant 2018YFB1702300. The authors thank their teachers and classmates for their guidance and help with this paper.

Author information

Corresponding author

Correspondence to Xiu-Ling Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, XL., Du, BC., Luo, ZC. et al. Lightweight and efficient asymmetric network design for real-time semantic segmentation. Appl Intell 52, 564–579 (2022). https://doi.org/10.1007/s10489-021-02437-9

  • DOI: https://doi.org/10.1007/s10489-021-02437-9
