Skip to main content
Log in

ELANet: Effective Lightweight Attention-Guided Network for Real-Time Semantic Segmentation

  • Published:
Neural Processing Letters Aims and scope Submit manuscript

Abstract

Deep neural networks have greatly facilitated the applications of semantic segmentation. However, most of the existing neural networks bring massive calculations with lots of model parameters for achieving a higher precision, which is unaffordable for resource-constrained edge devices. To achieve an appropriate trade-off between computing efficiency and segmentation accuracy, we proposed an effective lightweight attention-guided network (ELANet) for real-time semantic segmentation based on an asymmetrical encoder–decoder framework in this work. In the encoding phase, we combined atrous convolution and depth-wise convolution to design two types of effective context guidance blocks to learn contextual semantic information. A refined feature fusion module with a dual attention-guided fusion (DAF) unit was developed in the decoder to exploit different levels of features. Without any pretraining, we estimated the performance of multi-attention ELANet with extensive experiments on the Cityscapes dataset with an input resolution of 512\(\times \)1024, resulting in 75.4% mIoU and 83 FPS inference speed with only 0.76 M parameters and 10.34 GFLOPs on a single 3090 GPU. The code is publicly available at https://github.com/DGS666/ELANet.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670

    Article  MathSciNet  MATH  Google Scholar 

  2. Hong C, Yu J, Zhang J, Jin X, Lee K-H (2018) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inf 15(7):3952–3961

    Article  Google Scholar 

  3. Yu J, Tan M, Zhang H, Rui Y, Tao D (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578

    Article  Google Scholar 

  4. Yu J, Yao J, Zhang J, Yu Z, Tao D (2020) SPRNet: single-pixel reconstruction for one-stage instance segmentation. IEEE Trans Cybern 51(4):1731–1742

    Article  Google Scholar 

  5. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  6. Siam M, Gamal M, Abdel-Razek M, Yogamani S, Jagersand M, Zhang H (2018) A comparative study of real-time semantic segmentation for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 587–597

  7. Siam M, Elkerdawy S, Jagersand M, Yogamani S (2017) Deep semantic segmentation for automated driving: Taxonomy, roadmap and challenges. In: 2017 IEEE 20th international conference on intelligent transportation systems (ITSC). IEEE, pp 1–8

  8. Bovcon B, Perš J, Kristan M et al (2018) Stereo obstacle detection for unmanned surface vehicles by imu-assisted semantic segmentation. Robot Auton Syst 104:1–13

    Article  Google Scholar 

  9. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017)MobileNets: efficient convolutional neural networks for mobile vision applications. Preprint at arXiv:1704.04861

  10. Zhang X, Zhou X, Lin M, Sun J (2018) ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848–6856

  11. Romera E, Alvarez JM, Bergasa LM, Arroyo R (2017) ERFNet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans Intell Transp Syst 19(1):263–272

    Article  Google Scholar 

  12. Li G, Yun I, Kim J, Kim J (2019) DABNet: depth-wise asymmetric bottleneck for real-time semantic segmentation. Preprint at arXiv:1907.11357

  13. Zhang X, Du B, Wu Z, Wan T (2022) LAANet: lightweight attention-guided asymmetric network for real-time semantic segmentation. Neural Comput Appl:1–15

  14. Li Y, Li X, Xiao C, Li H, Zhang W (2021) EACNet: enhanced asymmetric convolution for real-time semantic segmentation. IEEE Signal Process Lett 28:234–238

    Article  Google Scholar 

  15. Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: 4th international conference on learning representations

  16. Li Y, Li M, Li Z, Xiao C, Li H (2022) EFRNet: efficient feature reuse network for real-time semantic segmentation. Neural Process Lett:1–13

  17. Elhassan MA, Huang C, Yang C, Munea TL (2021) DSANet: dilated spatial attention for real-time semantic segmentation in urban street scenes. Expert Syst Appl 183:115090

    Article  Google Scholar 

  18. Lin G, Milan A, Shen C, Reid I (2017) RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934

  19. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241

  20. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 7132–7141

  21. Wang Q, Wu B, Zhu PF, Li P, Zuo W, Hu Q (2020) ECA-Net: efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11531–11539

  22. Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19

  23. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223

  24. Brostow GJ, Fauqueur J, Cipolla R (2009) Semantic object classes in video: a high-definition ground truth database. Pattern Recogn Lett 30(2):88–97

    Article  Google Scholar 

  25. Caesar H, Uijlings J, Ferrari V (2018) Coco-stuff: thing and stuff classes in context. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1209–1218

  26. Peng G, Yang S, Wang H (2021) Refine for semantic segmentation based on parallel convolutional network with attention model. Neural Process Lett 53(6):4177–4188

    Article  Google Scholar 

  27. Paszke A, Chaurasia A, Kim S, Culurciello E (2016) ENet: a deep neural network architecture for real-time semantic segmentation. Preprint arXiv:1606.02147

  28. Li G, Li L, Zhang J (2021) BiAttnNet: bilateral attention for improving real-time semantic segmentation. IEEE Signal Process Lett 29:46–50

    Article  Google Scholar 

  29. Wang Y, Zhou Q, Liu J, Xiong J, Gao G, Wu X, Latecki LJ (2019) LEDNet: a lightweight encoder-decoder network for real-time semantic segmentation. In: 2019 IEEE international conference on image processing (ICIP). IEEE, pp 1860–1864

  30. Zhang J, Cao Y, Wu Q (2021) Vector of locally and adaptively aggregated descriptors for image feature representation. Pattern Recogn 116:107952

    Article  Google Scholar 

  31. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. Preprint at arXiv:2010.11929

  32. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022

  33. Yang X, Li S, Chen Z, Chanussot J, Jia X, Zhang B, Li B, Chen P (2021) An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery. ISPRS J Photogramm Remote Sens 177:238–262

    Article  Google Scholar 

  34. Wu T, Tang S, Zhang R, Cao J, Zhang Y (2020) CGNet: a light-weight context guided network for semantic segmentation. IEEE Trans Image Process 30:1169–1179

    Article  Google Scholar 

  35. Hao X, Hao X, Zhang Y, Li Y, Wu C (2021) Real-time semantic segmentation with weighted factorized-depthwise convolution. Image Vis Comput 114:104269

    Article  Google Scholar 

  36. Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In: International conference on machine learning. PMLR, pp 1139–1147

  37. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. Preprint at arXiv:1412.6980

  38. Mehta S, Rastegari M, Caspi A, Shapiro L, Hajishirzi H (2018) ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 552–568

  39. Zhuang M, Zhong X, Gu D, Feng L, Zhong X, Hu H (2021) LRDNet: a lightweight and efficient network with refined dual attention decorder for real-time semantic segmentation. Neurocomputing 459:349–360

    Article  Google Scholar 

  40. Zhou Q, Wang Y, Fan Y, Wu X, Zhang S, Kang B, Latecki LJ (2020) AGLNet: towards real-time semantic segmentation of self-driving images via attention-guided lightweight network. Appl Soft Comput 96:106682

    Article  Google Scholar 

  41. Lu M, Chen Z, Wu QJ, Wang N, Rong X, Yan X (2020) FRNet: factorized and regular blocks network for semantic segmentation in road scene. IEEE Trans Intell Transp Syst

  42. Liu J, Zhou Q, Qiang Y, Kang B, Wu X, Zheng B (2020) FDDWNet: a lightweight convolutional neural network for real-time semantic segmentation. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2373–2377

  43. Jiang W, Xie Z, Li Y, Liu C, Lu H (2020) LRNNET: a light-weighted network with efficient reduced non-local operation for real-time semantic segmentation. In: 2020 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 1–6

  44. Yu C, Xiao B, Gao C, Yuan L, Zhang L, Sang N, Wang J (2021) Lite-HRNet: a lightweight high-resolution network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10440–10450

  45. Liu J, Xu X, Shi Y, Deng C, Shi M (2022) RELAXNet: residual efficient learning and attention expected fusion network for real-time semantic segmentation. Neurocomputing 474:115–127

    Article  Google Scholar 

  46. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 325–341

  47. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495

    Article  Google Scholar 

  48. Li H, Xiong P, Fan H, Sun J (2019) DFAnet: deep feature aggregation for real-time semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9522–9531

  49. Zhang X-L, Du B-C, Luo Z-C, Ma K (2022) Lightweight and efficient asymmetric network design for real-time semantic segmentation. Appl Intell 52(1):564–579

    Article  Google Scholar 

  50. Zhao H, Qi X, Shen X, Shi J, Jia J (2018) ICNet for real-time semantic segmentation on high-resolution images. In: Proceedings of the European conference on computer vision (ECCV), pp 405–420

  51. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by Guangzhou Leading Talents in Innovation and Entrepreneurship under Grant No. 2019019, the National Natural Science Foundation of China under Grant No. 62002134, Guangdong Basic and Applied Basic Research Foundation under Grant No. 2020A1515110645, the Key Laboratory of New Semiconductors and Devices of Guangdong Higher Education Institutes under Grant No. 2021KSY001, the Fundamental Research Funds for the Central Universities at Jinan University under Grant No. 21620353, and the JNU-Techtotop Joint Foundation of Postgraduates Training Base under Grant No. 82621176.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Aiwen Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yi, Q., Dai, G., Shi, M. et al. ELANet: Effective Lightweight Attention-Guided Network for Real-Time Semantic Segmentation. Neural Process Lett 55, 6425–6442 (2023). https://doi.org/10.1007/s11063-023-11145-z

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11063-023-11145-z

Keywords

Navigation