A lightweight network with attention decoder for real-time semantic segmentation

  • Original article
The Visual Computer

Abstract

Semantic segmentation is a core task in scene understanding, but achieving high accuracy typically requires a large amount of computation. With the rise of autonomous systems in recent years, balancing accuracy against inference speed has become crucial. In this paper, we propose a novel asymmetric encoder–decoder network to address this problem. In the encoder, we design a Separable Asymmetric Module, which combines depth-wise separable asymmetric convolution with dilated convolution to greatly reduce computation cost while maintaining accuracy. In the decoder, an attention mechanism is used to further improve segmentation performance. Experimental results on the Cityscapes and CamVid datasets show that the proposed method achieves a better balance between segmentation accuracy and speed than state-of-the-art semantic segmentation methods. Specifically, our model obtains a mean IoU of 72.5% and 66.3% on the Cityscapes and CamVid test sets, respectively, with fewer than 1M parameters.
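
To make the cost argument concrete, the following is a minimal sketch that counts weights for a dense 3 x 3 convolution and for one possible depth-wise separable asymmetric factorization. The exact layer layout of the Separable Asymmetric Module is not given in the abstract, so the factorization used here (a depth-wise 3 x 1 convolution, a depth-wise 1 x 3 convolution, and a 1 x 1 point-wise convolution) and all function names are assumptions for illustration only. The last helper shows how dilation widens the span of a kernel without adding weights.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_asymmetric_params(channels, k=3):
    """Weights in an assumed factorization: depth-wise k x 1 conv,
    depth-wise 1 x k conv, then a 1 x 1 point-wise conv (bias omitted)."""
    depthwise_kx1 = channels * k         # one k x 1 filter per channel
    depthwise_1xk = channels * k         # one 1 x k filter per channel
    pointwise = channels * channels      # 1 x 1 conv that mixes channels
    return depthwise_kx1 + depthwise_1xk + pointwise


def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


channels = 128  # hypothetical channel width, not taken from the paper
dense = standard_conv_params(channels, channels)
factored = separable_asymmetric_params(channels)
print(dense, factored, round(dense / factored, 1))  # 147456 17152 8.6
print([effective_kernel_size(3, d) for d in (1, 2, 4, 8)])  # [3, 5, 9, 17]
```

Under these assumptions, the factorized layer uses roughly one ninth of the weights of the dense layer at the same channel width, while dilation enlarges the receptive field at no extra parameter cost.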

Acknowledgements

This work is partly supported by the National Natural Science Foundation of China under Grant no. 61973009 and the Beijing Natural Science Foundation under Grant no. 4182009.

Author information

Corresponding author

Correspondence to Kang Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, K., Yang, J., Yuan, S. et al. A lightweight network with attention decoder for real-time semantic segmentation. Vis Comput 38, 2329–2339 (2022). https://doi.org/10.1007/s00371-021-02115-4

  • DOI: https://doi.org/10.1007/s00371-021-02115-4
