Abstract
Semantic segmentation is an important task in scene understanding, but achieving high accuracy typically requires a large amount of computation. With the rise of autonomous systems in recent years, balancing accuracy against speed has become crucial. In this paper, we propose a novel asymmetric encoder–decoder network to address this problem. In the encoder, we design a Separable Asymmetric Module, which combines depth-wise separable asymmetric convolution with dilated convolution to greatly reduce computation cost while maintaining accuracy. In the decoder, an attention mechanism further improves segmentation performance. Experimental results on the Cityscapes and CamVid datasets show that the proposed method achieves a better balance between segmentation accuracy and speed than state-of-the-art semantic segmentation methods. Specifically, our model obtains a mean IoU of 72.5% and 66.3% on the Cityscapes and CamVid test sets, respectively, with fewer than 1M parameters.
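To give a rough sense of why depth-wise separable asymmetric convolution lowers cost, the following sketch counts weights for a dense 3 x 3 convolution and for one possible factorization into a depth-wise 3 x 1 convolution, a depth-wise 1 x 3 convolution, and a 1 x 1 point-wise convolution. The layer ordering, the channel width of 128, the dilation rates, and all function names are illustrative assumptions; the abstract does not specify the internal layout of the Separable Asymmetric Module. Dilation is included only to show that it widens the receptive field without adding weights.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_asymmetric_params(c_in, c_out, k):
    """Weights in an assumed factorization: depth-wise k x 1,
    depth-wise 1 x k, then a 1 x 1 point-wise conv (bias omitted)."""
    depthwise_vertical = c_in * k    # one k x 1 filter per input channel
    depthwise_horizontal = c_in * k  # one 1 x k filter per input channel
    pointwise = c_in * c_out         # 1 x 1 conv mixes channels
    return depthwise_vertical + depthwise_horizontal + pointwise


def receptive_field(k, dilation):
    """Extent along one axis covered by a k-tap kernel with this dilation."""
    return dilation * (k - 1) + 1


# Hypothetical layer width and kernel size, chosen only for illustration.
channels, kernel = 128, 3
dense = standard_conv_params(channels, channels, kernel)
light = separable_asymmetric_params(channels, channels, kernel)
print(dense, light, round(dense / light, 1))  # 147456 17152 8.6

# Dilation enlarges the receptive field with the same number of weights.
print([receptive_field(kernel, d) for d in (1, 2, 4, 8)])  # [3, 5, 9, 17]
```

Under these assumptions the factorized layer uses roughly 8.6 times fewer weights than the dense one, which is consistent in spirit with the sub-1M parameter budget reported above, though the actual network may differ in structure.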
Acknowledgements
This work is partly supported by the National Natural Science Foundation of China under Grant no. 61973009 and the Beijing Natural Science Foundation under Grant no. 4182009.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, K., Yang, J., Yuan, S. et al. A lightweight network with attention decoder for real-time semantic segmentation. Vis Comput 38, 2329–2339 (2022). https://doi.org/10.1007/s00371-021-02115-4
DOI: https://doi.org/10.1007/s00371-021-02115-4