Abstract
Semantic segmentation of street scenes is important for vision-based autonomous driving. Recently, high-accuracy networks based on deep learning have been widely applied to semantic segmentation, but their inference speeds are slow. To achieve faster speeds, most popular real-time network architectures adopt stepwise downsampling in the backbone to obtain features at different scales. However, they ignore the misalignment between feature maps from different levels, and their simple feature aggregation by element-wise addition or channel-wise concatenation may drown useful information in a large amount of useless information. To address these problems, we propose a gated feature aggregation and alignment network (GFAANet) for real-time semantic segmentation of street scenes. In GFAANet, a feature alignment aggregation module is developed to effectively align and aggregate feature maps from different levels. We also present a gated feature aggregation module that uses gates to selectively aggregate and refine effective information from the multi-stage features of the backbone network. Furthermore, a depthwise separable pyramid pooling module operating on low-resolution feature maps is designed as a context extractor to enlarge the effective receptive field and fuse multi-scale context. Experimental results on two challenging street scene benchmark datasets show that GFAANet achieves the highest accuracy among state-of-the-art real-time semantic segmentation methods. We conclude that GFAANet can segment street scene images quickly and effectively, which may provide technical support for autonomous driving.
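The paper itself defines the GFAANet modules; purely as an illustration of the gating idea described above (not the authors' exact module), a per-pixel sigmoid gate computed from two aligned feature maps can blend a detail-rich low-level branch with a semantic high-level branch, so that each spatial position selects how much of each branch to keep. The `w_gate` weight here stands in for a learned 1x1 convolution and is a hypothetical placeholder:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_low, f_high, w_gate):
    """Blend two aligned feature maps of shape (C, H, W) with a per-pixel gate.

    The gate g is computed from the channel-wise concatenation of both inputs
    via a 1x1-convolution-like channel mix; g near 1 keeps the low-level
    (detail) branch, g near 0 keeps the high-level (semantic) branch.
    """
    c, h, w = f_low.shape
    concat = np.concatenate([f_low, f_high], axis=0)            # (2C, H, W)
    # 1x1 "convolution" as a single channel mix: (1, 2C) @ (2C, H*W)
    logits = (w_gate @ concat.reshape(2 * c, -1)).reshape(1, h, w)
    g = sigmoid(logits)                                         # gate in (0, 1)
    return g * f_low + (1.0 - g) * f_high                       # convex blend

rng = np.random.default_rng(0)
f_low = rng.standard_normal((4, 8, 8))     # toy low-level features
f_high = rng.standard_normal((4, 8, 8))    # toy high-level features
w_gate = rng.standard_normal((1, 8)) * 0.1 # stand-in for learned gate weights
fused = gated_fusion(f_low, f_high, w_gate)
```

Because the gate lies strictly in (0, 1), the fused output is a per-position convex combination of the two branches, which is what lets the network suppress the "useless" component at each pixel rather than summing both branches wholesale.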








Data availability
Cityscapes [36] and CamVid [37] datasets are used during the current study, where Cityscapes dataset is available at https://www.cityscapes-dataset.com/, and CamVid dataset is available at http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/.
References
Zhang, X., Cao, X., Wang, J., et al.: G-unext: a lightweight mlp-based network for reducing semantic gap in medical image segmentation. Multimed. Syst. 29(6), 3431–3446 (2023)
Kampffmeyer, M., Salberg, A.B., Jenssen, R.: Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In: IEEE conference on computer vision and pattern recognition workshops, pp 680–688 (2016)
Xu, H., Gao, Y., Yu, F., et al.: End-to-end learning of driving models from large-scale video datasets. In: IEEE conference on computer vision and pattern recognition, pp 2174–2182 (2017)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE conference on computer vision and pattern recognition, pp 3431–3440 (2015)
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: International Conference on Learning Representations, pp 2–4 (2016)
Chen, L.C., Papandreou, G., Kokkinos, I., et al.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Zhao, H., Shi, J., Qi, X., et al.: Pyramid scene parsing network. In: IEEE conference on computer vision and pattern recognition, pp 2881–2890 (2017)
Chen, L.C., Zhu, Y., Papandreou, G., et al.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: European conference on computer vision, pp 801–818 (2018)
Pang, Y., Li, Y., Shen, J., et al.: Towards bridging semantic gap to improve semantic segmentation. In: IEEE/CVF International Conference on Computer Vision, pp 4230–4239 (2019)
Huang, Z., Wang, X., Huang, L., et al.: Ccnet: Criss-cross attention for semantic segmentation. In: IEEE/CVF international conference on computer vision, pp 603–612 (2019)
Fu, J., Liu, J., Tian, H., et al.: Dual attention network for scene segmentation. In: IEEE/CVF conference on computer vision and pattern recognition, pp 3146–3154 (2019)
Chen, W., Zhu, X., Sun, R., et al.: Tensor low-rank reconstruction for semantic segmentation. In: European conference on computer vision, pp 52–69 (2020)
Liu, Y., Chen, Y., Lasang, P., et al.: Covariance attention for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 44(4), 1805–1818 (2020)
Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. In: European conference on computer vision, pp 173–190 (2020)
Ji, J., Shi, R., Li, S., et al.: Encoder-decoder with cascaded crfs for semantic segmentation. IEEE Trans. Circ. Syst. Video Technol. 31(5), 1926–1938 (2020)
Hou, Q., Zhang, L., Cheng, M.M., et al.: Strip pooling: rethinking spatial pooling for scene parsing. In: IEEE/CVF conference on computer vision and pattern recognition, pp 4003–4012 (2020)
Li, L., Zhou, T., Wang, W., et al.: Deep hierarchical semantic segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1246–1257 (2022)
Zhang, Y., Pang, B., Lu, C.: Semantic segmentation by early region proxy. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 1258–1268 (2022)
Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)
Romera, E., Alvarez, J.M., Bergasa, L.M., et al.: Erfnet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 19(1), 263–272 (2017)
Mehta, S., Rastegari, M., Caspi, A., et al.: Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In: European conference on computer vision, pp 552–568 (2018)
Zhao, H., Qi, X., Shen, X., et al.: Icnet for real-time semantic segmentation on high-resolution images. In: European conference on computer vision, pp 405–420 (2018)
Yu, C., Wang, J., Peng, C., et al.: Bisenet: Bilateral segmentation network for real-time semantic segmentation. In: European conference on computer vision, pp 325–341 (2018)
Lo, S.Y., Hang, H.M., Chan, S.W., et al.: Efficient dense modules of asymmetric convolution for real-time semantic segmentation. In: ACM multimedia Asia, pp 1–6 (2019)
Wang, Y., Zhou, Q., Liu, J., et al.: Lednet: a lightweight encoder-decoder network for real-time semantic segmentation. In: IEEE international conference on image processing, pp 1860–1864 (2019)
Orsic, M., Kreso, I., Bevandic, P., et al.: In defense of pre-trained imagenet architectures for real-time semantic segmentation of road-driving images. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12607–12616 (2019)
Li, H., Xiong, P., Fan, H., et al.: Dfanet: Deep feature aggregation for real-time semantic segmentation. In: IEEE/CVF conference on computer vision and pattern recognition, pp 9522–9531 (2019)
Jiang, W., Xie, Z., Li, Y., et al.: Lrnnet: a light-weighted network with efficient reduced non-local operation for real-time semantic segmentation. In: IEEE international conference on multimedia and expo workshops, pp 1–6 (2020)
Yu, C., Gao, C., Wang, J., et al.: Bisenet v2: bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 129, 3051–3068 (2021)
Nirkin, Y., Wolf, L., Hassner, T.: Hyperseg: Patch-wise hypernetwork for real-time semantic segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4061–4070 (2021)
Wu, Y., Jiang, J., Huang, Z., et al.: Fpanet: feature pyramid aggregation network for real-time semantic segmentation. Appl. Intell. 52(3), 3319–3336 (2022)
Lu, M., Chen, Z., Liu, C., et al.: Mfnet: multi-feature fusion network for real-time semantic segmentation in road scenes. IEEE Trans. Intell. Transp. Syst. 23(11), 20991–21003 (2022)
Mazzini, D.: Guided upsampling network for real-time semantic segmentation. In: British Machine Vision Conference, pp 117–125 (2018)
Li, X., You, A., Zhu, Z., et al.: Semantic flow for fast and accurate scene parsing. In: European conference on computer vision, pp 775–793 (2020)
Huang, Z., Wei, Y., Wang, X., et al.: Alignseg: feature-aligned segmentation networks. IEEE Trans. Pattern Anal. Mach. Intell. 44(1), 550–557 (2021)
Cordts, M., Omran, M., Ramos, S., et al.: The cityscapes dataset for semantic urban scene understanding. In: IEEE conference on computer vision and pattern recognition, pp 3213–3223 (2016)
Brostow, G.J., Fauqueur, J., Cipolla, R.: Semantic object classes in video: a high-definition ground truth database. Pattern Recog. Lett. 30(2), 88–97 (2009)
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition, pp 770–778 (2016)
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: IEEE conference on computer vision and pattern recognition, pp 1251–1258 (2017)
Sandler, M., Howard, A., Zhu, M., et al.: Mobilenetv2: inverted residuals and linear bottlenecks. In: IEEE conference on computer vision and pattern recognition, pp 4510–4520 (2018)
Funding
This work was fully supported by National Natural Science Foundation of China under Project no. 61903195.
Author information
Authors and Affiliations
Contributions
Qian Liu: conceptualization, methodology, supervision, writing—original draft, writing—review and editing. Zhensheng Li: Formal analysis, software, investigation, validation, visualization, writing—original draft, writing—review and editing. Youwei Qi: Resources. Cunbao Wang: Data Curation.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Research involving human participants and/or animals
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by Pietro Pala.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, Q., Li, Z., Qi, Y. et al. Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes. Multimedia Systems 30, 213 (2024). https://doi.org/10.1007/s00530-024-01429-2