FPANet: Feature pyramid aggregation network for real-time semantic segmentation

Wu, Yun; Jiang, Jianyong; Huang, Zimeng; Tian, Youliang

doi:10.1007/s10489-021-02603-z

Published: 05 July 2021

Volume 52, pages 3319–3336, (2022)

Applied Intelligence

Yun Wu¹, Jianyong Jiang¹ (ORCID: orcid.org/0000-0002-2058-415X), Zimeng Huang¹ & Youliang Tian¹

Abstract

Semantic segmentation is used in many fields, most of which require not only high-quality predictions but also real-time inference speed. We therefore propose the feature pyramid aggregation network (FPANet) for high-quality real-time semantic segmentation. The network can be regarded as an encoder-decoder model. In the encoder, we use ResNet and atrous spatial pyramid pooling (ASPP) to extract high-level semantic information. In the decoder, to capture both the semantic and the spatial information of the image, we propose SeBiFPN, a bilateral directional feature pyramid network for semantic segmentation that fuses features from different levels. Within SeBiFPN, we design a lightweight feature pyramid fusion module (FPFM) that fuses features from two different levels. In addition, most real-time semantic segmentation models perform poorly when predicting the border regions of an image; we therefore propose a border refinement module (BRM) to mitigate inaccurate border segmentation. To reduce the computational complexity of the model, we redesign the ASPP module and reduce the number of feature channels during feature fusion. Our method achieves a better balance between speed and accuracy than state-of-the-art methods on the Cityscapes and CamVid datasets.
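The abstract names two levers without giving numbers in this preview: atrous convolution in ASPP, and fewer feature channels during fusion. The sketch below is a back-of-the-envelope illustration of why each matters, not the authors' configuration. The dilation rates, feature-map size, and channel widths are assumptions chosen only for illustration, and both helper functions are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent along one axis covered by a dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_macs(height: int, width: int, in_channels: int,
              out_channels: int, kernel_size: int) -> int:
    """Multiply-accumulates of a dense, stride-1, same-padded 2D convolution.

    Bias terms are ignored. Dilation does not appear here: it spreads the
    kernel taps apart without adding any.
    """
    return height * width * in_channels * out_channels * kernel_size ** 2


# Assumed dilation rates (illustrative, not taken from the paper).
rates = (1, 6, 12, 18)
extents = [effective_kernel_size(3, r) for r in rates]
print(extents)  # [3, 13, 25, 37]

# Assumed feature-map size and channel widths (illustrative only).
full = conv_macs(64, 128, in_channels=256, out_channels=256, kernel_size=3)
slim = conv_macs(64, 128, in_channels=64, out_channels=64, kernel_size=3)
print(full // slim)  # 16
```

Under these assumptions, a 3x3 kernel with dilation 18 spans 37 pixels at the cost of 9 taps, and cutting both the input and output channels of a fusion convolution by 4x reduces its multiply-accumulates by 16x, since the cost is proportional to the product of the two channel counts.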


Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Nos. 61662009 and 61772008) and by the Key Program of the National Natural Science Union Foundation of China under Grant No. U1836205.

Author information

Authors and Affiliations

College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou, 550025, People’s Republic of China
Yun Wu, Jianyong Jiang, Zimeng Huang & Youliang Tian

Corresponding author

Correspondence to Jianyong Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Y., Jiang, J., Huang, Z. et al. FPANet: Feature pyramid aggregation network for real-time semantic segmentation. Appl Intell 52, 3319–3336 (2022). https://doi.org/10.1007/s10489-021-02603-z

Accepted: 08 June 2021
Published: 05 July 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10489-021-02603-z
