FPANet: Feature pyramid aggregation network for real-time semantic segmentation

Applied Intelligence

Abstract

Semantic segmentation is used in many fields, most of which require not only high-quality predictions but also real-time speed during forward inference. Our goal is therefore high-quality real-time semantic segmentation, and to this end we propose the feature pyramid aggregation network (FPANet). The network can be regarded as an encoder-decoder model. In the encoder stage, we use ResNet and atrous spatial pyramid pooling (ASPP) to extract more high-level semantic information. In the decoder stage, to obtain both the semantic and the spatial information of the image, we propose a bilateral directional feature pyramid network for semantic segmentation, named SeBiFPN, which fuses features at different levels. Within SeBiFPN, we design a lightweight feature pyramid fusion module (FPFM) to fuse features from two different levels. In addition, most real-time semantic segmentation models perform poorly when predicting the border regions of an image; we therefore propose a border refinement module (BRM) to reduce inaccurate border segmentation. To lower the computational complexity of the model, we redesign the ASPP module and reduce the number of feature channels used during feature fusion. Our method achieves a better balance of speed and accuracy than state-of-the-art methods on the Cityscapes and CamVid datasets.
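
As a rough illustration of what fusing features from two different levels can involve, the sketch below upsamples a coarse, low-resolution feature map to the size of a finer one and adds the two element-wise. This is a generic top-down pyramid fusion step, not the authors' FPFM, whose internals the abstract does not describe. The function names `upsample_nearest_2x` and `fuse_levels` and the toy single-channel maps are assumptions made purely for illustration.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest_2x(feature: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def fuse_levels(fine: Grid, coarse: Grid) -> Grid:
    """Hypothetical fusion: upsample the coarse map, then add element-wise."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(fine) or any(len(u) != len(f) for u, f in zip(up, fine)):
        raise ValueError("fine map must be exactly twice the size of the coarse map")
    return [[f + u for f, u in zip(f_row, u_row)] for f_row, u_row in zip(fine, up)]


# Toy single-channel maps: a 4x4 "spatial" level and a 2x2 "semantic" level.
fine = [[1.0] * 4 for _ in range(4)]
coarse = [[10.0, 20.0], [30.0, 40.0]]
fused = fuse_levels(fine, coarse)
```

In a real network the maps would be multi-channel tensors, and the sum would usually be followed by a learned convolution. Reducing the channel count before this step, as the abstract mentions, lowers the cost of the addition and of any convolution applied afterwards.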

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Nos. 61662009 and 61772008) and by the Key Program of the National Natural Science Union Foundation of China under Grant No. U1836205.

Author information

Corresponding author

Correspondence to Jianyong Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Y., Jiang, J., Huang, Z. et al. FPANet: Feature pyramid aggregation network for real-time semantic segmentation. Appl Intell 52, 3319–3336 (2022). https://doi.org/10.1007/s10489-021-02603-z

  • DOI: https://doi.org/10.1007/s10489-021-02603-z
