Abstract
Semantic segmentation is used in many fields, most of which require not only high-quality predictions but also real-time speed in the forward inference phase. To achieve high-quality real-time semantic segmentation, we propose the feature pyramid aggregation network (FPANet), which can be regarded as an encoder-decoder model. In the encoder stage, we use ResNet and atrous spatial pyramid pooling (ASPP) to extract high-level semantic information. In the decoder stage, to obtain both the semantic and the spatial information of the image, we propose a bilateral directional feature pyramid network for semantic segmentation, named SeBiFPN, which fuses features at different levels. In SeBiFPN, we design a lightweight feature pyramid fusion module (FPFM) to fuse features from two different levels. In addition, most real-time semantic segmentation models perform poorly when predicting the border regions of an image; we therefore propose a border refinement module (BRM) to mitigate inaccurate border segmentation. To reduce the computational complexity of the model, we redesign the ASPP module and reduce the number of feature channels during feature fusion. Our method achieves a better balance of speed and accuracy than state-of-the-art methods on the Cityscapes and CamVid datasets.
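The two cost levers mentioned above, atrous convolution and channel reduction, can be illustrated with a little arithmetic. The sketch below is not the FPANet implementation: the function names, dilation rates, channel counts, and feature-map size are all hypothetical values chosen for illustration. It only shows how a dilated kernel widens its spatial extent without adding weights, and how halving the channels of a standard convolution cuts its multiply-accumulate count.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated (atrous) kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_macs(c_in: int, c_out: int, kernel_size: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulate count of a standard 2D convolution (bias ignored)."""
    return c_in * c_out * kernel_size * kernel_size * out_h * out_w


if __name__ == "__main__":
    # Hypothetical dilation rates (assumption, not taken from the paper).
    for rate in (1, 6, 12, 18):
        extent = effective_kernel_size(3, rate)
        print(f"3x3 kernel, dilation {rate}: covers {extent}x{extent} pixels")

    # Hypothetical channel counts and feature-map size (assumptions).
    full = conv_macs(256, 256, 3, 64, 128)
    reduced = conv_macs(128, 128, 3, 64, 128)
    print(f"MAC ratio after halving the channels: {full / reduced}")
```

Because the cost of a standard convolution scales with the product of input and output channels, halving both reduces the multiply-accumulate count by a factor of four, which is the kind of saving that channel reduction during feature fusion is meant to provide.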
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Nos. 61662009 and 61772008) and the Key Program of the National Natural Science Union Foundation of China under Grant No. U1836205.
Cite this article
Wu, Y., Jiang, J., Huang, Z. et al. FPANet: Feature pyramid aggregation network for real-time semantic segmentation. Appl Intell 52, 3319–3336 (2022). https://doi.org/10.1007/s10489-021-02603-z
DOI: https://doi.org/10.1007/s10489-021-02603-z