SPSSNet: a real-time network for image semantic segmentation

Mamoon, Saqib; Manzoor, Muhammad Arslan; Zhang, Fa-en; Ali, Zakir; Lu, Jian-feng

doi:10.1631/FITEE.1900697

Published: 23 December 2020

Volume 21, pages 1770–1782 (2020)

Frontiers of Information Technology & Electronic Engineering

Abstract

Although deep neural networks (DNNs) have achieved great success in semantic segmentation, deploying them in real-time applications remains challenging. A large number of feature channels, parameters, and floating-point operations makes a network slow and computationally heavy, which is undesirable for real-time tasks such as robotics and autonomous driving. Most approaches sacrifice spatial resolution to reach real-time inference speed, which degrades accuracy. In this paper, we propose a light-weight stage-pooling semantic segmentation network (SPSSN), which efficiently reuses the most important features from early layers at multiple stages and at different spatial resolutions. On the Cityscapes dataset, SPSSN takes full-resolution input of 2048×1024 pixels, uses only 1.42 × 10⁶ parameters, achieves 69.4% mean intersection over union (mIoU) without pre-training, and runs at an inference speed of 59 frames/s. Owing to its light-weight architecture, SPSSN can run directly on mobile devices in real time. To demonstrate the effectiveness of the proposed network, we compare our results with those of state-of-the-art networks.
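To make the headline numbers concrete, the following minimal Python sketch shows how an mIoU score is conventionally computed from a confusion matrix, and what 59 frames/s and 1.42 × 10⁶ parameters imply in per-frame time and weight storage. It is an illustration, not the authors' evaluation code: the function names and the toy 2-class confusion matrix are hypothetical, and the 4-bytes-per-parameter (float32) storage figure is an assumption that the abstract does not state.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and whose
    predicted class is j. Classes absent from both ground truth and
    prediction (union of zero) are skipped.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:
            ious.append(true_pos / union)
    # Guard against an empty matrix or one with no labelled pixels.
    return sum(ious) / len(ious) if ious else 0.0


def frame_budget_ms(frames_per_second):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / frames_per_second


def weight_size_mb(num_params, bytes_per_param=4):
    """Weight storage in megabytes; 4 bytes assumes float32 (an assumption)."""
    return num_params * bytes_per_param / 1e6


# Hypothetical 2-class confusion matrix, for illustration only.
toy_confusion = [
    [50, 10],
    [5, 35],
]
toy_miou = mean_iou(toy_confusion)

# Figures quoted in the abstract: 59 frames/s and 1.42e6 parameters.
budget_ms = frame_budget_ms(59)
size_mb = weight_size_mb(1.42e6)
```

Under these assumptions, 59 frames/s corresponds to roughly 17 ms per frame, and the weights occupy about 5.7 MB in float32, which is consistent with the abstract's claim that the network is light enough for mobile devices.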

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanjing University of Science & Technology, Nanjing, 210094, China
Saqib Mamoon, Muhammad Arslan Manzoor, Zakir Ali & Jian-feng Lu
AInnovation, Beijing, 100080, China
Fa-en Zhang

Contributions

Saqib MAMOON and Jian-feng LU designed the research. Muhammad Arslan MANZOOR, Fa-en ZHANG, and Zakir ALI processed the data. Saqib MAMOON drafted the manuscript. Muhammad Arslan MANZOOR helped organize the manuscript. Jian-feng LU and Zakir ALI revised and finalized the paper.

Corresponding author

Correspondence to Jian-feng Lu.

Additional information

Compliance with ethics guidelines

Saqib MAMOON, Muhammad Arslan MANZOOR, Fa-en ZHANG, Zakir ALI, and Jian-feng LU declare that they have no conflict of interest.

Project supported by the National Key R&D Program of China (No. 2017YFB1300205)

About this article

Cite this article

Mamoon, S., Manzoor, M.A., Zhang, Fe. et al. SPSSNet: a real-time network for image semantic segmentation. Front Inform Technol Electron Eng 21, 1770–1782 (2020). https://doi.org/10.1631/FITEE.1900697

Received: 14 December 2019
Accepted: 21 February 2020
Published: 23 December 2020
Issue Date: December 2020
DOI: https://doi.org/10.1631/FITEE.1900697

CLC number

TP39
