Image semantic segmentation with an improved fully convolutional network

  • Focus
Soft Computing

Abstract

With the development of deep learning and the emergence of unmanned driving, fully convolutional networks have become a feasible and effective approach to image semantic segmentation. DeepLab is an algorithm based on fully convolutional networks. However, the DeepLab algorithm still has room for improvement, and we design three improvements: (1) a global context structure module, (2) a highly efficient decoder module, and (3) a multi-scale feature fusion module. The experimental results show that the three improvements proposed in this paper enable the model to obtain more expressive features and improve the accuracy of the algorithm. We also run experiments on the Cityscapes dataset to further verify the robustness and effectiveness of the improved algorithm. Finally, we apply the improved algorithm to real-world scenes, where it shows practical value.
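The abstract names a global context module but does not describe its design. As a rough illustration of the general idea only, the sketch below shows one common way segmentation networks inject global context: each channel is global-average-pooled, the pooled value is broadcast back to the spatial size, and the result is concatenated with the local features. The function name `global_context_concat` and the nested-list feature layout are assumptions made for this example; this is not the authors' module.

```python
def global_context_concat(features):
    """Append one globally pooled channel per input channel.

    `features` is a hypothetical C x H x W feature map stored as nested
    lists (C channels, each an H x W grid of floats).
    Returns a 2C x H x W map: the original channels followed by the
    broadcast per-channel means.
    """
    # Copy the local channels so the input is left unmodified.
    out = [[row[:] for row in channel] for channel in features]
    for channel in features:
        height, width = len(channel), len(channel[0])
        # Global average pooling: one scalar summarising the whole channel.
        mean = sum(sum(row) for row in channel) / (height * width)
        # Broadcast the scalar back to H x W so it can be concatenated.
        out.append([[mean] * width for _ in range(height)])
    return out


# Tiny 2-channel, 2 x 2 example.
feats = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
fused = global_context_concat(feats)
print(len(fused))   # number of channels after fusion
print(fused[2])     # broadcast global mean of the first channel
```

Every pixel in the fused map then carries both its local response and a summary of the whole image, which is the intuition behind global context modules in general. In a real network the pooled vector would typically pass through a learned layer before fusion.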

Funding

This study was funded by Shenzhen Government (Grant Nos. KQJSCX20170726104033 357, JCYJ20150513151706567 and JCYJ20160531191837793). Furthermore, this work was partially supported by the Department of Industrial and Systems Engineering of the Hong Kong Polytechnic University (Grant No. H-ZG3K).

Author information

Corresponding author

Correspondence to Kuo-Kun Tseng.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Mu-Yen Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tseng, KK., Sun, H., Liu, J. et al. Image semantic segmentation with an improved fully convolutional network. Soft Comput 24, 8253–8273 (2020). https://doi.org/10.1007/s00500-019-04537-8

  • DOI: https://doi.org/10.1007/s00500-019-04537-8
