Abstract
With the development of deep learning and the emergence of unmanned driving, fully convolutional networks have become a feasible and effective approach to image semantic segmentation. DeepLab is an algorithm based on fully convolutional networks. However, DeepLab still has room for improvement, so we design three improvements: (1) a global context structure module, (2) a highly efficient decoder module, and (3) a multi-scale feature fusion module. Experimental results show that the three proposed improvements help the model obtain more expressive features and improve the accuracy of the algorithm. We also conduct experiments on the Cityscapes dataset to further verify the robustness and effectiveness of the improved algorithm. Finally, the improved algorithm is applied to an actual scene, showing that it has practical value.
Funding
This study was funded by Shenzhen Government (Grant Nos. KQJSCX20170726104033 357, JCYJ20150513151706567 and JCYJ20160531191837793). Furthermore, this work was partially supported by the Department of Industrial and Systems Engineering of the Hong Kong Polytechnic University (Grant No. H-ZG3K).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by Mu-Yen Chen.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Tseng, KK., Sun, H., Liu, J. et al. Image semantic segmentation with an improved fully convolutional network. Soft Comput 24, 8253–8273 (2020). https://doi.org/10.1007/s00500-019-04537-8
DOI: https://doi.org/10.1007/s00500-019-04537-8