Abstract
Recognizing the content of an image is an important challenge in machine vision, and semantic segmentation is one of the most important ways to address it. It is used in applications such as autonomous driving, indoor navigation, virtual and augmented reality systems, and recognition tasks. This paper introduces P-DecovNet, a novel and practical deep fully convolutional neural network architecture for pixel-wise semantic segmentation. The proposed architecture combines the Convolution-Deconvolution Neural Network architecture with the Pyramid Pooling Module. High-level features are extracted from the image by the convolutional neural network, and the Pyramid Pooling Module is added to the architecture to reinforce local information. P-DecovNet is evaluated on the CamVid road scene dataset. On several criteria, including accuracy and mIoU, the experimental results show that P-DecovNet performs well among Convolution-Deconvolution networks. It achieves this performance with fewer training images and fewer iterations than existing methods.
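The abstract names accuracy and mIoU among the evaluation criteria. As a rough illustration of how these two metrics are commonly computed from a confusion matrix, the sketch below uses plain Python. The function names and the toy labels are hypothetical, and the paper's exact evaluation protocol (for example, which classes are ignored) is not stated in this excerpt, so this should be read as the standard definitions rather than the authors' implementation.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the ground truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # class c pixels predicted as another class
        fp = sum(row[c] for row in cm) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom:  # assumption: skip classes absent from both truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy data: flattened label maps with three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
print(f"pixel accuracy = {pixel_accuracy(cm):.3f}, mIoU = {mean_iou(cm):.3f}")
```

In practice the label maps are two-dimensional images that are flattened before counting, and the confusion matrix is accumulated over the whole test set before the metrics are derived from it.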
Ethics declarations
Conflict of interest
The authors declared no conflict of interest.
Cite this article
Malekijoo, A., Fadaeieslam, M.J. Convolution-deconvolution architecture with the pyramid pooling module for semantic segmentation. Multimed Tools Appl 78, 32379–32392 (2019). https://doi.org/10.1007/s11042-019-07990-7
DOI: https://doi.org/10.1007/s11042-019-07990-7