Convolution-deconvolution architecture with the pyramid pooling module for semantic segmentation

Multimedia Tools and Applications

Abstract

Recognizing the content of an image is an important challenge in machine vision, and semantic segmentation is one of the most important ways to address it. It is used in applications such as autonomous driving, indoor navigation, virtual and augmented reality systems, and recognition tasks. This paper introduces P-DecovNet, a novel and practical deep fully convolutional neural network architecture for pixel-wise semantic segmentation. The proposed architecture combines the Convolution-Deconvolution Neural Network architecture with the Pyramid Pooling Module. High-level features are extracted from the image using the Convolutional Neural Network, and the Pyramid Pooling Module is added to the architecture to reinforce local information. The CamVid road scene dataset was used to evaluate the performance of P-DecovNet. Across several criteria, including accuracy and mIoU, the experimental results show that P-DecovNet performs well among Convolution-Deconvolution networks. It achieves this performance with fewer training images and fewer training iterations than existing methods.
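The two evaluation criteria named in the abstract, accuracy and mIoU, can be illustrated with a minimal sketch. It assumes the standard definitions (pixel accuracy as the fraction of correctly labelled pixels, and mIoU as the per-class intersection-over-union averaged over the classes that occur); it is not taken from the paper's code, and the function names and the toy labels are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of pixels whose predicted class equals the ground truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm: List[List[int]]) -> float:
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(cm[c]) - tp                       # truly c, predicted other
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixels, three hypothetical classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = pixel_accuracy(cm)
miou = mean_iou(cm)
```

For a real segmentation map, the label and prediction images would be flattened to one-dimensional sequences before being passed in. Benchmark evaluations often also ignore a "void" class, which this sketch does not handle.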

Notes

  1. https://github.com/malekijoo/P-DecovNet.

Author information

Corresponding author

Correspondence to Amirhossein Malekijoo.

Ethics declarations

Conflict of interest

The authors declared no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Malekijoo, A., Fadaeieslam, M.J. Convolution-deconvolution architecture with the pyramid pooling module for semantic segmentation. Multimed Tools Appl 78, 32379–32392 (2019). https://doi.org/10.1007/s11042-019-07990-7

  • DOI: https://doi.org/10.1007/s11042-019-07990-7
