Semantic image segmentation using fully convolutional neural networks with multi-scale images and multi-scale dilated convolutions

Vo, Duc My; Lee, Sang-Woong

doi:10.1007/s11042-018-5653-x

Published: 22 February 2018

Volume 77, pages 18689–18707, (2018)

Multimedia Tools and Applications

Abstract

In this work, we investigate the effects of the cascade architecture of dilated convolutions and the deep network architecture of multi-resolution input images on the accuracy of semantic segmentation. We show that a cascade of dilated convolutions is not only able to efficiently capture larger context without increasing computational costs, but can also improve the localization performance. In addition, the deep network architecture for multi-resolution input images increases the accuracy of semantic segmentation by aggregating multi-scale contextual information. Furthermore, our fully convolutional neural network is coupled with a model of fully connected conditional random fields to further remove isolated false positives and improve the prediction along object boundaries. We present several experiments on two challenging image segmentation datasets, showing substantial improvements over strong baselines.
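The claim that a cascade of dilated convolutions captures larger context at no extra cost can be illustrated with a small receptive-field calculation. The sketch below is not the authors' network: the 3-tap kernel, the dilation rates (1, 2, 4), and the helper names `receptive_field`, `dilated_conv1d`, and `cascade` are assumptions chosen for illustration. It relies only on the standard fact that a stride-1 layer with kernel size k and dilation d widens the receptive field by (k - 1) * d, while its number of weights stays at k regardless of d.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer adds (k - 1) * d samples
    return rf


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated cross-correlation with stride 1 and no padding."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def cascade(signal, kernel, dilations):
    """Apply the same kernel repeatedly with the given dilation rates."""
    out = signal
    for d in dilations:
        out = dilated_conv1d(out, kernel, d)
    return out


if __name__ == "__main__":
    # Illustrative rates only; the paper's actual configuration is not given here.
    print(receptive_field(3, [1, 1, 1]))  # plain 3-tap stack   -> 7
    print(receptive_field(3, [1, 2, 4]))  # dilated 3-tap stack -> 15
    # A 15-sample input collapses to exactly one output value,
    # confirming that one output sees all 15 inputs.
    print(len(cascade([1.0] * 15, [1.0, 1.0, 1.0], [1, 2, 4])))  # -> 1
```

Both stacks use three layers with three weights each, so under these assumptions the dilated cascade roughly doubles the context seen by each output without adding parameters.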

Acknowledgements

This work was supported by the GRRC program of Gyeonggi province [GRRC-Gachon2017(B01), Analysis of behavior based on senior life log].

Author information

Authors and Affiliations

Pattern Recognition and Machine Learning Lab, Gachon University, 1342 Seongnamdaero, Sujeonggu, Seongnam, 13120, Korea
Duc My Vo & Sang-Woong Lee

Corresponding author

Correspondence to Sang-Woong Lee.

About this article

Cite this article

Vo, D.M., Lee, SW. Semantic image segmentation using fully convolutional neural networks with multi-scale images and multi-scale dilated convolutions. Multimed Tools Appl 77, 18689–18707 (2018). https://doi.org/10.1007/s11042-018-5653-x

Received: 05 August 2017
Revised: 18 December 2017
Accepted: 14 January 2018
Published: 22 February 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11042-018-5653-x
