Abstract
In this work, we investigate how a cascade of dilated convolutions and a deep network architecture that takes multi-resolution input images affect the accuracy of semantic segmentation. We show that a cascade of dilated convolutions not only captures larger context efficiently, without increasing computational cost, but also improves localization performance. In addition, the deep network architecture for multi-resolution input images increases segmentation accuracy by aggregating multi-scale contextual information. Furthermore, we couple our fully convolutional neural network with a fully connected conditional random field model to remove isolated false positives and improve the predictions along object boundaries. We present several experiments on two challenging image segmentation datasets, showing substantial improvements over strong baselines.
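The claim that a cascade of dilated convolutions enlarges context without extra cost can be illustrated with simple receptive-field arithmetic. The sketch below is not the network from the article: the 3x3 kernels and the dilation rates 1, 2, 4 are assumptions chosen only for illustration, and the helper names are hypothetical. It compares three stacked stride-1 convolutions with and without dilation.

```python
def receptive_field(layers):
    """Receptive field along one axis of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs. Each layer widens
    the receptive field by (kernel_size - 1) * dilation pixels.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


def weights_per_channel_pair(layers):
    """Number of weights per input/output channel pair, summed over layers.

    Dilation spreads the kernel taps apart but adds no weights.
    """
    return sum(kernel_size * kernel_size for kernel_size, _ in layers)


# Assumed configurations (illustrative only, not taken from the article).
plain = [(3, 1), (3, 1), (3, 1)]    # three ordinary 3x3 convolutions
cascade = [(3, 1), (3, 2), (3, 4)]  # same kernels, dilation rates 1, 2, 4

print(receptive_field(plain), weights_per_channel_pair(plain))      # 7 27
print(receptive_field(cascade), weights_per_channel_pair(cascade))  # 15 27
```

Under these assumptions both stacks use the same number of weights, while the dilated cascade sees a 15-pixel window instead of a 7-pixel one.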
Acknowledgements
This work was supported by the GRRC program of Gyeonggi province [GRRC-Gachon2017(B01), Analysis of behavior based on senior life log].
Cite this article
Vo, D.M., Lee, SW. Semantic image segmentation using fully convolutional neural networks with multi-scale images and multi-scale dilated convolutions. Multimed Tools Appl 77, 18689–18707 (2018). https://doi.org/10.1007/s11042-018-5653-x
DOI: https://doi.org/10.1007/s11042-018-5653-x