ABSTRACT
Although Convolutional Neural Networks are effective visual models that generate hierarchies of features, Deep Convolutional Neural Networks (DCNNs) still have shortcomings when applied to semantic image segmentation. In this work, our algorithm combines multi-scale atrous convolution, an attention model and Conditional Random Fields (CRFs) to address these shortcomings. First, our method replaces deconvolutional layers with atrous convolutional layers, so that feature resolution is not reduced when the DCNN is applied in a fully convolutional fashion. Second, a multi-scale architecture and an attention model are used to capture features that occur at multiple scales. Third, we use CRFs to prevent the built-in invariance of DCNNs from reducing localization accuracy. Moreover, our network fully integrates CRF modelling into the DCNN, which makes it possible to train the whole network end-to-end. We apply our method to semantic image segmentation and demonstrate its effectiveness with experiments on PASCAL VOC 2012.
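The abstract names three building blocks without giving implementation details. The NumPy sketch below illustrates the textbook form of each one on toy arrays. It is not the authors' network, and every function name in it is hypothetical. Assumptions: (1) atrous convolution is shown for a single channel with an odd-sized kernel, stride 1 and zero padding, so the output keeps the input's size while the kernel taps are spread `rate` pixels apart; (2) the attention logits for each scale are given as inputs, whereas a real model would predict them with a small sub-network; (3) the CRF is a fully connected CRF with one bilateral Gaussian kernel and Potts compatibility. Its mean-field inference is run for a fixed number of iterations, which is the unrolled form that allows a CRF to be trained end-to-end inside a network. Messages are computed naively in O(N²).

```python
import numpy as np


def atrous_conv2d(x, kernel, rate):
    """Single-channel 2-D atrous (dilated) convolution, stride 1, zero 'same' padding.

    Assumes an odd kernel size. Taps are spaced `rate` pixels apart, so a 3x3
    kernel covers a (2*rate+1) x (2*rate+1) window without downsampling.
    """
    kh, kw = kernel.shape
    pad_h, pad_w = rate * (kh - 1) // 2, rate * (kw - 1) // 2
    xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Shift the padded input by the dilated tap offset and accumulate.
            out += kernel[i, j] * xp[i * rate:i * rate + x.shape[0],
                                     j * rate:j * rate + x.shape[1]]
    return out


def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(score_maps, attention_logits):
    """Fuse per-scale score maps with pixel-wise softmax weights over scales.

    score_maps: (S, C, H, W) class scores from S scales, already resized to H x W.
    attention_logits: (S, H, W) unnormalized attention (assumed given here).
    """
    weights = softmax(attention_logits, axis=0)         # (S, H, W), sums to 1 over S
    return (weights[:, None] * score_maps).sum(axis=0)  # (C, H, W)


def mean_field_crf(unary, image, n_iters=5, w=1.0, theta_xy=3.0, theta_rgb=10.0):
    """Naive dense-CRF mean-field inference (one bilateral kernel, Potts model).

    unary: (C, H, W) unary energies, e.g. negative log-probabilities.
    image: (H, W, 3) float colors.
    """
    C, H, W = unary.shape
    ys, xs = np.mgrid[0:H, 0:W]
    feats = np.concatenate([
        np.stack([ys, xs], axis=-1).reshape(-1, 2) / theta_xy,
        image.reshape(-1, 3) / theta_rgb], axis=1)      # (N, 5) scaled features
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2)                               # Gaussian affinities (N, N)
    np.fill_diagonal(K, 0.0)                            # no message to itself
    U = unary.reshape(C, -1)
    Q = softmax(-U, axis=0)                             # initialize from unaries
    for _ in range(n_iters):
        msg = Q @ K                                     # (C, N) affinity-weighted label mass
        # Potts: label l is penalized by neighbors' mass on all other labels.
        pairwise = w * (msg.sum(axis=0, keepdims=True) - msg)
        Q = softmax(-U - pairwise, axis=0)
    return Q.reshape(C, H, W)
```

On a toy example, a single pixel whose unary energies slightly prefer a different label than its neighbors of identical color is relabeled by `mean_field_crf`. This is the localization-smoothing effect the abstract attributes to the CRF. The dense N x N affinity matrix is only practical for tiny inputs. Full-size dense CRFs rely on fast high-dimensional filtering instead.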
Index Terms
- Multi-Scale Deep Convolutional Nets with Attention Model and Conditional Random Fields for Semantic Image Segmentation