DOI: 10.1145/3372806.3372811
research-article

Multi-Scale Deep Convolutional Nets with Attention Model and Conditional Random Fields for Semantic Image Segmentation

Published: 21 January 2020

ABSTRACT

Although Convolutional Neural Networks are effective visual models that generate hierarchies of features, Deep Convolutional Neural Networks still have shortcomings when applied to semantic image segmentation. In this work, our algorithm combines multi-scale atrous convolution, an attention model, and Conditional Random Fields to address these shortcomings. First, our method replaces deconvolutional layers with atrous convolutional layers to avoid reducing feature resolution when Deep Convolutional Neural Networks are employed in a fully convolutional fashion. Second, a multi-scale architecture and an attention model are used to capture features at multiple scales. Third, we use Conditional Random Fields to prevent the built-in invariance of Deep Convolutional Neural Networks from reducing localization accuracy. Moreover, our network fully integrates Conditional Random Field modelling with Deep Convolutional Neural Networks, making it possible to train the deep network end-to-end. We apply our method to semantic image segmentation and demonstrate its effectiveness with experiments on PASCAL VOC 2012.
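
The first two ideas in the abstract can be illustrated with a minimal one-dimensional sketch in plain Python. It is not the authors' network: the function names (atrous_conv1d, fuse_scales), the zero padding, and the use of a per-position softmax over scales as the attention weighting are all assumptions made for illustration, and the Conditional Random Field stage is not shown. The sketch shows only that a dilated kernel widens the receptive field without shrinking the output, and that per-scale score maps can be merged with normalised weights.

```python
import math


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `rate` samples apart, so the receptive field
    grows with `rate` while the output keeps the input length.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # taps outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(score_maps, attention_logits):
    """Attention-weighted fusion of per-scale score maps (assumed scheme).

    score_maps[s][i]       : score at position i computed at scale s
    attention_logits[s][i] : unnormalised attention for scale s at position i
    The weights at each position are a softmax across scales.
    """
    fused = []
    for i in range(len(score_maps[0])):
        weights = softmax([logits[i] for logits in attention_logits])
        fused.append(sum(w * sm[i] for w, sm in zip(weights, score_maps)))
    return fused


# Hypothetical toy input: a single impulse in a signal of length 8.
impulse = [0, 0, 1, 0, 0, 0, 0, 0]
dense = atrous_conv1d(impulse, [1, 1, 1], rate=1)    # taps at i-1, i, i+1
dilated = atrous_conv1d(impulse, [1, 1, 1], rate=2)  # taps at i-2, i, i+2
# Equal logits give equal weights, so the fused map is the plain average.
fused = fuse_scales([dense, dilated], [[0.0] * 8, [0.0] * 8])
```

With rate=2 the same three-tap kernel spans five input samples, yet both outputs have length 8, which is the resolution-preserving property the abstract relies on.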

    • Published in

      SPML '19: Proceedings of the 2019 2nd International Conference on Signal Processing and Machine Learning
      November 2019
      135 pages
      ISBN:9781450372213
      DOI:10.1145/3372806

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited
