DOI: 10.1145/3240508.3243933
research-article

End2End Semantic Segmentation for 3D Indoor Scenes

Published: 15 October 2018

ABSTRACT

This research is concerned with semantic segmentation of 3D point clouds arising from videos of 3D indoor scenes. This task is an important building block of 3D scene understanding and has promising applications such as augmented reality and robotics. Although various deep-learning-based approaches have been proposed to replicate the success of 2D semantic segmentation in the 3D domain, they either result in severe information loss or fail to model the geometric structures well. In this paper, we aim to model the local and global geometric structures of 3D scenes by designing an end-to-end 3D semantic segmentation framework. It captures the local geometries through point-level feature learning and voxel-level aggregation, models the global structures via a 3D CNN, and enforces label consistency with a high-order CRF. Through preliminary experiments conducted on two indoor datasets, we describe our insights on the proposed approach and present some directions to be pursued in the future.
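As a rough illustration of the voxel-level aggregation step mentioned in the abstract, the sketch below groups per-point features into a regular voxel grid and pools them per voxel. The abstract does not state which pooling operator or voxel size the framework uses, so the element-wise max-pooling, the `voxel_size` parameter, and the function names `voxelize_max_pool` and `propagate_labels` are all assumptions made for this example, not the authors' implementation.

```python
from math import floor


def voxelize_max_pool(points, features, voxel_size):
    """Group points into cubic voxels and max-pool their feature vectors.

    points:     list of (x, y, z) tuples
    features:   list of equal-length feature lists, one per point
    voxel_size: edge length of a voxel (hypothetical parameter)

    Returns (voxel_features, point_to_voxel), where voxel_features maps an
    integer voxel index (i, j, k) to the pooled feature vector, and
    point_to_voxel lists the voxel index of every input point.
    """
    if len(points) != len(features):
        raise ValueError("points and features must have the same length")
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    voxel_features = {}
    point_to_voxel = []
    for (x, y, z), feat in zip(points, features):
        # Integer voxel index of the cell that contains this point.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        point_to_voxel.append(key)
        if key in voxel_features:
            # Element-wise max over all points seen so far in this voxel.
            voxel_features[key] = [max(a, b) for a, b in zip(voxel_features[key], feat)]
        else:
            voxel_features[key] = list(feat)
    return voxel_features, point_to_voxel


def propagate_labels(voxel_labels, point_to_voxel):
    """Copy each voxel's predicted label back to the points inside it."""
    return [voxel_labels[key] for key in point_to_voxel]


if __name__ == "__main__":
    pts = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.1, 0.0)]
    feats = [[1.0, 5.0], [3.0, 2.0], [0.5, 0.5]]
    pooled, mapping = voxelize_max_pool(pts, feats, voxel_size=1.0)
    print(pooled)
    print(propagate_labels({(0, 0, 0): "wall", (1, 0, 0): "chair"}, mapping))
```

In a pipeline of the kind the abstract describes, the pooled voxel features would presumably be arranged into a dense grid and passed to a 3D CNN, with the voxel-to-point mapping used afterwards to assign labels to the original points. That wiring is not shown here because the abstract gives no details on it.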


    Published in

      MM '18: Proceedings of the 26th ACM international conference on Multimedia
      October 2018
      2167 pages
      ISBN: 9781450356657
      DOI: 10.1145/3240508

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%
      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
