research-article

End2End Semantic Segmentation for 3D Indoor Scenes

Author:
Na Zhao

National University of Singapore, Singapore, Singapore


MM '18: Proceedings of the 26th ACM international conference on Multimedia, October 2018, pages 810–814. https://doi.org/10.1145/3240508.3243933

Published: 15 October 2018


ABSTRACT

This research is concerned with semantic segmentation of 3D point clouds derived from videos of 3D indoor scenes. It is an important building block of 3D scene understanding, with promising applications such as augmented reality and robotics. Although various deep learning approaches have been proposed to replicate the success of 2D semantic segmentation in the 3D domain, they either incur severe information loss or fail to model geometric structure well. In this paper, we aim to model both the local and the global geometric structures of 3D scenes with an end-to-end 3D semantic segmentation framework. The framework captures local geometry through point-level feature learning and voxel-level aggregation, models global structure with a 3D convolutional neural network (CNN), and enforces label consistency with a high-order conditional random field (CRF). Based on preliminary experiments on two indoor datasets, we discuss insights into the proposed approach and outline directions for future work.
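The abstract names voxel-level aggregation of point-level features as the step that connects local point geometry to the 3D CNN, but it does not say how points are pooled into voxels or how voxel predictions are returned to points. The Python sketch below is a minimal illustration under stated assumptions: element-wise max-pooling over the points inside each cubic voxel, and a same-cell label transfer back to points. The voxel size and the functions `voxel_of`, `aggregate_voxel_features`, and `voxel_labels_to_points` are hypothetical and are not the paper's implementation; the 3D CNN and high-order CRF stages are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_of(point: Point, voxel_size: float) -> Voxel:
    """Return the integer index of the cubic voxel containing `point`."""
    x, y, z = point
    # math.floor (not int) so negative coordinates fall into the correct cell.
    return (math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size))


def aggregate_voxel_features(
    points: Sequence[Point],
    features: Sequence[Sequence[float]],
    voxel_size: float,
) -> Dict[Voxel, List[float]]:
    """Pool per-point feature vectors into one vector per occupied voxel.

    Assumption (not stated in the abstract): element-wise max-pooling.
    """
    if len(points) != len(features):
        raise ValueError("points and features must have the same length")
    pooled: Dict[Voxel, List[float]] = {}
    for point, feat in zip(points, features):
        key = voxel_of(point, voxel_size)
        if key in pooled:
            # Keep the strongest response per feature channel.
            pooled[key] = [max(a, b) for a, b in zip(pooled[key], feat)]
        else:
            pooled[key] = list(feat)
    return pooled


def voxel_labels_to_points(
    points: Sequence[Point],
    voxel_labels: Dict[Voxel, str],
    voxel_size: float,
) -> List[str]:
    """Assumed transfer rule: each point takes the label of its own voxel."""
    return [voxel_labels[voxel_of(p, voxel_size)] for p in points]


if __name__ == "__main__":
    pts = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, 0.0)]
    feats = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]]
    grid = aggregate_voxel_features(pts, feats, voxel_size=0.5)
    print(grid)  # {(0, 0, 0): [1.0, 2.0], (2, 0, 0): [3.0, 1.0]}
    labels = {(0, 0, 0): "chair", (2, 0, 0): "table"}
    print(voxel_labels_to_points(pts, labels, voxel_size=0.5))  # ['chair', 'chair', 'table']
```

Max-pooling is order-invariant, so a voxel's pooled feature does not depend on the order in which its points are listed. Trilinear interpolation between neighboring voxels would be a smoother alternative to the same-cell label transfer assumed here.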


Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision tasks
        Scene understanding


Published in
MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN:9781450356657
DOI:10.1145/3240508
General Chairs: Susanne Boll (University of Oldenburg, Germany), Kyoung Mu Lee (Seoul National University, Korea), Jiebo Luo (University of Rochester, USA), Wenwu Zhu (Tsinghua University, China)
Program Chairs: Hyeran Byun (Yonsei University, Korea), Chang Wen Chen (State University of New York at Buffalo, USA), Rainer Lienhart (University of Augsburg, Germany), Tao Mei (JD AI, China)
Copyright © 2018 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
3d point cloud
deep learning
semantic segmentation
Acceptance Rates
MM '18 paper acceptance rate: 209 of 757 submissions (28%). Overall acceptance rate: 995 of 4,171 submissions (24%).
Article Metrics
- Total citations: 6
- Total downloads: 336
- Downloads (last 12 months): 12
- Downloads (last 6 weeks): 1