ABSTRACT
This research addresses semantic segmentation of 3D point clouds reconstructed from videos of indoor scenes. Semantic segmentation is an important building block of 3D scene understanding, with promising applications in augmented reality and robotics. Although various deep-learning-based approaches have been proposed to replicate the success of 2D semantic segmentation in the 3D domain, they either incur severe information loss or fail to model geometric structure well. In this paper, we aim to model both the local and the global geometric structure of 3D scenes with an end-to-end 3D semantic segmentation framework. The framework captures local geometry through point-level feature learning and voxel-level aggregation, models global structure with a 3D CNN, and enforces label consistency with a high-order CRF. Based on preliminary experiments on two indoor datasets, we describe our insights into the proposed approach and outline directions for future work.
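To make the voxel-level aggregation step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-point features have already been computed and that they are aggregated by max pooling over the points falling into each voxel; the abstract does not specify the pooling operator, so that choice, along with the function name `voxelize_max` and the parameter `voxel_size`, is hypothetical.

```python
import numpy as np

def voxelize_max(points, feats, voxel_size):
    """Max-pool per-point features into the voxels that contain the points.

    points: (N, 3) array of xyz coordinates.
    feats:  (N, C) array of per-point features (assumed precomputed).
    Returns the occupied voxel coordinates, the pooled voxel features,
    and the voxel index of every point.
    """
    # Integer voxel coordinates, shifted so the minimum corner is (0, 0, 0).
    coords = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    # Unique occupied voxels and, for each point, the row of its voxel.
    voxels, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    # Start at -inf so the first feature seen in a voxel always wins the max.
    pooled = np.full((len(voxels), feats.shape[1]), -np.inf)
    # Unbuffered in-place maximum handles many points mapping to one voxel.
    np.maximum.at(pooled, inverse, feats)
    return voxels, pooled, inverse
```

The returned `inverse` array maps each point to its voxel, so predictions made at voxel resolution (for example by a 3D CNN) could be copied back to the points with `voxel_labels[inverse]`.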