Abstract
We present a fully convolutional neural network that jointly predicts a semantic 3D reconstruction of a scene as well as a corresponding octree representation. This approach leverages the efficiency of an octree data structure to improve the capacities of volumetric semantic 3D reconstruction methods, especially in terms of scalability. At every octree level, the network predicts a semantic class for every voxel and decides which voxels should be further split in order to refine the reconstruction, thus working in a coarse-to-fine manner. The semantic prediction part of our method builds on recent work that combines traditional variational optimization and neural networks. In contrast to previous networks that work on dense voxel grids, our network is much more efficient in terms of memory consumption and inference efficiency, while achieving similar reconstruction performance. This allows for a high resolution reconstruction in case of limited memory. We perform experiments on the SUNCG and ScanNetv2 datasets on which our network shows comparable reconstruction results to the corresponding dense network while consuming less memory.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Bresson, X., Esedoḡlu, S., Vandergheynst, P., Thiran, J.P., Osher, S.: Fastglobal minimization of the active contour/snake model. J. Math. Imaging Vis. 28(2), 151–167 (2007)
Chan, T., Esedoḡlu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1362–1648 (2006)
Chen, L.C., et al.: Searching for efficient multi-scale architectures for dense image prediction. In: Proceedings of Neural Information Processing Systems (NIPS) (2018)
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_49
Cherabier, I., Schönberger, J.L., Oswald, M.R., Pollefeys, M., Geiger, A.: Learning priors for semantic 3D reconstruction. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 325–341. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_20
Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Dai, A., Nießner, M.: 3DMV: joint 3D-multi-view prediction for 3D semantic scene segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 458–474. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_28
Gargantini, I.: Linear octree for fast processing of three-dimensional objects. Comput. Graph. Image Process. 20 (1982)
Häne, C., Zach, C., Cohen, A., Angst, R., Pollefeys, M.: Joint 3D scene reconstruction and class segmentation. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 97–104 (2013). https://doi.org/10.1109/CVPR.2013.20
Häne, C., Zach, C., Cohen, A., Pollefeys, M.: Dense semantic 3D reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 39(9), 1730–1743 (2017). https://doi.org/10.1109/TPAMI.2016.2613051
He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: Proceedings of International Conference on Computer Vision (ICCV) (2017)
Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In: International Conference on Computer Vision (ICCV) (2011)
Riegler, G., Ulusoy, A.O., Bischof, H., Geiger, A.: OctNetFusion: learning depth fusion from data. In: International Conference on 3D Vision (3DV) (2017)
Riegler, G., Ulusoy, A.O., Geiger, A.: OctNet: learning deep 3D representations at high resolutions. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.A.: Semantic scene completion from a single depth image. In: Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Tatarchenko, M., Dosovitskiy, A., Brox, T.: Octree generating networks: Efficient convolutional architectures for high-resolution 3D outputs. In: Proceedings of International Conference on Computer Vision (ICCV) (2017). http://lmb.informatik.uni-freiburg.de/Publications/2017/TDB17b
Wang, P.S., Liu, Y., Guo, Y.X., Sun, C.Y., Tong, X.: O-CNN: octree-based Convolutional neural networks for 3D shape analysis. ACM Trans. Graph. (SIGGRAPH) 36(4), 72 (2017)
Wang, P.S., Sun, C.Y., Liu, Y., Tong, X.: Adaptive O-CNN: a patch-based deep representation of 3D shapes. ACM Transactions on Graphics (SIGGRAPH Asia), vol. 37, no. 6 (2018)
Acknowledgements
This research was partially supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00280. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X., Oswald, M.R., Cherabier, I., Pollefeys, M. (2019). Learning 3D Semantic Reconstruction on Octrees. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science(), vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_41
Download citation
DOI: https://doi.org/10.1007/978-3-030-33676-9_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33675-2
Online ISBN: 978-3-030-33676-9
eBook Packages: Computer ScienceComputer Science (R0)