Abstract
When constructing a dense 3D model of an indoor static scene from a sequence of RGB-D images, the choice of the 3D representation (e.g. 3D mesh, point cloud or implicit function) is of crucial importance. In the last few years, the volumetric truncated signed distance function (TSDF) and its extensions have become popular in the community and widely used for the task of dense 3D modelling using RGB-D sensors. However, as this representation is voxel based, it offers few possibilities for manipulating and/or editing the constructed 3D model, which limits its applicability. In particular, the amount of data required to maintain the volumetric TSDF quickly becomes huge, which limits portability. Moreover, simplifications (such as mesh extraction and surface simplification) significantly reduce the accuracy of the 3D model (especially in the color space), and editing the 3D model is difficult. We propose a novel compact, flexible and accurate 3D surface representation based on parametric surface patches augmented by geometric and color texture images. Simple parametric shapes such as planes are roughly fitted to the input depth images, and the deviations of the 3D measurements from the fitted parametric surfaces are fused into a geometric texture image (called the Bump image). A confidence image and a color texture image are also built. Our 3D scene representation is accurate yet memory efficient. Moreover, updating or editing the 3D model becomes trivial since it is reduced to manipulating 2D images. Our experimental results demonstrate the advantages of our proposed 3D representation through a concrete indoor scene reconstruction application.
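For illustration, the minimal sketch below (Python with NumPy) shows how a single 3D measurement could be projected into a planar patch's local frame and fused into its Bump and confidence images. The class and method names, the corner-anchored texel indexing, and the weighted running-average fusion rule are our own illustrative assumptions, not the implementation evaluated in this paper.

    # Illustrative sketch only: fusing one depth measurement into a per-plane
    # "Bump image" in the spirit of the abstract. All names and the simple
    # running-average fusion rule are assumptions, not the paper's code.
    import numpy as np

    class PlanarPatch:
        """A plane with an orthonormal frame and texture images attached."""
        def __init__(self, origin, normal, tangent, size_px, scale):
            self.origin = np.asarray(origin, dtype=float)  # a point on the plane
            self.n = np.asarray(normal, dtype=float)       # unit normal
            self.t = np.asarray(tangent, dtype=float)      # unit tangent in plane
            self.b = np.cross(self.n, self.t)              # bitangent
            self.scale = scale                             # metres per texel
            self.bump = np.zeros(size_px)                  # signed deviations
            self.conf = np.zeros(size_px)                  # per-texel confidence

        def fuse(self, p, weight=1.0):
            """Fuse one 3D measurement p into the bump/confidence images."""
            d = np.asarray(p, dtype=float) - self.origin
            u = int(round(float(np.dot(d, self.t)) / self.scale))  # texel column
            v = int(round(float(np.dot(d, self.b)) / self.scale))  # texel row
            if not (0 <= v < self.bump.shape[0] and 0 <= u < self.bump.shape[1]):
                return
            h = float(np.dot(d, self.n))  # signed deviation from the plane
            c = self.conf[v, u]
            # weighted running average, analogous to TSDF-style fusion
            self.bump[v, u] = (c * self.bump[v, u] + weight * h) / (c + weight)
            self.conf[v, u] = c + weight

    patch = PlanarPatch(origin=[0, 0, 0], normal=[0, 0, 1],
                        tangent=[1, 0, 0], size_px=(64, 64), scale=0.01)
    patch.fuse([0.10, 0.20, 0.003])  # a 3 mm bump recorded at texel (20, 10)

Because each patch carries only 2D images, updating or editing the model then reduces to standard image operations, as the abstract notes.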
Notes
The notation \(\llbracket a,b \rrbracket \) denotes the integer interval between a and b.
Note that in our implementation, plane detection is run every 20 frames.
For reference, InfiniTAM was reported to run at over 20 fps.
The GPU memory usage at run-time depends only on the complexity of the scene (i.e., the number and size of planar patches in the current view frustum).
Memory usage at run-time was less than 150 MB on the GPU and less than 35 MB on the CPU.
At run-time, the GPU memory usage never exceeded 300 MB, while the number of visible planar patches never exceeded 28.
The memory usage at run-time with the Library data never exceeded 447 MB on the GPU and 2180 MB on the CPU; with the Library-2 data, it never exceeded 311 MB on the GPU and 881 MB on the CPU.
(1) The tangent vector is made orthogonal to the normal vector and normalised, and (2) the bitangent vector is made orthogonal to both the normal and tangent vectors and then normalised.
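For concreteness, these two steps amount to classic Gram-Schmidt orthonormalisation (cf. Lengyel 2001). The following minimal sketch, in Python with NumPy and with variable names of our own choosing, illustrates them:

    # Gram-Schmidt orthonormalisation of a (normal, tangent, bitangent) frame;
    # an illustrative sketch, not the paper's implementation.
    import numpy as np

    def orthonormal_frame(n, t, b):
        """Return an orthonormal frame from three roughly independent vectors."""
        n = np.asarray(n, dtype=float)
        t = np.asarray(t, dtype=float)
        b = np.asarray(b, dtype=float)
        n = n / np.linalg.norm(n)
        # (1) make the tangent orthogonal to the normal, then normalise
        t = t - np.dot(t, n) * n
        t = t / np.linalg.norm(t)
        # (2) make the bitangent orthogonal to both, then normalise
        b = b - np.dot(b, n) * n - np.dot(b, t) * t
        b = b / np.linalg.norm(b)
        return n, t, b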
References
Anasosalu, P. K., Thomas, D., & Sugimoto, A. (2013). Compact and accurate 3-D face modeling using an RGB-D camera: Let's open the door to 3-D video conference. In: Proceedings of CDC4CV.
Besl, P. J., & McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256.
Blanz, V., & Vetter, T. (2003). Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1063–1074.
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679–698.
Chen, J., Bautembach, D., & Izadi, S. (2013). Scalable real-time volumetric surface reconstruction. ACM Transactions on Graphics, 32(4), 113:1–113:8.
Davison, A., Reid, I., Molton, N., & Stasse, O. (2007). MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 1052–1067.
Henry, P., Fox, D., Bhowmik, A., & Mongia, R. (2013). Patch volumes: Segmentation-based consistent mapping with RGB-D cameras. In: Proceedings of 3DV’13.
Henry, P., Krainin, M., Herbst, E., Ren, X., & Fox, D. (2012). RGB-D mapping: Using Kinect-style depth cameras for dense 3D modelling of indoor environments. International Journal of Robotics Research, 31(5), 647–663.
Hernandez, M., Choi, J., & Medioni, G. (2012). Laser scan quality 3-D face modeling using a low-cost depth camera. In: Proceedings of the 20th European signal processing conference (EUSIPCO), pp. 1995–1999.
Jaeggli, T., Koninckx, T. P., & Van Gool, L. (2003). Online 3D acquisition and model integration. In: Proceedings of ProCam'03.
Kähler, O., Prisacariu, V. A., Ren, C. Y., Sun, X., Torr, P. H. S., & Murray, D. W. (2015). Very high frame rate volumetric integration of depth images on mobile devices. IEEE Transactions on Visualization and Computer Graphics (Proceedings of the International Symposium on Mixed and Augmented Reality).
Kazhdan, M., Bolitho, M., & Hoppe, H. (2006). Poisson surface reconstruction. In: Proceedings of the Eurographics symposium on geometry processing.
Lengyel, E. (2001). Computing tangent space basis vectors for an arbitrary mesh. In: Terathon Software 3D Graphics Library.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In: Proceedings of ICCV, pp. 1150–1157.
Nießner, M., Zollhöfer, M., Izadi, S., & Stamminger, M. (2013). Real-time 3D reconstruction at scale using voxel hashing. ACM Transactions on Graphics, 32(6), 169:1–169:11.
Newcombe, R., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A., Kohli, P., Shotton, J., Hodges, S., & Fitzgibbon, A. (2011). KinectFusion: Real-time dense surface mapping and tracking. In: Proceedings of ISMAR'11, pp. 127–136.
Nguyen, C., Izadi, S., & Lovell, D. (2012). Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In: Proceedings of 3DIM/PVT'12, pp. 524–530.
Pfister, H., Zwicker, M., van Baar, J., & Gross, M. (2000). Surfels: Surface elements as rendering primitives. In: ACM Transactions on Graphics (Proceedings of SIGGRAPH'00).
Roth, H., & Vona, M. (2012). Moving volume KinectFusion. In: Proceedings of BMVC.
Segal, A., Haehnel, D., & Thrun, S. (2009). Generalized-ICP. In: Robotics: Science and systems.
Steinbrücker, F., Kerl, C., Sturm, J., & Cremers, D. (2013). Large-scale multi-resolution surface reconstruction from RGB-D sequences. In: Proceedings of the international conference on computer vision (ICCV'13).
Thomas, D., & Sugimoto, A. (2013). A flexible scene representation for 3D reconstruction using an RGB-D camera. In: Proceedings of ICCV.
Thomas, D., & Sugimoto, A. (2014). A two-stage strategy for real-time dense 3D reconstruction of large-scale scenes. In: Proceedings of ECCV workshops’14 (CDC4CV).
Weise, T., Wismer, T., Leibe, B., & Van Gool, L. (2009). In-hand scanning with online loop closure. In: Proceedings of ICCV workshops'09, pp. 1630–1637.
Whelan, T., McDonald, J., Kaess, M., Fallon, M., Johannsson, H., & Leonard, J. (2012). Kintinuous: Spatially extended KinectFusion. In: Proceedings of the RSS workshop on RGB-D: Advanced reasoning with depth cameras.
Zeng, M., Zhao, F., Zheng, J., & Liu, X. (2013). Octree-based fusion for realtime 3D reconstruction. Graphical Models, 75(3), 126–136.
Zhou, Q.-Y., & Koltun, V. (2013). Dense scene reconstruction with points of interest. ACM Transactions on Graphics, 32(4), 112:1–112:8.
Zhou, Q.-Y., Miller, S., & Koltun, V. (2013). Elastic fragments for dense scene reconstruction. In: Proceedings of ICCV.
Acknowledgements
This work was in part supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
Additional information
Communicated by S.-C. Zhu.
Cite this article
Thomas, D., Sugimoto, A. Parametric Surface Representation with Bump Image for Dense 3D Modeling Using an RGB-D Camera. Int J Comput Vis 123, 206–225 (2017). https://doi.org/10.1007/s11263-016-0969-3