Abstract
Advances in sensor technology allow three-dimensional (3D) information to be acquired from a scene using a low-cost RGB-D sensor such as the Kinect. Although such a sensor can recover the 3D structure of a scene, it cannot distinguish a target object from the background. We therefore integrate an interactive 3D segmentation algorithm into KinectFusion, a well-known Kinect scene reconstruction system, to extract an object from the scene and thereby obtain its 3D point cloud. With this system, a user freely moves the Kinect sensor to reconstruct the scene and then selects foreground/background seeds from the reconstructed point cloud; the system completes the remaining steps of the 3D reconstruction of the selected object automatically. One advantage of this system is that users need not select the foreground/background seeds very carefully, which greatly reduces operational complexity. Moreover, each segmentation result is carried over to the next phase as a new set of foreground/background seeds, minimizing the required user intervention. With a simple seed selection, the point cloud of the selected object is gradually recovered as the user moves the sensor to different viewpoints. Several experiments were conducted, and the results confirm the effectiveness of the proposed system: the 3D structures of objects with complex shapes are well reconstructed.
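The seed-based workflow described above can be sketched in a few lines. This is an illustrative toy, not the paper's actual algorithm: a simple nearest-seed distance rule stands in for the interactive segmentation step, and the function names `segment_by_seeds` and `inherit_seeds` are hypothetical. What it demonstrates is the key idea from the abstract: labels computed in one phase are fed back as seeds for the next.

```python
import numpy as np

def segment_by_seeds(points, fg_seeds, bg_seeds):
    """Label each point as foreground (True) or background (False) by
    comparing its distance to the nearest foreground seed against its
    distance to the nearest background seed. A crude stand-in for the
    paper's interactive segmentation of the reconstructed point cloud.

    points:   (N, 3) array of 3D coordinates
    fg_seeds: indices into `points` the user marked as foreground
    bg_seeds: indices into `points` the user marked as background
    """
    d_fg = np.min(np.linalg.norm(points[:, None, :] - points[fg_seeds][None, :, :], axis=2), axis=1)
    d_bg = np.min(np.linalg.norm(points[:, None, :] - points[bg_seeds][None, :, :], axis=2), axis=1)
    return d_fg < d_bg

def inherit_seeds(labels, max_seeds=50):
    """Turn one phase's labels into seed indices for the next phase,
    mimicking how the system carries previous segmentation results
    forward so the user need not reselect seeds at each viewpoint."""
    fg = np.flatnonzero(labels)[:max_seeds]
    bg = np.flatnonzero(~labels)[:max_seeds]
    return fg, bg
```

For example, with two well-separated point clusters and one seed in each, the nearest-seed rule assigns every point to its own cluster, and those labels then supply the seeds for the next frame without further user input.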
Acknowledgements
This work was supported by the Ministry of Science and Technology, Taiwan, under Grant Nos. NSC 102-2221-E-155-075 and MOST 105-2218-E-155-010.
Cite this article
Teng, CH., Chuo, KY. & Hsieh, CY. Reconstructing three-dimensional models of objects using a Kinect sensor. Vis Comput 34, 1507–1523 (2018). https://doi.org/10.1007/s00371-017-1425-2