Reconstructing non-rigid object with large movement using a single depth camera☆
Introduction
3D reconstruction of real-world scenes from depth cameras is a widely studied problem in the fields of computer vision and computer graphics. After long-term efforts, the 3D model of a scene can now be accurately built by fusing depth maps captured from multiple views, as long as the scene is static (e.g. KinectFusion (Newcombe et al., 2011, Izadi et al., 2011)). However, reconstructing non-rigid scenes with a single depth camera remains largely unsolved due to a number of challenges, such as non-rigid deformation, incomplete scans, and large movement, which may change the topological structure of the scene.
In recent years, the challenges of handling non-rigid deformation and incomplete scans have been well studied and addressed by various previous works (Sumner et al., 2007, Xu et al., 2007, Li et al., 2009, Liao et al., 2009, Zhou et al., 2010, Oikonomidis et al., 2011, Taylor et al., 2012, Li et al., 2013, Yang et al., 2013, Zollhöfer et al., 2014, Dou et al., 2015, Zhang et al., 2015a, Yang et al., 2015, Dou et al., 2016). However, these methods rely on strong priors such as pre-designed templates, direct user manipulation, multiple depth sensors, or pre-learned statistical models. Moreover, some techniques require seconds to minutes to process a single frame, which is too slow for practical reconstruction. Newcombe et al. (2015) proposed the first system for densely reconstructing general dynamic scenes, which can generate high-quality results from a single camera in real time. Despite these significant successes, most of these approaches do not explicitly consider the challenging problem of large movement (i.e. potential topological change) of the input scene, which frequently occurs for non-rigid objects. As illustrated in Fig. 1, as the arms of the person stretch away from the body, the topology of the person becomes inconsistent. In this case, previous methods that assume a fixed topological structure cannot reconstruct the input scene consistently. Slavcheva et al. (2017) made an attempt at this problem with a level-set evolution approach. However, because it reconstructs without correspondences, the results appear somewhat shaky and lose essential details.
To address this problem, this paper presents a novel approach to reconstructing non-rigid scenes with large movement from a single depth camera. As summarized in Fig. 2, the proposed approach takes a depth sequence captured by a Kinect v2.0 sensor as input, and incrementally fuses the depth maps to generate a canonical model that best fits the scene in each frame under certain deformations. To this end, we propose a novel adaptive strategy that identifies the most fine-grained scene topology as the canonical model by analyzing the topological structure. Given the canonical model, we then deform it to each depth map, constrained by robust inter-frame correspondences established from object contours and scene flows. Finally, we fuse the depth maps onto the deformed canonical models through a novel scheme that adaptively selects the appropriate interval of frames for fusion, which generates high-quality reconstruction results without over-smoothing model details. Experimental results demonstrate that our approach can effectively handle various input scenes whose topological structure changes due to large movement.
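One crude proxy for "most fine-grained topology" is the number of connected components in each frame's foreground silhouette: a frame where the arms are held away from the body splits into more components than one where they touch the torso. The sketch below illustrates this idea only; the function names are hypothetical and the paper's actual topological analysis is more elaborate.

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground components in a binary silhouette mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components

def pick_canonical_frame(masks):
    """Pick the frame whose silhouette splits into the most components."""
    return max(range(len(masks)), key=lambda i: count_components(masks[i]))
```

In this toy setting, a frame where a limb has separated from the body yields more silhouette components and would be preferred as the canonical frame.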
The contributions of this paper are summarized as follows: 1) we present a novel approach that identifies a canonical frame to reconstruct non-rigid scenes with large movement; 2) we efficiently deform the canonical model to fit each depth map using contour and scene-flow cues; 3) we propose an adaptive fusion algorithm that largely suppresses noise during fusion while preserving model details.
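The "model-to-frame" deformation in contribution 2 is non-rigid, typically solved over a deformation graph; each local node solve, however, resembles a closed-form rigid fit of corresponding points. As a much simpler stand-in (not the authors' method), here is the classic Kabsch alignment that recovers the best rigid rotation and translation from correspondences:

```python
import numpy as np

def kabsch_align(src, dst):
    """Best rigid (R, t) mapping src points onto dst points, dst ≈ R src + t.

    A rigid stand-in for one local solve of a non-rigid deformation graph.
    """
    src_c = src - src.mean(axis=0)          # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

With noiseless correspondences this recovers the ground-truth transform exactly; in a real system the correspondences would come from the contour and scene-flow matching described above.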
Related work
There are various previous works on 3D scene reconstruction based on consumer-level depth cameras. While a large group of them focused on static scenes (Newcombe et al., 2011, Izadi et al., 2011, Roth and Vona, 2012, Whelan et al., 2012, Shao et al., 2012, Lin et al., 2013, Steinbrucker et al., 2013, Chen et al., 2013, Nießner et al., 2013, Kahler et al., 2015, Zhang et al., 2015b), this section mainly reviews recent advances on non-rigid scene reconstruction that are tightly correlated with ours.
Overview
We aim to reconstruct non-rigid dynamic objects in real-world scenes using a single depth camera, where the object movement is large and topologies may change significantly over the depth video. For example, as shown in the first row of Fig. 3(a, b), the person's hands and head touch each other at first, and then gradually separate over the next few frames. This kind of large movement often happens in daily life, while most state-of-the-art 3D reconstruction methods, such as DynamicFusion (Newcombe et al., 2015), cannot handle it consistently.
Method
In this section, we describe the components of our approach to reconstructing non-rigid objects with large movement. First, we introduce how to efficiently identify the most fine-grained scene topology as the canonical model (Fig. 4). Second, we deform the canonical model to each depth map, constrained by object contours and scene flows. Finally, we present a novel fusion strategy yielding compelling, detailed models.
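The fusion step builds on the classic volumetric running average of truncated signed distances (Curless and Levoy). A minimal sketch, assuming a flat TSDF array: capping the accumulated weight is one simple way to keep old observations from dominating and over-smoothing detail; the paper's adaptive interval selection is more elaborate than this.

```python
import numpy as np

def fuse_tsdf(tsdf, weight, new_tsdf, new_weight=1.0, max_weight=30.0):
    """Weighted running average of truncated signed distance values.

    tsdf, weight : current per-voxel TSDF values and accumulated weights
    new_tsdf     : TSDF values observed in the incoming depth frame
    max_weight   : cap so that new observations retain influence
    """
    fused = (tsdf * weight + new_tsdf * new_weight) / (weight + new_weight)
    capped = np.minimum(weight + new_weight, max_weight)
    return fused, capped
```

Fusing two frames with values 0.5 and 0.1 at equal weight yields the mean 0.3, while the weight cap ensures a long sequence cannot freeze the model against new evidence.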
Experimental settings
We implemented our method on a 64-bit desktop machine with a 12-core 3.6 GHz Intel Xeon CPU, 64 GB of memory, and an Nvidia TITAN X graphics card. We use a single depth camera (e.g. Microsoft Kinect v2.0) to capture the depth sequence. At each time step, a depth map is recorded at a resolution of 512 × 424. To evaluate the proposed approach, we captured several scenes with different actors behaving in real-life scenarios: opening hands, opening arms, pushing a pillow, stretching hands, and playing with plush toys.
Conclusion
In this paper, we presented a novel approach to non-rigid reconstruction with large movement. In contrast to previous methods, we identify the most fine-grained scene topology as the canonical model, then perform "model-to-frame" deformation and adaptive fusion. Comparisons on several challenging real-world examples suggest that the proposed approach achieves smooth results with less noise. In future work, we are interested in developing efficient algorithms to jointly reconstruct the whole sequence.
Acknowledgements
We would like to thank the anonymous reviewers for their valuable comments and suggestions. We are also grateful to Tao Yu for running BodyFusion (Yu et al., 2017) on our data and for helpful discussions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61502023 and U1736217).
References
- Real-time geometry, albedo, and motion reconstruction using a single RGB-D camera. ACM Trans. Graph. (TOG) (2017)
- Robust single-view geometry and motion reconstruction (2009)
- 3D self-portraits. ACM Trans. Graph. (TOG) (2013)
- An efficient volumetric method for non-rigid registration. Graph. Models (2015)
- 3D shape regression for real-time facial animation. ACM Trans. Graph. (TOG) (2013)
- Scalable real-time volumetric surface reconstruction. ACM Trans. Graph. (TOG) (2013)
- A volumetric method for building complex models from range images. In: New Methods for Surface Reconstruction from Range Images (1997)
- Scanning and tracking dynamic objects with commodity depth cameras
- Fusion4D: real-time performance capture of challenging scenes. ACM Trans. Graph. (TOG) (2016)
- 3D scanning deformable objects with a single RGBD sensor
- Real-time volumetric non-rigid reconstruction
- KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera
- A primal-dual framework for real-time dense RGB-D scene flow
- Very high frame rate volumetric integration of depth images on mobile devices. IEEE Trans. Vis. Comput. Graph.
- Modeling deformable objects from a single depth camera
- Semantic decomposition and reconstruction of residential scenes from Lidar data. ACM Trans. Graph. (TOG)
- DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time
- KinectFusion: real-time dense surface mapping and tracking
- Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (TOG)
☆ This paper has been recommended for acceptance by Ligang Liu.