Reconstructing non-rigid object with large movement using a single depth camera☆
Introduction
3D reconstruction of real-world scenes from depth cameras is a widely studied problem in the fields of computer vision and computer graphics. After long-term efforts, the 3D model of a scene can now be accurately built by fusing depth maps captured from multiple views, as long as the scene is static (e.g. KinectFusion (Newcombe et al., 2011, Izadi et al., 2011)). However, reconstructing non-rigid scenes with a single depth camera remains largely unsolved due to a number of challenges, such as non-rigid deformation, incomplete scans, and large movement, which may change the topological structure of the scene.
In recent years, the challenges of handling non-rigid deformation and incomplete scans have been well studied and addressed by various previous works (Sumner et al., 2007, Xu et al., 2007, Li et al., 2009, Liao et al., 2009, Zhou et al., 2010, Oikonomidis et al., 2011, Taylor et al., 2012, Li et al., 2013, Yang et al., 2013, Zollhöfer et al., 2014, Dou et al., 2015, Zhang et al., 2015a, Yang et al., 2015, Dou et al., 2016). However, these methods rely on strong priors such as pre-designed templates, direct user manipulation, multiple depth sensors, or pre-learned statistical models. Moreover, some techniques require seconds to minutes to process a single frame, which is too slow for practical reconstruction. Newcombe et al. (2015) proposed the first system for densely reconstructing general dynamic scenes, which can generate high-quality results from a single camera in real time. Despite these significant successes, most of these approaches do not explicitly consider the challenging problem of large movement (i.e. potential topological change) of the input scene, which frequently occurs for non-rigid objects. As illustrated in Fig. 1, as the arms of the person stretch away from the body, the topology of the person becomes inconsistent. In this case, previous methods that assume a fixed topological structure cannot reconstruct the input scene consistently. Slavcheva et al. (2017) made an attempt at this problem with a level-set evolution approach. However, because it reconstructs without correspondences, the results appear somewhat shaky and lose essential details.
To address this problem, this paper presents a novel approach to reconstructing non-rigid scenes with large movement from a single depth camera. As summarized in Fig. 2, the proposed approach takes a depth sequence captured by a Kinect v2.0 sensor as input, and incrementally fuses the depth maps to generate a canonical model that best fits the scene in each frame under certain deformations. To this end, we propose a novel adaptive strategy that identifies the most fine-grained scene topology as the canonical model by analyzing the topological structure. Given the canonical model, we then deform it to each depth map, constrained by robust inter-frame correspondences established from object contours and scene flows. Finally, we fuse the depth maps onto the deformed canonical models through a novel scheme that adaptively selects the appropriate interval of frames for fusion, which generates high-quality reconstruction results without over-smoothing model details. Experimental results demonstrate that our approach can effectively handle various input scenes whose topological structure changes due to large movement.
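One crude proxy for "most fine-grained topology" is the number of connected components in each frame's foreground silhouette: a frame where the arms are held away from the body splits into more components than one where they touch the torso. The sketch below illustrates this idea only; the function names are hypothetical and the paper's actual topological analysis is more elaborate.

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground components in a binary silhouette mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components

def pick_canonical_frame(masks):
    """Pick the frame whose silhouette splits into the most components."""
    return max(range(len(masks)), key=lambda i: count_components(masks[i]))
```

In this toy setting, a frame where a limb has separated from the body yields more silhouette components and would be preferred as the canonical frame.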
The contributions of this paper are summarized as follows: 1) we present a novel approach that identifies a canonical frame to reconstruct non-rigid scenes with large movement; 2) we efficiently deform the canonical model to fit each depth map using contour and scene-flow cues; 3) we propose an adaptive fusion algorithm that largely suppresses noise during fusion while preserving model details.
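The "model-to-frame" deformation in contribution 2 is non-rigid, typically solved over a deformation graph; each local node solve, however, resembles a closed-form rigid fit of corresponding points. As a much simpler stand-in (not the authors' method), here is the classic Kabsch alignment that recovers the best rigid rotation and translation from correspondences:

```python
import numpy as np

def kabsch_align(src, dst):
    """Best rigid (R, t) mapping src points onto dst points, dst ≈ R src + t.

    A rigid stand-in for one local solve of a non-rigid deformation graph.
    """
    src_c = src - src.mean(axis=0)          # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

With noiseless correspondences this recovers the ground-truth transform exactly; in a real system the correspondences would come from the contour and scene-flow matching described above.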
Related work
There are various previous works on 3D scene reconstruction based on consumer-level depth cameras. While a large group of them focused on static scenes (Newcombe et al., 2011, Izadi et al., 2011, Roth and Vona, 2012, Whelan et al., 2012, Shao et al., 2012, Lin et al., 2013, Steinbrucker et al., 2013, Chen et al., 2013, Nießner et al., 2013, Kahler et al., 2015, Zhang et al., 2015b), this section mainly reviews recent advances on non-rigid scene reconstruction that are tightly correlated with ours.
Overview
We aim to reconstruct non-rigid dynamic objects in real-world scenes using a single depth camera, where the object movement is large and topologies may change significantly over the depth video. For example, as shown in the first row of Fig. 3(a, b), the person's hands and head touch each other at first, and then gradually separate over the next few frames. This kind of large movement often happens in daily life, while most state-of-the-art 3D reconstruction methods, such as DynamicFusion (Newcombe et al., 2015), cannot handle it consistently.
Method
In this section, we describe the components of our approach to reconstructing non-rigid objects with large movement. First, we introduce how to efficiently identify the most fine-grained scene topology as the canonical model (Fig. 4). Second, we deform the canonical model to each depth map, constrained by object contours and scene flows. Finally, we present a novel fusion strategy yielding compelling, detailed models.
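The fusion step builds on the classic volumetric running average of truncated signed distances (Curless and Levoy). A minimal sketch, assuming a flat TSDF array: capping the accumulated weight is one simple way to keep old observations from dominating and over-smoothing detail; the paper's adaptive interval selection is more elaborate than this.

```python
import numpy as np

def fuse_tsdf(tsdf, weight, new_tsdf, new_weight=1.0, max_weight=30.0):
    """Weighted running average of truncated signed distance values.

    tsdf, weight : current per-voxel TSDF values and accumulated weights
    new_tsdf     : TSDF values observed in the incoming depth frame
    max_weight   : cap so that new observations retain influence
    """
    fused = (tsdf * weight + new_tsdf * new_weight) / (weight + new_weight)
    capped = np.minimum(weight + new_weight, max_weight)
    return fused, capped
```

Fusing two frames with values 0.5 and 0.1 at equal weight yields the mean 0.3, while the weight cap ensures a long sequence cannot freeze the model against new evidence.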
Experimental settings
We implemented our method on a 64-bit desktop machine with a 12-core 3.6 GHz Intel Xeon CPU, 64 GB of memory, and an Nvidia TITAN X graphics card. We use a single depth camera (e.g. Microsoft Kinect v2.0) to capture the depth sequence. At each time step, a depth map is recorded at a resolution of 512 × 424. To evaluate the proposed approach, we captured several scenes with different actors behaving in real-life scenarios: opening hands, opening arms, pushing a pillow, stretching hands, and playing with plush toys.
Conclusion
In this paper, we presented a novel approach to non-rigid reconstruction with large movement. In contrast to previous methods, we identify the most fine-grained scene topology as the canonical model, then perform "model-to-frame" deformation and adaptive fusion. Comparisons on several challenging real-world examples suggest that the proposed approach achieves smooth results with less noise. In future work, we are interested in developing efficient algorithms to jointly reconstruct the whole sequence.
Acknowledgements
We would like to thank the anonymous reviewers for their valuable comments and suggestions. We are also grateful to Tao Yu for running BodyFusion (Yu et al., 2017) on our data and for helpful discussions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61502023 and U1736217).
References
- Real-time geometry, albedo, and motion reconstruction using a single RGB-D camera. ACM Trans. Graph. (TOG) (2017)
- Robust single-view geometry and motion reconstruction (2009)
- 3D self-portraits. ACM Trans. Graph. (TOG) (2013)
- An efficient volumetric method for non-rigid registration. Graph. Models (2015)
- 3D shape regression for real-time facial animation. ACM Trans. Graph. (TOG) (2013)
- Scalable real-time volumetric surface reconstruction. ACM Trans. Graph. (TOG) (2013)
- A volumetric method for building complex models from range images. In: New Methods for Surface Reconstruction from Range Images (1997)
- Scanning and tracking dynamic objects with commodity depth cameras
- Fusion4D: real-time performance capture of challenging scenes. ACM Trans. Graph. (TOG) (2016)
- 3D scanning deformable objects with a single RGBD sensor
- Real-time volumetric non-rigid reconstruction
- KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera
- A primal-dual framework for real-time dense RGB-D scene flow
- Very high frame rate volumetric integration of depth images on mobile devices. IEEE Trans. Vis. Comput. Graph.
- Modeling deformable objects from a single depth camera
- Semantic decomposition and reconstruction of residential scenes from Lidar data. ACM Trans. Graph. (TOG)
- DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time
- KinectFusion: real-time dense surface mapping and tracking
- Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (TOG)
☆ This paper has been recommended for acceptance by Ligang Liu.