Keywords

1 Introduction

In recent years there have been many studies on the 3D restoration of deformable objects in successive frames [1,2,3,4]. However, it is still challenging to reconstruct 3D models of deformable objects from consecutive frames in the field of computer vision and graphics [2]. One of the methods for reconstructing the 3D model of a deformable object is to estimate the transformation matrix for the object’s transformed part from a continuous frame. One of the more detailed approaches is to use non-rigid deformation [5,6,7] to transform the vertices of the model. This approach uses spatial transformations that consist of graph vertices. The transformation for one vertex affects the deformation of the neighboring space because it is associated with the connected node. They solve the non-linear minimization problem and estimate the optimal value of the affine transformation matrix for each vertex. However, because it takes a lot of computation time to estimate the transformation matrix for all vertices of an object, most of them work offline or they use the GPU’s parallel processing technique to reduce non-rigid registration computation time. Another method is to use only non-rigid parts, except rigid parts like backgrounds [2, 8]. This method can reduce calculation time. Another method [8, 12, 17, 18] is to apply the GPU’s parallel processing technique to reduce non-rigid registration computation time. If calculations do not need to be calculated sequentially, the application of parallelism can produce very effective results.

The goal of this paper is to quickly and precisely match sequential 3D data of a dynamically moving whole body model. Our approach does not estimate the 3D transformation matrix for all points to efficiently perform deformable registration. After selecting uniformly sampled points as in [5], we obtain the 3D transformation matrix. Unselected points are related to the movement of neighboring points. This property can have the form of a graph structure [5]. This method results in computationally very efficient and smooth outputs. However, it still requires a high computational complexity to estimate the 3D transformation matrix of all selected points. Therefore, a lot of processing time is needed. We propose a method that a fast and accurate estimation of the 3D transformation matrix of selected points using the GPU’s parallel processing.

Figure 1 shows our system flow chart. First, we obtain dynamic full-body model data from a single RGBD camera. Moreover, we use only the non-rigid part by removing the background based on the distance threshold. We use two sequential data as inputs to our system, as shown in Fig. 1. The input data is used as a pair of color and depth images. We use Gunnar Farneback’s optical flow [10] to search for the correspondence of sequence data. We used a simple method of selecting pixels at regular intervals as nodes. We project the pixels selected as nodes into the 3D space based on the camera coordinate system. Next, we estimate the 3D affine transformation matrix between the previous frame and the current frame for all nodes through our proposed GPU-based non-rigid ICP algorithm. The 3D transformation matrix of each node has a size of 3 × 4 in the form of an affine matrix. Finally, we use weight-based affine transformation method we propose to apply the 3D transformation of all points.

Fig. 1.
figure 1

Overview of the proposed approach

The last section of this paper shows the accuracy and runtime test results for non-rigid ICP for all nodes. It also shows the result of applying the weight-based affine transformation method.

2 Corresponding Pixel Search, Node Selection, and 3D Reconstruction [-]

2.1 Corresponding Pixel Search, Node Selection

We search for corresponding pixels for successive frames using Optical-flow algorithm. We then use the uniform-sampling method to select pixels without using all the pixels. Selected pixels are used as nodes. We used a uniform-sampling method to select pixels at regular intervals. The selected nodes estimate 3D transformation matrix using the method in Sect. 3. The unselected pixels estimate 3D transformation matrix using the 3D transformation matrix of neighboring nodes.

We use Gunnar Farneback’s optical flow [10] algorithm to find the corresponding pixels of the previous frame and the current frame. Figure 2 shows an example of the Optical-flow algorithm.

Fig. 2.
figure 2

Corresponding pixel search using Optical-flow algorithm: The yellow pixel is the selected pixel. The red straight line is a straight line that visualizes the vector pointing to the corresponding pixel. (Color figure online)

2.2 3D Reconstruction

In this paper, we aim to do 3D non-rigid registration of deformable whole body model. So we project all the pixels into the 3D space. We transformed the pixels into the 3D coordinate system of the camera reference using Eq. (1).

$$ P_{{\left( {x, y} \right)}} = D_{{\left( {x,y} \right)}} K^{ - 1} \left[ {x,y,1} \right]^{T} $$
(1)

In Eq. (1), (x, y) is the pixel coordinates in the color image. \( P_{{\left( {x,y} \right)}} \) is a 3D point based on the camera coordinate system. That is, \( P_{{\left( {x,y} \right)}} \) is a result of converting a 2D pixel into 3D coordinates. \( D_{{\left( {x,y} \right)}} \) is the depth value for the pixel obtained from the depth image. \( K \) is an internal parameter of the camera. The resolution of the color image and the depth image are the same.

3 Non-rigid Registration Using GPU

3.1 Grouping Nearest Neighbor Node

To estimate the 3D transformation matrix between two point groups in 3D space, we must know three or more corresponding points. When the number of corresponding points is less than three, the 3D rotation vector cannot be estimated. In this case, only a 3D translation vector can be estimated.

In this paper, nodes are grouped to estimate the 3D transformation matrix of selected nodes. Our method is as follows. As shown in Fig. 3, we set a distance to a threshold from one node. We have grouped N-closest points within this area. If there are not N-number of nodes in this area, we have chosen the center node. We applied this method to all nodes. If the grouping is successful, the movement of neighbor points affects all neighbor points.

Fig. 3.
figure 3

Grouping the nearest points: We grouped the N-closest points around a point.

When the grouping method is applied as shown in Fig. 3, edges are created between the points. And the overall structure shows the graph form. Therefore, we refer to these points as nodes in this paper. In Fig. 3, \( i \) is the index of the selected node, and \( j \) is the index of the neighboring node.

3.2 Energy Function

In general, Non-rigid registration algorithm for the deformation model uses a method of defining the 3D transformation matrix as an affine matrix and optimizing the non-rigid ICP energy function [1,2,3, 5,6,7,8]. In this paper, we use the affine matrix as the 3D transformation matrix of each node. The size of the affine matrix is 3 × 4. We defined the energy function to estimate the 3D transformation matrix of all nodes for the deformation model for the previous frame and the current frame. This section describes how to optimize energy functions at high speed using the GPU. We have defined two energy functions, \( E_{data} \) and \( E_{rigid} \), as in Eq. (2). We then use Levenberg-Marquardt (LM) algorithm to optimize the energy function and estimate the affine matrix for all nodes.

$$ E_{tol} \left( X \right) = \alpha_{data} E_{data} \left( X \right) + \alpha_{rigid} E_{rigid} \left( X \right) $$
(2)

In the Eq. (2), \( X \) is defined as \( X = \left\{ {X_{1} , \cdots ,X_{n} } \right\} \). \( X_{i} \) is a 3 \( \times \) 4 affine transformation matrix for the \( i \)-th node. \( E_{data} \) and \( E_{rigid} \) are called the data term and rigid term. \( E_{rigid} \) is also called an orthogonality constraint [2, 11]. \( \alpha_{data} \) and \( \alpha_{rigid} \) are constant values for weighting each energy function.

3.2.1 Data Term

We define the Data term as Eq. (3). The data term is an energy term for estimating the optimal 3D transformation matrix for a point group corresponding to a specific 3D point cloud. After applying the method in Sect. 3.1, suppose that there is one group of point clouds. In Eq. (3), \( M \) is the total number of nodes for the whole-body model, and \( N \) is the number of grouped nodes. \( X_{i} \) represents the affine matrix of the \( i \)-th node. \( P \) is the node of the previous frame, and \( Q \) represents the node corresponding to \( P \) in the current frame.

$$ E_{data} = \sum\nolimits_{i = 0}^{M} {\left( {\left\| {\sum\nolimits_{j = 0}^{N} {X_{i} p_{j}^{i} - q_{j}^{i} } } \right\|_{2}^{2} } \right)} $$
(3)

3.2.2 Rigid Term

We define the rigid term as Eq. (4). The rigid term allows Affine matrix to have a property similar to a rigid transformation matrix. In other words, the rigid term is an energy term that causes the affine matrix to have the property of an orthogonal matrix. In Eq. (4), \( a_{i} = \left( {a_{i1} , a_{i2} , a_{i3} } \right) \) is a row vector of \( X_{i} \) in Eq. (3). \( a_{i} = \left( {a_{i1} , a_{i2} , a_{i3} } \right) \) is shown in Fig. 4. \( i \) is the index number of the nodes. \( M \) is the total number of nodes for the whole body model.

$$ E_{rigid} = \sum\nolimits_{i = 0}^{M} {\left( {\begin{array}{*{20}c} {\left( {a_{i0}^{T} a_{i1} } \right)^{2} + \left( {a_{i1}^{T} a_{i2} } \right)^{2} + \left( {a_{i2}^{T} a_{i1} } \right)^{2} + } \\ {\left( {1 - a_{i0}^{T} a_{i0} } \right)^{2} + \left( {1 - a_{i1}^{T} a_{i1} } \right)^{2} + \left( {1 - a_{i2}^{T} a_{i2} } \right)^{2} } \\ \end{array} } \right)} $$
(4)
Fig. 4.
figure 4

Row vector of \( X_{i} \)

3.3 Optimization

3.3.1 Energy Minimization Method

We use a GPU-based optimization algorithm to obtain the \( X \) matrix with the size of \( E_{tol} \left( X \right) \) in Sect. 3.2 minimized. The optimization algorithm as shown in Eq. (5) uses a combination of Levenberg-Marquardt (LM) algorithm and the Gauss-Newton algorithm. In Eq. (5), \( {\mathcal{J}} \) is a Jacobian matrix for \( E_{tol} \left( X \right) \). \( \lambda \) is a constant value and \( I \) is an identity matrix.

$$ \left( {{\mathcal{J}}^{T} {\mathcal{J}} + \lambda I} \right)h = - {\mathcal{J}}^{T} E_{tol} \left( X \right) $$
(5)
$$ Ax = B $$
(6)

We initialize the initial affine transformation matrix for each node to the Identity matrix. We used Eq. (6) to find \( h \) of Eq. (5). \( A \) of Eq. (6) uses \( \left( {{\mathcal{J}}^{T} {\mathcal{J}} + \lambda I} \right) \) in Eq. (5). B of Eq. (6) uses \( - {\mathcal{J}}^{T} E_{tol} \left( X \right) \) in Eq. (5). We used GPU based PCG (Preconditioned Conjugate Gradient) algorithm for the linear solver. We applied the variation \( h \) obtained from the linear solver for optimization to Eq. (7). Equation (7) is an expression for reaching the optimal solution of X.

$$ X = X + h $$
(7)

The size of the affine transformation matrix for each node is 3 \( \times \) 4. Therefore, we must estimate 12 non-parameters per node through the solver. The larger the number of non-parameter, the larger the number of nodes, the higher the amount of computation required to obtain the Jacobian matrix. Section 3.3.2 suggests how to process Jacobian operations at high speed using the GPU.

3.3.2 GPU Parallelism for Jacobian Calculations

Independent and iterative calculation structures are advantageous for parallel processing systems. Jacobian calculations are dependent on a group of nodes and are independent of other groups. We designed the data term and rigid term as two asynchronous kernels in parallel, as shown in Fig. 5.

Fig. 5.
figure 5

Asynchronous kernel structure for Jacobian calculations based on GPU

In Data term kernel, we have assigned the number of CUDA blocks equal to the number of nodes. We used GPU shared memory as much as possible by assigning one node group to one CUDA block. CUDA shared memory has the advantage of faster access speed than CUDA global memory. This structure makes maximum use of parallel processing. The number of threads in Data term kernel is ‘neighboring node size + 1’. The reason for adding one is because it includes the core node. In Rigid term kernel, the number of CUDA blocks is the total number of nodes divided by the number of threads allocated in a block.

4 Weight-Based Affine Transformation

In this paper, we propose a weight-based affine transformation method to apply the spatial transformation of all points using the affine transformation matrix for each point. Our method is defined by Eq. (8). This method weights each transformation matrix of all points to estimate a 3D transformation matrix of points. We used the exponential function to define the weighted value in inverse proportion to the distance as shown in Eq. (8). The nearest node has a greater effect than the farther node.

$$ P^{{\prime }} = \frac{{\mathop \sum \nolimits_{i = 0}^{N} e^{{ - \lambda d_{i} X_{i} P}} }}{{\mathop \sum \nolimits_{i = 0}^{N} e^{{ - \lambda d_{i} }} }} $$
(8)

In the above equation, \( P \) is the point of the previous frame, and \( P^{{\prime }} \) is the result of applying the weight-based affine transformation. And \( i \) denotes an index of a node. \( N \) is the total number of nodes. \( X_{i} \) is the affine transformation matrix of the \( i \)-th node and \( \lambda \) is a constant.

5 Experimental Results

We used one Intel RealSense D415 camera. All the experiments are worked in a general purpose Intel i7-4790 computer running Windows 10 (64 bit) and an NVIDIA Titan XP graphics card. We evaluated the performance by experimenting with four data sets as shown in Table 1. Table 1 shows information in the dataset and the experimental results. It shows the average number of points and the average number of selected nodes in order. And it shows the calculation time of the non-rigid ICP algorithm and distance error value between nodes. The average non-rigid ICP calculation time was measured at 71.17 FPS. The average distance error of the nodes was measured as 4.24 mm. Figure 6 shows the result of applying our distortion transformation using the transformation matrix obtained from the non-rigid ICP in the previous frame and the current frame. In Fig. 6(a), the red part is the 3D data of the previous frame. In Fig. 6(b), the red color is the result of our proposed method.

Table 1. Experiments dataset
Fig. 6.
figure 6

Experiment results: (a) previous frame + current frame, (b) current frame + our result frame, (c) gaussian distribution based on distance difference: the upper graph is (a), the lower graph is (b).

6 Conclusion

This paper presents a non-rigid registration method for deformable objects and a GPU-based non-rigid ICP acceleration method. To reduce the amount of computation, we first select uniformly sampled points and then estimate the transformation matrix of each point. And we proposed a method of smooth registration of all points through the weight-based affine transformation method. Also, we evaluated the accuracy and speed of the proposed method through experiments.

We used a single camera, and there was an occlusion problem between frames due to the movement of the whole body model. It caused a loss of 3D data. Also, when the deformation of the model between successive frames occurs to a large extent, we have a problem that the error increases. In the future, we will study ways to improve our limitations.