1 Introduction

Texture mapping plays an important role in reconstructing virtual versions of real-life products for E-Commerce applications with inexpensive consumer-grade scanning devices. Reconstructing seamless, high-quality texture maps from such noisy data is challenging: the geometry is inaccurate, the camera poses are imprecise, and consumer-grade cameras introduce optical distortions. Existing methods such as Waechter et al. [24] efficiently select suitable images to texture the faces of geometric models, but may leave visible seams, blurring, and ghosting artifacts on the generated texture maps. Recently, Zhou and Koltun [26] used dense, global color information to correct the misalignments of the images used for texturing, which produces impressive color maps; however, their approach incurs a high computational cost. In this paper, we generate seamless texture maps by routing seams through low-frequency color regions with a bypass optimization and by efficiently correcting the remaining misaligned seams using local color information.

Fig. 1. The pipeline of our texture mapping process: (a) a set of images registered to a corresponding geometry (b) are taken as input; the seam generation process (c) then selects suitable images to texture the faces of the geometry, creating most of the seams across low-frequency color areas; finally, a seam correction optimization (d) corrects misalignments around seams (shown in translucent orange blocks) and generates a high-quality texture mapping output (e) (Color figure online)

Our approach achieves both efficiency and seamless texture maps through a two-step global optimization. In the first step, we present a novel optimization based on a Markov random field (MRF) that selects suitable images to texture the geometric mesh and generates optimal seams located in low-frequency texture regions of the texture maps. Our optimization incorporates the color discrepancies between the textures of adjacent faces on the mesh. As a result, low-frequency texture regions become more favorable locations for seams, which lowers the energy of the MRF-based optimization and thus diminishes visible seams. As the seams cannot be completely eliminated by the first step, the second step performs a joint optimization that maximizes the color consistency around the seams, further eliminating misaligned seams. We estimate the camera poses and local warping of the images used for texturing. Specifically, for efficiency we only estimate the color consistency of vertices around seams and warp the local image patches where seams exist.

The contributions of our approach can be summarized as follows. First, we present an MRF-based seam generation optimization that creates optimal seams in low-frequency color areas. Second, we propose a seam correction optimization that efficiently corrects misalignment errors. Finally, we combine the two into an optimization framework that efficiently generates seamless texture maps, addressing a main problem of texture mapping. Experimental results demonstrate that our approach provides a better color representation at a much lower computational cost than existing methods.

2 Related Works

3D Acquisition. As consumer-grade depth cameras make 3D acquisition increasingly affordable and convenient, geometric acquisition from RGB-D data has attracted wide interest [19, 23, 25]. The pioneering KinectFusion work of Izadi et al. [11] reconstructs scene geometry with a volumetric representation. Nießner et al. [20] propose a real-time online 3D reconstruction system using an efficient hashing-based geometric representation. Zhou and Koltun [25] reconstruct dense scenes with points of interest using RGB-D cameras. Another popular approach to geometric reconstruction is structure from motion. Ackermann et al. [1] propose a photometric stereo technique to reconstruct outdoor scenes. Methods based on structure from motion and RGB-D images are flexible enough to reconstruct geometry ranging from fine-scale objects to large-scale scenes, but they generate 3D models with considerable noise [4, 10, 14]. Accurate geometric reconstruction can be obtained with structured light scanning systems [9, 18]. Gupta et al. [8] present a structured light system that reconstructs high-quality geometry under global illumination. In this paper, we use data scanned by a low-cost structured light system consisting of an ordinary projector and an RGB industrial camera.

Vertex Texturing Methods. Vertex texturing methods encode color information as per-vertex color. Nießner et al. [20] and Shan et al. [22] integrate multi-view color samples, which leads to blurring and ghosting artifacts due to misalignment errors. Zhou and Koltun [26] use dense, global color information to estimate the photometric consistency of all vertices on the object's mesh. Their method corrects misalignment errors and improves texture mapping fidelity. However, to capture fine object detail, their approach must estimate the color of a large number of vertices, which is time-consuming and forfeits a key advantage of texture mapping: representing high-quality detail on a coarse geometric representation.

Face Texturing Methods. Face texturing methods such as [7, 15, 24] are based on Markov random fields and select a single image to texture each face of the object's mesh. These methods can generate texture maps with abundant detail, but they cannot fully resolve image misalignments, resulting in blurring and ghosting artifacts. Lempitsky and Ivanov [15] diminish visible seams by performing a global color adjustment followed by Poisson editing [21]; however, blurring and ghosting artifacts around seams may still occur with noisy input data. Other approaches generate seamless texture maps using geometric information. For example, Barnes et al. [3] provide an interactive method to manually correct misalignment errors between the geometric model and the images, which is not suitable for E-Commerce applications. Bi et al. [5] propose a patch-based optimization that incorporates geometric information; their method estimates the bidirectional similarity of different images, which incurs a high computational cost. Recently, deep learning-based approaches have been developed for real-world texture reconstruction using texture synthesis [16, 17].

3 Overview

As shown in Fig. 1, our pipeline takes an object's mesh and a set of images registered to the mesh as input, and generates a high-quality, seamless texture map for the object. Our approach starts with a seam generation process that uses a novel MRF formulation to select the "best view" for texturing each face of the mesh. Existing methods cast "best view" selection as a Graph Cuts optimization [6]. Our seam generation formulation consists of two energy terms: the first (data term) selects high-resolution images to texture each face, and the second (smooth term) encourages a smooth representation. The energy function can be solved with an MRF solver [13]. However, traditional formulations are not robust to noisy data, resulting in blurring and ghosting artifacts [24]. We redefine the MRF energy with an easy-to-compute data term and a smooth term that exploits the color differences between adjacent faces, which generates more reliably invisible seams. Details of our seam generation are presented in Sect. 4.1.

To reduce the remaining visible seams that the seam generation step cannot fully diminish, we develop a seam correction optimization that deals with misalignments in Sect. 4.2. Inspired by Zhou and Koltun [26] and Bi et al. [5], we correct the texture regions around the seams to produce a color-consistent 3D representation. Compared to the existing method [24], we design a closed-form solution (Fig. 1(d)) that obtains plausible results at a low computational cost. Finally, we use the color adjustment method of Waechter et al. [24] to handle luminance inconsistencies caused by lighting variation in the textured results.

4 Approach

Our approach can texture a 3D object with less perceptible texture seams and higher fidelity. This section details the two key steps. Section 4.1 describes the seam generation optimization step, and Sect. 4.2 describes the seam correction step.

4.1 Seam Generation Optimization

Our seam generation optimization divides the mesh into blocks of faces (as shown in Fig. 2), where all faces in a block correspond to the same image. The boundaries between blocks appear as texture seams on the textured mesh. The input to our method is an object's triangle mesh and a set of images registered to it. We denote the triangle faces of the mesh by \(\varvec{F}=\{F_1,F_2, \cdots ,F_m\}\) and the corresponding texture images by \(\varvec{I}=\{I_{1}, I_{2}, \cdots , I_{n}\}\), where \(I_{i}\) is the texture image for view i. The projection between images and faces is computed from the camera intrinsic and extrinsic parameters.

Fig. 2. A mesh labeling example. Each color block in (a) represents a rendering texture for a face, and (b) shows the texturing result, where pink contours indicate seams (Color figure online)

The generation of optimized blocks of faces, and hence of texture seams, is formulated as a labeling problem in which each face \(F_j\) is labeled with a suitable image \(I_{L_j}\). We define a label vector \(\varvec{L}=\{L_1, L_2, \cdots , L_j, \cdots , L_m \} \in \{1, 2, \cdots , i, \cdots , n\}^m\), where \(L_{j} = i \) indicates that image \(I_{i}\) is used to texture face \(F_j\). As multiple images may cover one face, we must select the best candidate view for each face. We adopt an MRF to solve this problem, with the common energy function:

$$\begin{aligned} E(\varvec{L}) = E_{d}(\varvec{L}) + \alpha \cdot E_{s}(\varvec{L}). \end{aligned}$$
(1)

The first term, the data term \(E_{d}(\varvec{L})\), represents the cost of texturing the faces with texture images from a given selection of views. The second term \( E_{s}(\varvec{L}) \) measures the smoothness of the generated texture, and \(\alpha \) is a weighting parameter. In this paper, we propose a novel data term and smooth term that effectively place seams in low-frequency color areas, which existing MRF formulations cannot achieve.

We aim to reconstruct textures of objects of ordinary size, so there are no scaling issues in our image data and it is unnecessary to include the time-consuming scale handling of Waechter et al. [24] in the data term. Allene et al. [2] use the projected size as the data term, which is easy to calculate but cannot handle blurring artifacts. Inspired by [15], our data term measures the angle between the normal of a face and the camera view direction of an image; this metric is computationally cheap and thus accelerates our algorithm. For each face \(F_j \in \varvec{F} \), we use \(f(L_{j})\) to evaluate the quality of texturing \(F_j\) with view image \(I_{L_{j}} \in \varvec{I}\) \((L_{j} = i)\). If \(F_j\) is observable in image \(I_{L_j}\), we set \(f(L_{j}) =1- (\varvec{n}_{j} \cdot \varvec{n}_{L_j} )^2\); otherwise, \(f(L_{j}) =\infty \), where \(\varvec{n}_{j}\) is the unit normal of face \(F_j\) and \( \varvec{n}_{L_j}\) is the unit camera view vector of image \(I_{L_{j}}\). The data term \(E_{d}(\varvec{L}) \) is then:

$$\begin{aligned} E_{d}(\varvec{L}) = \sum _{j=1}^{m} f(L_{j}). \end{aligned}$$
(2)
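Per-face data costs reduce to a single dot product per face–view pair. The following NumPy sketch (array names are ours, not the paper's) illustrates Eq. (2) under the stated visibility rule:

```python
import numpy as np

def data_term(face_normals, view_dirs, visible):
    """Eq. (2) per-face costs: f(L_j) = 1 - (n_j . n_{L_j})^2, inf if unseen.

    face_normals: (m, 3) unit face normals n_j
    view_dirs:    (n, 3) unit camera view vectors n_{L_j}
    visible:      (m, n) bool, True if face j is observed in view i
    """
    dots = face_normals @ view_dirs.T          # (m, n) cosine of the angle
    f = 1.0 - dots ** 2                        # small when the face faces the camera
    return np.where(visible, f, np.inf)        # unobservable pairs are forbidden
```

Selecting `argmin` over each row of this matrix gives the data-term-only labeling that the smooth term then refines.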

For the smooth term, we minimize the average color difference between co-edged faces along the texture seams. Let \(e_{jk} \in \varvec{E_{dge}} \) be the edge shared by faces \(F_{j}\) and \(F_{k}\), where \(\varvec{E_{dge}}\) is the set of edges shared by adjacent faces of the mesh. Because whole faces exhibit greater color differences than single edges, the average color over faces expresses discrepancies better than the method of [15], which uses color discrepancies along the edges only. Let \(\varvec{C}_{L_{j}}\) be the average color of the pixels in the area onto which \(F_j\) projects in image \(I_{L_j}\). We measure the cost of edge \(e_{jk}\) as:

$$\begin{aligned} D(L_j,L_k) = {\left\{ \begin{array}{ll} 0, &{} \text {if } L_j = L_k \\ d(\varvec{C}_{L_{j}}, \varvec{C}_{L_{k}})^2, &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(3)

where \(d( \cdot ,\cdot )\) is the Euclidean distance on RGB color space. Thus, we have the smooth term \(E_{s}(\varvec{L})\):

$$\begin{aligned} E_{s}(\varvec{L}) = \sum _{e_{jk} \in \varvec{E_{dge}}}D(L_j,L_k). \end{aligned}$$
(4)
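Eqs. (3)–(4) can be sketched as follows (a minimal illustration with our own data layout, assuming the per-face average colors have been precomputed for every view):

```python
import numpy as np

def smooth_term(labels, avg_colors, edges):
    """Eq. (4): sum of D(L_j, L_k) over mesh edges shared by two faces.

    labels:     (m,) chosen view index L_j per face
    avg_colors: (m, n, 3) average RGB of face j projected into view i
    edges:      list of (j, k) face pairs sharing an edge
    """
    total = 0.0
    for j, k in edges:
        if labels[j] != labels[k]:             # D = 0 when both faces share a view
            diff = avg_colors[j, labels[j]] - avg_colors[k, labels[k]]
            total += float(diff @ diff)        # squared Euclidean RGB distance
    return total
```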

The overall seam generation energy in Eq. (1) can be re-written as:

$$\begin{aligned} E(\varvec{L}) = \sum _{j=1}^{m} f(L_{j}) + \alpha \cdot \sum _{e_{jk} \in \varvec{E_{dge}}}D(L_j,L_k). \end{aligned}$$
(5)

The energy function in Eq. (5) can be formulated as a probability distribution over a Markov random field and efficiently minimized with \(\alpha \)-expansion Graph Cuts [6].
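The actual minimization uses \(\alpha \)-expansion Graph Cuts [6]; purely for illustration, the toy iterated-conditional-modes (ICM) loop below greedily minimizes the same energy of Eq. (5) over precomputed data costs and a pairwise cost callback (the ICM substitute and all names are our own):

```python
import numpy as np

def icm_labeling(data_cost, edges, pair_cost, alpha=200.0, iters=20):
    """Greedy stand-in for alpha-expansion on E(L) = sum f + alpha * sum D.

    data_cost: (m, n) array of f(L_j) values
    edges:     list of (j, k) face pairs sharing an edge
    pair_cost: pair_cost(j, lj, k, lk) -> D(L_j, L_k)
    """
    m, n = data_cost.shape
    labels = np.argmin(data_cost, axis=1)          # start at the data-term optimum
    nbrs = [[] for _ in range(m)]
    for j, k in edges:
        nbrs[j].append(k)
        nbrs[k].append(j)
    for _ in range(iters):
        changed = False
        for j in range(m):
            costs = data_cost[j].copy()
            for k in nbrs[j]:                      # add smooth cost vs. neighbors
                for l in range(n):
                    costs[l] += alpha * pair_cost(j, l, k, labels[k])
            best = int(np.argmin(costs))
            if best != labels[j]:
                labels[j] = best
                changed = True
        if not changed:                            # local minimum reached
            break
    return labels
```

Unlike \(\alpha \)-expansion, ICM only reaches a local minimum, so this is a didactic stand-in rather than the solver used in the paper.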

4.2 Seam Correction Optimization

The seam generation optimization produces reasonable, less perceptible seams. However, some noticeable seams remain in the reconstructed model due to large image misalignments, as shown in Fig. 4. The seam correction optimization corrects such seams by adjusting the content of the images selected for texturing. Unlike global optimization methods such as Zhou and Koltun [26], which estimate colors for all vertices, our approach is more efficient because the optimization runs only on the small set of vertices around seams.

Fig. 3. The seam generation results of our method on several datasets, compared with those of Waechter et al. [24]. Our method outperforms state-of-the-art view selection methods

Our seam correction optimization takes two factors into consideration. First, the color along seams should be consistent. Second, the textured appearance of the area around seams should resemble the corresponding area in the selected images. To this end, we jointly estimate the colors of all vertices within the regions of the geometry where seams exist, the camera poses of all images used for texturing, and the local warping of the image patches that contain the seams, so as to maximize the color similarity of the mapping around seams.

After seam generation, the mesh is divided into blocks textured with the selected images. We denote the image patch corresponding to each mesh block by \(\varvec{P} = \{\varvec{P}_{1}, \varvec{P}_{2}, \cdots , \varvec{P}_{k}\}\); one image can texture multiple blocks (e.g. \(\{\varvec{P}_{1}, \varvec{P}_{2}, \varvec{P}_{3}\} \in I_{i}\)). We denote the mesh blocks by \(\varvec{B} = \{\varvec{B}_{1}, \varvec{B}_{2}, \cdots , \varvec{B}_{k}\}\). The relationship between an image patch and its mesh block is the perspective transformation \(\varvec{P}_{k} = \varvec{\mathcal {K}}\varvec{T}_{k}\varvec{B}_{k}\), where \(\varvec{T} = \{\varvec{T}_{1}, \varvec{T}_{2}, \cdots , \varvec{T}_{k}\}\) denotes the external camera parameters and \(\varvec{\mathcal {K}}\) the internal camera parameters. \(\varvec{E} = \{E_{1}, E_{2}, \cdots , E_{k}\}\) denotes the edge set of each mesh block \(B_{k}\), and \(\varvec{F} = \{\varvec{F}_{1}, \varvec{F}_{2}, \cdots , \varvec{F}_{k}\}\) denotes the control lattices used to warp each image patch \(\varvec{P}_{k}\). A vertex \(\varvec{v} \in \varvec{B}_{k}\) is a candidate for optimization, and the shortest geodesic distance from \(\varvec{v}\) to the edge set \(E_{k}\) is denoted \(g(\varvec{v},E_{k})\). We define a control range for correction in each image patch, in which only vertices with \(g(\varvec{v},E_{k})\) less than \(\gamma \) are used. The objective function is:

$$\begin{aligned} E_{correction}(\varvec{C},\varvec{F},\varvec{T}) = \sum _{k}(\sum _{\begin{array}{c} \varvec{v} \in \varvec{B}_k, \\ g(\varvec{v},E_{k}) < \gamma \end{array} } w(\varvec{v}) \cdot e^2 + \beta \cdot \varvec{F}_{k}^{\top }\varvec{F}_{k} ), \end{aligned}$$
(6)

where \(\varvec{C} = \{C_{\varvec{v}}| g(\varvec{v},E_{k}) < \gamma \}\) denotes the gray-scale colors estimated for the vertices around seams on the mesh, \(\varvec{F}\) denotes the warping control lattices of the image patches, and \(\varvec{T}\) denotes the camera pose transformations. During optimization, these variables are adjusted to correct visible seams. The second term is a regularization term that penalizes excessive deformation of the image patches by \(\varvec{F}\). \(w(\varvec{v})\) is the weight of vertex \(\varvec{v}\), representing the color discrepancy of the faces around \(\varvec{v}\):

$$\begin{aligned} w(\varvec{v}) = \frac{1}{|\varvec{\mathcal {E}}(\varvec{v})|} \sum _{e_{jk} \in \varvec{\mathcal {E}}(\varvec{v})} d(C_{L_{j}},C_{L_{k}})^2, \end{aligned}$$
(7)

where \(\varvec{\mathcal {E}}(\varvec{v}) \) is the set of edges incident to vertex \(\varvec{v}\). Since every face is textured in the seam generation step, \(w(\varvec{v}) \) can be precomputed and remains constant during seam correction; it gives more weight to vertices located in high-frequency color areas around seams.
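Since the per-face average colors are fixed after seam generation, Eq. (7) is a cheap precomputation. A sketch with our own data layout:

```python
import numpy as np

def vertex_weight(incident_edges, face_colors):
    """Eq. (7): mean squared color discrepancy over edges meeting at a vertex.

    incident_edges: list of (j, k) face pairs for the edges incident to v
    face_colors:    dict face index -> average RGB color C_L under its chosen view
    """
    if not incident_edges:
        return 0.0
    total = 0.0
    for j, k in incident_edges:
        diff = np.asarray(face_colors[j], float) - np.asarray(face_colors[k], float)
        total += float(diff @ diff)            # d(C_{L_j}, C_{L_k})^2
    return total / len(incident_edges)         # average over |E(v)| edges
```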

The first term of the objective function measures the color consistency of each vertex in \(\{\varvec{v}|g(\varvec{v},E_{k}) < \gamma \} \); \(e^{2}\) is the squared residual, with

$$\begin{aligned} e = C_{\varvec{v}} - \mathcal {L} \circ \varvec{ \mathcal {F}}_{k} (\varvec{\mathcal {K}} \cdot \varvec{T}_{k} \cdot \varvec{v}). \end{aligned}$$
(8)

For each vertex \(\varvec{v}\), we project \(\varvec{v}\) onto the image plane of \(\varvec{P}_{k}\) using the camera intrinsics \(\varvec{\mathcal {K}}\) and pose \(\varvec{T}_{k}\) (letting \(\varvec{u} = \varvec{\mathcal {K}} \cdot \varvec{T}_{k} \cdot \varvec{v}\)), and then warp image patch \(\varvec{P}_{k}\) with the following correction function:

$$\begin{aligned} \varvec{ \mathcal {F}}_{k}(\varvec{u}) = \varvec{u} + \varvec{B}(\varvec{u}) \cdot \varvec{F}_{k}, \end{aligned}$$
(9)

where \(\varvec{B}(\varvec{u})\) is the vector of B-spline basis functions that weights the vector of control lattices. \(\mathcal {L}(\varvec{u})\) then evaluates the gray-scale intensity at the warped projection of vertex \(\varvec{v}\).
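As a simplified sketch of Eqs. (8)–(9), the snippet below substitutes a bilinear basis over a 2×2 offset lattice for the paper's B-spline lattice, and uses bilinear interpolation for \(\mathcal {L}\) (both substitutions are our own simplifications):

```python
import numpy as np

def bilinear_sample(img, u):
    """L(u): bilinearly interpolated gray-scale intensity at point u = (x, y)."""
    h, w = img.shape
    x = min(max(u[0], 0.0), w - 1.001)         # clamp inside the valid cell range
    y = min(max(u[1], 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1]
            + (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])

def warp_point(u, lattice_offsets, patch_size):
    """Eq. (9) with a bilinear basis B(u) over a 2x2 offset lattice F_k."""
    w, h = patch_size
    fx, fy = u[0] / (w - 1), u[1] / (h - 1)    # normalized patch coordinates
    B = np.array([(1 - fx) * (1 - fy), fx * (1 - fy),
                  (1 - fx) * fy, fx * fy])     # basis weights, sum to 1
    return np.asarray(u, float) + B @ lattice_offsets   # u + B(u) . F_k
```

With zero lattice offsets the warp is the identity, matching the initialization of the optimization.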

We solve Eq. (6) with an alternating optimization approach inspired by Zhou et al. [27]: we first fix \(\varvec{F}\) and \(\varvec{T}\) to optimize \(\varvec{C}\), then fix \(\varvec{C}\) to optimize \(\varvec{F}\) and \(\varvec{T}\), and alternate. When \(\varvec{F}\) and \(\varvec{T}\) are fixed, Eq. (6) degenerates to a least-squares problem, and the average gray-scale value of vertex \(\varvec{v}\) is:

$$\begin{aligned} C_{\varvec{v}} = \frac{1}{|\varvec{\mathcal {I}}_{\varvec{v}} |} \sum _{\begin{array}{c} k,\\ \varvec{P}_{k} \in \varvec{\mathcal {I}}_{\varvec{v}} \end{array}} \mathcal {L} \circ \varvec{ \mathcal {F}}_{k} (\varvec{\mathcal {K}} \cdot \varvec{T}_{k} \cdot \varvec{v}), \end{aligned}$$
(10)

where \(\varvec{\mathcal {I}}_{\varvec{v}} \) is the set of image patches that can observe \(\varvec{v}\).

When \(\varvec{C}\) is fixed, we use an inner iterative strategy to solve for \(\varvec{F}\) and \(\varvec{T}\). We first fix \(\varvec{F}\) and assume that each image undergoes only a small rotation. Under this assumption, approximating the external camera matrix by a 6-vector \(\varvec{T}_{k} = \{\alpha _{k}, \beta _{k}, \lambda _{k}, a_{k}, b_{k}, c_{k}\}\), the camera pose of each image patch \(\varvec{P}_{k}\) can be solved independently as a linear system in 6 parameters.
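The small-rotation assumption makes the residual linear in the 6 pose parameters, because for small angles \(R\varvec{v} \approx \varvec{v} + \omega \times \varvec{v}\). A sketch of the linearized transform (our own helper, not taken from the paper):

```python
import numpy as np

def apply_small_pose(v, params):
    """Apply the linearized rigid transform for T_k = (alpha, beta, lam, a, b, c).

    For small angles the rotation acts as v + omega x v with
    omega = (alpha, beta, lam); (a, b, c) is the translation. The result is
    linear in all 6 parameters, so each patch yields a small linear system.
    """
    alpha, beta, lam, a, b, c = params
    omega = np.array([alpha, beta, lam])
    t = np.array([a, b, c])
    return np.asarray(v, float) + np.cross(omega, v) + t
```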

After that, we fix \(\varvec{C}\) and \(\varvec{T}\) to optimize \(\varvec{F}\). Then \(\varvec{u} = \varvec{\mathcal {K}} \cdot \varvec{T}_{k} \cdot \varvec{v}\) is constant and \(\varvec{\mathcal {F}}_{k}(\varvec{u})\) is a linear combination of \(\varvec{F}_{k}\), so we have:

$$\begin{aligned} E_{correction}(\varvec{F}) = \sum _{k}(\sum _{\begin{array}{c} \varvec{v} \in \varvec{B}_k, \\ g(\varvec{v},E_{k}) < \gamma \end{array} } w(\varvec{v}) \cdot (C_{\varvec{v}} - \mathcal {L} \circ \varvec{\mathcal {F}}_{k}(\varvec{F}_{k}) )^{2} + \beta \cdot \varvec{F}_{k}^{\top }\varvec{F}_{k} ). \end{aligned}$$
(11)

Equation (11) is a least-squares system and can be solved efficiently. We iterate the alternating optimization until convergence.
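The overall alternation can be illustrated with a deliberately simplified 1-D analogue, in which a single per-patch brightness offset stands in for the warp \(\varvec{F}_k\) and pose \(\varvec{T}_k\) (everything here is our own toy construction, not the paper's solver):

```python
def alternate(observations, n_iters=20, beta=1e-3):
    """Toy alternating scheme: per-patch intensity offsets stand in for (F, T).

    observations: dict vertex -> list of (patch_id, raw_intensity)
    Returns estimated vertex colors C and per-patch offsets.
    """
    patches = {p for obs in observations.values() for p, _ in obs}
    offset = {p: 0.0 for p in patches}
    C = {v: 0.0 for v in observations}
    for _ in range(n_iters):
        # Step 1 (Eq. (10) analogue): C_v = mean of corrected observations.
        for v, obs in observations.items():
            C[v] = sum(raw + offset[p] for p, raw in obs) / len(obs)
        # Step 2: least-squares offset per patch with a beta ridge penalty.
        for p in patches:
            res = [(v, raw) for v, obs in observations.items()
                   for q, raw in obs if q == p]
            num = sum(C[v] - raw for v, raw in res)
            offset[p] = num / (len(res) + beta)
        # (the full method additionally re-solves the poses T_k each sweep)
    return C, offset
```

As in the paper's scheme, each step decreases the objective, so the loop converges to mutually consistent colors and corrections.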

5 Results

We evaluated the performance of our proposed method on our test datasets. All experiments were performed on a commodity workstation with an Intel i5 3.2 GHz CPU and 8 GB of RAM. We first present details of the test data, then evaluate the seam generation and seam correction processes, and finally compare our method to state-of-the-art approaches.

Fig. 4. Some noticeable seams still remain in the textured object due to the misalignment of texture images, which cannot be completely avoided by seam generation optimization

Test Datasets. Our datasets were captured from real-life products (shoes, arts, crafts, etc.) and reconstructed as follows. For each object, we first used an inexpensive consumer-level structured light 3D scanner, consisting of an RGB industrial camera and an ordinary projector, to generate a registered point cloud. Calibration was performed before scanning. We then meshed the point cloud by surface reconstruction [12]. Since the calibrated parameters drifted slightly with heat in the environment (especially the projector parameters), the reconstructed 3D geometry, camera poses, and images were noisy; the point cloud registration step introduced additional geometric errors. In general, the error of the geometric model was about 3 mm to 5 mm, and the texture re-projection error was about 5 to 15 pixels (the resolution of the captured images was \(3456 \times 2304\)).

Seam Generation Evaluation. We first evaluated the contribution of the weight \(\alpha \) in Eq. (1). As discussed in Sect. 4.1, \(\alpha \) balances the data term and the smooth term. As shown in Fig. 6, seams bypass most high-frequency color areas of the texture images when \(\alpha \) is large. However, a larger weight on \(E_{s}(\varvec{L})\) may increase \(E_{d}(\varvec{L})\) and thus reduce the texture resolution, so we need an \(\alpha \) that does not increase \(E_{d}(\varvec{L})\) significantly. We estimated the incremental percentage of \(E_{d}(\varvec{L})\) for different values of \(\alpha \):

$$\begin{aligned} \varDelta E_{d,\alpha _{i}}(\varvec{L})=\frac{E_{d,\alpha _{i}}(\varvec{L})-E_{d,\alpha _{i-1}}(\varvec{L})}{E_{d,\alpha _{i-1}}(\varvec{L})}(i \geqslant 2). \end{aligned}$$
(12)
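Eq. (12) is a simple relative-increase computation over consecutive \(\alpha \) settings; a one-line helper with hypothetical values:

```python
def incremental_percentages(ed_values):
    """Eq. (12): relative increase of E_d between consecutive alpha settings.

    ed_values: E_d(L) measured at increasing alpha_1, alpha_2, ... (i >= 2)
    """
    return [(ed_values[i] - ed_values[i - 1]) / ed_values[i - 1]
            for i in range(1, len(ed_values))]
```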

We tested \(\alpha \) values from 50 to 300 (denoted \(\alpha _{i}\)) and measured their influence on \(E_{d}(\varvec{L})\). As shown in Table 1 and Fig. 5, the average incremental percentage of \(E_{d}(\varvec{L})\) was about 5% for \(\alpha \leqslant 200\) and rose to about 15% for \(\alpha >200\). This means that an overly large \(\alpha \) makes the seam generation optimization select images whose view directions form larger angles with the face normals, which decreases the texture resolution. We therefore set \(\alpha = 200\) as a trade-off between seam visibility and texture resolution.

Since [7, 15, 24] use similar ideas for handling seams, based on color differences and on the labels of vertices or edges of faces along seams, we compared our seam generation strategy with the method of Waechter et al. [24]. Their method integrates the colors of the image patch projected onto a face as the data term energy, which favors close-up views and suits large-scale models. In contrast, we use the angle between the face normal and the camera view direction as the data term. In addition, their smooth term is based on the Potts model, whereas ours is based on the color difference between faces adjacent to seams. As shown in Figs. 3 and 8, our method generates more reasonable seams.

Table 1. Some experimental examples of \(\alpha \) in Eq. (1). For our datasets, we set \(\alpha = 200\) as a trade-off to balance the seam effect and the resolution of texture

We also compared the computational cost of the MRF-based optimization between Waechter et al.'s method and ours. As described in [24], their main computational bottleneck is the data term, while the main cost of our MRF-based optimization is the smooth term, which computes the average color of each face. Computing average colors is cheaper than evaluating Waechter et al.'s data term, which must compute projected sizes and integrate pixel colors. Table 2 compares the MRF computational cost of the two methods quantitatively; the statistics conform to this analysis, and our advantage grows as the data size increases.

Table 2. Comparison of the computational cost of seam generation between our method and Waechter et al. [24] on datasets of different sizes (Ours/Waechter et al.'s method). Our approach is more computationally efficient than Waechter et al. [24] ("k" and "m" denote thousand and million, respectively)

Fig. 5. The incremental percentages of \(E_d(\varvec{L})\)

Fig. 6. The seam generation results. The seam generation scheme bypasses more high-frequency color areas as \(\alpha \) increases

Seam Correction Evaluation. We first compared our results with the approach of Zhou and Koltun [26], which uses the images of all views to optimize the color of each vertex; if some images are blurred, it produces blurring artifacts. In contrast, our approach selects the best images to texture the mesh faces and thus effectively avoids blurring. Moreover, because we texture faces with high-resolution images, we can generate high-quality texture maps even for low-resolution meshes. The results in Fig. 7 show that our approach handles blurred inputs better.

To evaluate the optimization performance quantitatively, we define a normalized residual error by dividing the total residual by the number of vertices used for optimization:

$$\begin{aligned} RE_{normalized}=\frac{\sum _{k}\sum _{\begin{array}{c} \varvec{v} \in \varvec{B}_k, \\ g(\varvec{v},E_{k}) < \gamma \end{array}} e^{2}}{ |\varvec{C}|}. \end{aligned}$$
(13)

This makes our residual errors comparable to those of Zhou and Koltun's method [26]. The \(RE_{normalized}\) results are shown in Table 3.

Table 3. Normalized residual error and average time cost per iteration. Our optimization converges faster and has a lower computational cost in each iteration
Fig. 7.
figure 7

(a)–(d) are the results of Zhou and Koltun’ method by rendering meshes with vertex color. From left to right, the number of vertices of the models are 0.05 million, 0.2 million, 0.5 million, and 1 million, respectively. (e) is our result with 0.05 million vertices. We reconstruct high-quality texture maps with low geometric complexity

Fig. 8. Comparison with state-of-the-art methods on the same inputs. Shan et al. [22] (a, e) generate blurring and ghosting artifacts on noisy data. Zhou and Koltun [26] perform better, but their results suffer from low-resolution geometry (b, f). Waechter et al. [24] select one image per face via a global optimization, resulting in visible seams on noisy data (c, g). Our method produces realistic and high-fidelity texture maps (d, h)

Table 3 shows that our method converges faster than that of Zhou and Koltun [26] (see the columns "Time per iter." and "# of iter."). Moreover, the total computational cost of our approach is lower than that of Zhou and Koltun's method [26], especially for high-resolution meshes (see the column "Total time" in Table 3). The reason is that Zhou and Koltun [26] optimize over all vertices, whereas we only use the pixels around seams for color optimization, so our cost scales with the number of seam edges instead of the number of vertices.

Texture Maps Evaluation. Finally, we qualitatively compared our final results with the approaches of Shan et al. [22], Waechter et al. [24], and Zhou and Koltun [26]. For a fair comparison, all methods shared the same inputs; the results are shown in Fig. 8. Both [22] and [26] produce per-vertex color only. Shan et al. [22] blend vertex colors from multiple views, resulting in blurring and ghosting artifacts (Fig. 8(a) and (e)). Zhou and Koltun [26] generate better results (Fig. 8(b) and (f)), but their performance is limited by the number of vertices. Waechter et al. [24] texture each face with a single image, and their approach generates obvious seams under noise (Fig. 8(c) and (g)). With our two-step optimization, our approach produces visually seamless texture maps (Fig. 8(d) and (h)). These comparisons show that our approach substantially improves texture mapping.

6 Conclusions

Reconstructing realistic virtual versions of real-life products with inexpensive consumer-grade scanning devices is a challenging problem. We have presented a two-step optimization for seamless texture mapping from noisy data: seams are routed through imperceptible texture regions, and seam misalignments are corrected with a color consistency strategy. We evaluated our approach on a number of objects, and the experimental results show that our method efficiently generates visually seamless, high-fidelity texture maps with a realistic appearance at low cost. More experimental results are shown in the supplementary video.

It is worth noting that our approach uses only a small set of data around seams to correct misalignments, and thus may fail on data with very large noise. We also mainly focus on indoor objects and do not yet address the occlusion problem. We plan to extend our approach to noisier data in future work.