
1 Introduction

Depth super-resolution is one of the important research topics in image processing and computer vision. In practical applications, since depth information is always captured at a low resolution, depth images have to be interpolated to the full size of the corresponding texture images. For example, the resolution of the depth image captured by the SwissRanger SR4000 is only QCIF (\(176\times 144\) pixels). Even for Kinect, the resolution of the captured depth image is only \(640\times 480\) (\(512\times 424\) for Kinect v2), which is much lower than that of its corresponding color image (\(1920\times 1080\) for Kinect v2). Hence, interpolation and other image enhancement techniques are essential to improve the resolution and quality of depth images. For applications such as 3D viewpoint reconstruction, action recognition and object detection [28], high-resolution and high-accuracy depth information can help to improve the system performance.

Reconstructing High Resolution (HR) images from Low Resolution (LR) images is an ill-posed inverse problem [14, 36], and it is difficult to produce high-quality results. Nevertheless, depth super-resolution could be slightly easier because depth images have more homogeneous regions and more similar structures than natural images. Generally speaking, research on depth super-resolution can be divided into two categories: single depth super-resolution and depth super-resolution with multiple images. For single depth super-resolution, depth maps are directly interpolated to the full size of the corresponding color images, without other side information. Consequently, depth super-resolution is equivalent to general image super-resolution, and classical interpolation filters, including the bi-linear and bi-cubic filters, can be used. However, since filter-based methods rarely consider the properties of depth maps, i.e. the importance of edges, the performance of depth super-resolution is largely limited. Therefore, to preserve depth edges in the interpolation process, optimization-based methods which regard depth super-resolution as a Markov Random Field (MRF) or least squares optimization problem have been proposed. Kim et al. proposed a novel MRF-based depth super-resolution method taking the noise characteristics of the depth map into account [13]. Zhu et al. further extended the traditional spatial MRF by considering temporal coherence [39]. In [6], depth super-resolution was formulated as a convex optimization problem which utilizes anisotropic total generalized variation. Then, patch-based features in a depth map were employed to optimize depth super-resolution. [9] proposed to exploit self-similar patches in the rigid body to reconstruct high-resolution depth maps. In [16], depth edges were preserved in the interpolation process by adding geometric constraints from self-similar structures. In [4, 7, 11, 33, 35], sparse representations of depth image patches were introduced by imposing a locality constraint. Unfortunately, the performance of the above methods could be limited by the failure to establish patch correspondences, either from an external dataset or within the same depth map, which leads to edge artifacts between patches or incorrect depth pattern estimation.

To eliminate the edge artifacts after the depth super-resolution operation, the corresponding color images are utilized as geometric constraints [20]. A classical method is to apply a bilateral filter to enhance depth quality [24, 30], in which color information is jointly utilized as the weights of the bilateral filter. In [3] and [32], color images were directly used to guide the depth image super-resolution. In [17] and [18], the edge and structure similarity between depth images and color images is considered for depth image up-sampling. Park et al. extended nonlocal mean filtering with an edge weighting term in [23]. Xie et al. proposed an edge-guided depth super-resolution method which produces sharp edges [34]. Then, Yang et al. proposed to use multiple views to assist depth super-resolution in [37]. Choi et al. proposed a region segmentation based method to tackle texture-transfer and depth-bleeding artifacts in [2]. Recently, some convolutional neural network based depth super-resolution methods have also been proposed to learn the texture-depth mapping [19, 31]; however, these methods still need to use classical interpolation methods to obtain the HR depth in the first stage, ignoring the relations between the LR depth and the HR texture. Most importantly, only the depth distortion is considered, similar to learning-based super-resolution methods.

Fig. 1. Flowchart of learning-based image super-resolution.

In fact, for most depth based applications, depth images are generally not provided for viewing but for enhancing the applicability. For instance, in the 3D video framework, depth images are used to assist virtual view synthesis instead of being watched by users. Hence, integrating the view synthesis quality into the depth super-resolution problem is necessary. Jin et al. designed a natural image super-resolution framework in which depth images are utilized to synthesize the image, and the synthesis artifacts are used as a criterion to guide the image super-resolution [12]. [10] introduces the difference between the color image and the synthesized image as a regularization term for depth super-resolution. In [17], the fractal dimension and texture-depth boundary consistencies are jointly considered in depth super-resolution.

In this paper, we present a depth super-resolution method based on the relations between HR and LR depth patches. Considering the sharpness of depth edges, the LR depth patches are first clustered into different edge-orientation classes based on their edge orientations. Here, the edge-orientation feature is extracted using our designed gradient operators, in which the edge strength and direction are employed as the basis for LR patch clustering. Then, for each edge-orientation class, a class-dependent linear mapping function is learned using LR-HR patch pairs. Moreover, the view synthesis distortion is integrated into the linear mapping learning process. Therefore, the depth super-resolution problem is formulated as a view synthesis distortion driven linear mapping learning optimization. Experimental results show that our proposed depth super-resolution method achieves superior performance for the synthesized virtual view compared with other depth super-resolution approaches.

The rest of the paper is organized as follows: Sect. 2 describes the proposed depth super-resolution framework in detail. Section 3 shows the settings of our experiments and the performance of our proposed approach. Finally, we conclude this paper in Sect. 4.

2 Methodology

Typically, learning-based image super-resolution aims to learn a linear mapping relation between LR-HR patch pairs. For example, as shown in Fig. 1,

$$\begin{aligned} \mathbf {x = My}, \end{aligned}$$
(1)

where \(\mathbf {y} \in \mathbb {R}^m\), \(\mathbf {x} \in \mathbb {R}^n\), \(m \le n\), and \(\mathbf {M}\) is a linear mapping operator.

For LR-to-HR conversions, the linear mapping between LR-HR pairs should be learned. Specifically, the LR image \(\mathbf {I_l}\) is denoted as

$$\begin{aligned} \mathbf {I_l} = \{l_i\}_{i=1}^{N}, \end{aligned}$$
(2)

where \(l_i\) is the \(i\)-th LR image patch and N is the total number of depth image patches in \(\mathbf {I_l}\). Similarly, the HR image is represented as

$$\begin{aligned} \mathbf {I_h} = \{ h_i \}_{i=1}^{N}. \end{aligned}$$
(3)

Then, the LR-HR patch pairs are classified into different classes based on a specified rule. Let U be the number of classes and \(N_{j}\) be the number of patches belonging to the \(j\)-th class, with \(\sum _{j=1}^U N_j = N\). For each class j, the linear mapping can be learned from an error minimization problem as follows [14]

$$\begin{aligned} M_j = \arg \min _{M_j} \sum _{i=1}^{N_{j}} \Vert h_i^j - M_j\cdot l_i^j\Vert _2^2 + \lambda \Vert M_j\Vert _F^2, \end{aligned}$$
(4)

where \(h_i^j\) and \(l_i^j\) are the \(i\)-th vectorized HR and LR patches belonging to the \(j\)-th class, and \(M_j\) is the mapping kernel of the \(j\)-th class. \(\Vert M_j\Vert _F^2\) is a regularization term with the Frobenius norm which prevents overfitting, and \(\lambda \) is a penalty factor which is empirically set to 1 in general. Therefore, learning-based image super-resolution is expressed as a multivariate regression problem. The goal of the regression is to minimize the Mean Squared Error (MSE) between the ground-truth HR patches and the patches interpolated from the corresponding LR patches,

$$\begin{aligned} J = \min \frac{1}{N_{j}} \sum _{i=1}^{N_j} \Vert h_i^j - M_j\cdot l_i^j\Vert _2^2. \end{aligned}$$
(5)
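For concreteness, the minimizer of (4) has the standard ridge-regression closed form. Below is a minimal sketch, assuming the vectorized HR and LR patches of one class are stacked as the columns of matrices `H` and `L`; the function name and matrix layout are illustrative and not taken from the paper.

```python
import numpy as np

def learn_mapping(H, L, lam=1.0):
    """Ridge-regression solution of (4) for a single class.

    H : (n, N_j) array whose columns are vectorized HR patches of class j.
    L : (m, N_j) array whose columns are the corresponding LR patches.
    Returns M_j of shape (n, m) minimizing ||H - M_j L||_F^2 + lam * ||M_j||_F^2.
    """
    m = L.shape[0]
    return H @ L.T @ np.linalg.inv(L @ L.T + lam * np.eye(m))
```

For example, with \(2\times 2\) LR patches (m = 4) and \(4\times 4\) HR patches (n = 16), the learned \(M_j\) is a \(16\times 4\) matrix that interpolates any vectorized LR patch of that class.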

Nevertheless, depth images are different from natural images: they are used to assist various applications, e.g. view synthesis, object recognition and action recognition. Hence, the goal of depth image super-resolution should be different from that of traditional image super-resolution. In this work, we assume that depth images are used for view synthesis. In the remainder of this paper, we address the depth image super-resolution problem in the framework of view synthesis.

2.1 Depth Patches Classification Based on Edge Orientation

Published super-resolution methods [1, 27, 29] use edge-orientation information to implement the LR-to-HR interpolation for texture images. However, since color images usually possess very complicated textures, the edge-orientation information is difficult to extract for patch clustering. Compared to texture images, depth images represent the distance between the camera and the objects in a scene and generally have more homogeneous regions and sharp edges, without much texture. Consequently, edge information in depth images is distinct and the corresponding features are easily extracted. Motivated by this observation, we design a new edge-orientation feature based on the above conventional learning-based super-resolution scheme to learn the mapping between LR patches and HR patches, which aims to preserve the depth edges in the LR-to-HR conversion process.

To find the edge orientation of LR depth image patches, we employ two simple gradient operators as

$$\begin{aligned} K_h = [1~ -1] \text { and } K_v = \left[ \begin{array}{c} 1\\ -1 \end{array}\right] \end{aligned}$$
(6)

where \(K_h\) and \(K_v\) indicate horizontal and vertical gradient operators, respectively. Here, considering that (5) calculates the pixel-level statistical error between the interpolated patches and the ground-truth patches, we take the pixel variations as the basis for patch classification. In theory, LR depth patches with similar gradient variations between adjacent-pixel pairs are likely to share similar linear mappings in LR-to-HR conversions.

For demonstration, let us take a \(2\times 2\) LR depth patch as an example, which is

$$\begin{aligned} P = \left[ \begin{array}{cc} p_{1,1} & p_{1,2}\\ p_{2,1} & p_{2,2} \end{array}\right] . \end{aligned}$$
(7)

The edge orientation is determined in terms of the edge strength and edge direction. Both operators \(K_h\) and \(K_v\) are applied to obtain the horizontal and vertical edge strengths, as

$$\begin{aligned} \begin{array}{c} g_h = K_h *P \\ g_v = K_v *P, \end{array} \end{aligned}$$
(8)

where \(*\) indicates the convolution operator, and \(g_h\) and \(g_v\) are the horizontal and vertical gradients, respectively. Then, the edge strength and edge direction can be computed as

$$\begin{aligned} \begin{array}{c} S = \sqrt{g_h^2 + g_v^2} \\ \phi = \tan ^{-1}(\frac{g_h}{g_v})+\frac{\pi }{2} \end{array} \end{aligned}$$
(9)

where S indicates the edge strength and \(\phi \) is the edge direction for the given LR depth patch.

To correctly distinguish the edges from the homogeneous parts of depth images, we set a threshold T to constrain the edge strength. When the edge strength is lower than T, the corresponding regions in the depth patch are regarded as homogeneous regions. Then,

$$\begin{aligned} S = \left\{ \begin{array}{cc} S & \text {if } S > T \\ 0 & \text {otherwise} \end{array}\right. \end{aligned}$$
(10)

The edge direction in (9) takes values from 0 to \(2\pi \). Note that each edge direction and its opposite direction can be regarded as the same edge orientation. Therefore, we map the direction information to the range \([0,\pi ]\), obtaining

$$\begin{aligned} \hat{\phi }=\left\{ \begin{array}{cc} \phi & 0 \le \phi< \pi \\ \phi -\pi & \pi \le \phi < 2\pi \end{array}\right. \end{aligned}$$
(11)

Finally, the edge orientation feature can be represented using the following formula

$$\begin{aligned} \mathbf {\Phi } = S e^{j\hat{\phi }}. \end{aligned}$$
(12)

The feature points calculated for the given LR depth patches are then clustered into different classes using K-means [8].
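As an illustration of (6)–(12) and the clustering step, the sketch below computes the edge-orientation feature of a \(2\times 2\) LR patch and clusters the features with K-means. How the convolution outputs of \(K_h\) and \(K_v\) are reduced to a single \(g_h\) and \(g_v\) per patch is not stated explicitly, so the mean of the adjacent-pixel differences is used here as an assumption; the threshold `T` and the class count `U` are likewise illustrative values.

```python
import numpy as np
from sklearn.cluster import KMeans

def edge_orientation_feature(patch, T=5.0):
    """Edge-orientation feature (12) of a 2x2 LR depth patch,
    returned as a real 2-vector (Re, Im) for K-means."""
    p = patch.astype(np.float64)
    g_h = np.mean(p[:, 0] - p[:, 1])           # K_h = [1 -1]: left minus right
    g_v = np.mean(p[0, :] - p[1, :])           # K_v = [1; -1]: top minus bottom
    S = np.hypot(g_h, g_v)                     # edge strength (9)
    phi = np.arctan2(g_h, g_v) + np.pi / 2     # edge direction (9)
    phi = np.mod(phi, 2.0 * np.pi)             # bring into [0, 2*pi)
    if S <= T:                                 # homogeneous region (10)
        S = 0.0
    if phi >= np.pi:                           # fold opposite directions (11)
        phi -= np.pi
    z = S * np.exp(1j * phi)                   # complex feature (12)
    return np.array([z.real, z.imag])

def cluster_patches(patches, U=16, T=5.0, seed=0):
    """Cluster LR patches into U edge-orientation classes with K-means."""
    feats = np.stack([edge_orientation_feature(p, T) for p in patches])
    km = KMeans(n_clusters=U, n_init=10, random_state=seed).fit(feats)
    return km.labels_, km.cluster_centers_
```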

2.2 Depth Super-Resolution in View Synthesis

The view synthesis technique is often employed to generate extra virtual viewpoints in a 3D video system [12]. In this framework, depth images are used to describe the distance between the camera and the objects in a scene. Based on the depth information, the virtual view images are synthesized by applying DIBR [5]. Consequently, depth images are only a sort of supplementary data for view synthesis rather than independent image data. The quality of the depth images does not linearly affect the quality of the synthesized view images, and the relation varies according to the corresponding texture image information, as mentioned in [21, 22]. Thereby, in the learning-based depth super-resolution problem, the goal of the regression should consider the role of depth images in view synthesis. Instead, the distortion of the synthesized view introduced by the possible depth distortion in the super-resolution process can be integrated, which is written as

$$\begin{aligned} \begin{array}{lll} SSD & = & \sum |V-\tilde{V}|^2 \\ & = & \sum |f_w(C,D) - f_w(C,\tilde{D})|^2, \end{array} \end{aligned}$$
(13)

where C and V indicate the texture image and its synthesized virtual view, respectively. D is the ground-truth full-size depth image, and \(\tilde{D}\) denotes the corresponding interpolated HR depth image. The virtual view is synthesized from C and D by the pre-defined warping function \(f_w\).
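As a rough illustration of (13), the sketch below uses a toy one-dimensional horizontal shift as the warping function \(f_w\), moving each texture pixel by a disparity proportional to its depth value. The real DIBR warping in VSRS performs sub-pixel rendering, occlusion handling and hole filling, so this is only a conceptual approximation.

```python
import numpy as np

def warp_view(C, D, alpha):
    """Toy horizontal warp f_w: shift each pixel by alpha * depth.
    No z-buffering or hole filling; later writes simply overwrite earlier ones."""
    H, W = C.shape
    V = np.zeros((H, W), dtype=np.float64)
    xs = np.arange(W)
    for y in range(H):
        tx = np.clip(np.round(xs - alpha * D[y]).astype(int), 0, W - 1)
        V[y, tx] = C[y, xs]
    return V

def ssd_between_views(C, D, D_tilde, alpha):
    """SSD of (13): distortion between views rendered with D and with D_tilde."""
    V = warp_view(C.astype(np.float64), D, alpha)
    V_tilde = warp_view(C.astype(np.float64), D_tilde, alpha)
    return np.sum((V - V_tilde) ** 2)
```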

Based on (5) and (13), the goal of the learning-based depth super-resolution problem can be expressed as a view synthesis distortion minimization problem, as

$$\begin{aligned} \begin{array}{lll} J & = & \sum \nolimits _{M} |V-\tilde{V}|^2 \\ & = & \sum \nolimits _{M} |f_w(C,D) - f_w(C,\tilde{D})|^2, \\ & \text {where} & \tilde{D} = M\cdot d \end{array} \end{aligned}$$
(14)

here, M denotes the learned mapping functions.

To further simplify this distortion, following [22], (13) can be approximately written as

$$\begin{aligned} \begin{array}{lll} SSD & = & \sum \nolimits _{\forall (x,y)} |f_w(C,D) - f_w(C,\tilde{D})|^2 \\ & \approx & \sum \nolimits _{\forall (x,y)} | C_{x,y} - C_{x-\triangle p(x,y),y} |^2 , \end{array} \end{aligned}$$
(15)

where \((x,y)\) represents the pixel position, and \(\triangle p\) denotes the translational rendering position, which has been proven to be proportional to the depth image error

$$\begin{aligned} \triangle p(x,y) = \alpha \cdot (D_{x,y} - \tilde{D}_{x,y}), \end{aligned}$$
(16)

where \(\alpha \) is a proportional coefficient determined by the following equation

$$\begin{aligned} \alpha = \frac{f\cdot L}{255} \cdot \left( \frac{1}{Z_{near}}-\frac{1}{Z_{far}}\right) \end{aligned}$$
(17)

here, f is the focal length and L is the baseline between the current view and the synthesized view. \(Z_{near}\) and \(Z_{far}\) are the values of the nearest and the farthest depth of the scene, respectively. Therefore, (14) can be further simplified according to [22] as

$$\begin{aligned} J \approx \sum \limits _{\forall (x,y)} \left[ |\triangle p(x,y)|\frac{|C_{x,y}-C_{x-1,y}|+|C_{x,y}-C_{x+1,y}|}{2}\right] ^2. \end{aligned}$$
(18)
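The following sketch computes \(\alpha\) from (17) and the approximated view-synthesis distortion of (16)–(18) directly from the texture gradients; handling the image borders with `np.roll` (wrap-around) is a simplification.

```python
import numpy as np

def alpha_coefficient(f, L, z_near, z_far):
    """Proportionality coefficient (17) between depth error and rendering offset."""
    return (f * L / 255.0) * (1.0 / z_near - 1.0 / z_far)

def approx_synthesis_distortion(C, D, D_tilde, alpha):
    """Approximated view-synthesis distortion J of (18)."""
    C = C.astype(np.float64)
    dp = alpha * (D.astype(np.float64) - D_tilde.astype(np.float64))   # delta p, (16)
    left = np.abs(C - np.roll(C, 1, axis=1))     # |C(x,y) - C(x-1,y)|
    right = np.abs(C - np.roll(C, -1, axis=1))   # |C(x,y) - C(x+1,y)|
    A = 0.5 * (left + right)                     # local horizontal texture gradient
    return np.sum((np.abs(dp) * A) ** 2)
```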

Finally, to learn the linear mapping from the LR examples to the HR examples for depth images, (4) can be rewritten based on (18) as

$$\begin{aligned} \begin{array}{ll} M_j = \arg \min \limits _{M_j} & \Vert \left[ \alpha (D_i^j - M_j d_i^j)\right] \frac{|C_{x,y}-C_{x-1,y}|+|C_{x,y}-C_{x+1,y}|}{2} \Vert ^2_2\\ & + \lambda \Vert M_j\Vert _F^2 , \end{array} \end{aligned}$$
(19)

where \(D_i^j\) denotes the HR depth patches belonging to class j, and \(d_i^j\) denotes the corresponding LR depth patches of class j. This is known as multivariate regression, and according to [38], this optimization problem can be approximately solved as

$$\begin{aligned} M_j = \alpha ^2 A^TA D_i^j{d_i^j}^T \left( d_i^j{d_i^j}^T+\lambda \mathbf {I}\right) ^{-1}, \end{aligned}$$
(20)

where \(A = \frac{|C_{x,y}-C_{x-1,y}|+|C_{x,y}-C_{x+1,y}|}{2}\) and \(\mathbf {I}\) is the identity matrix. Based on (20), the linear mapping can be learned off-line and used to reconstruct the HR patches of class j. The complete training process is summarized in Algorithm 1.

Algorithm 1. Training of the class-dependent linear mappings.
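A sketch of the closed-form update (20) used in the training stage. The texture weight A is treated here as a diagonal matrix of per-pixel gradient weights over the HR patch positions, and how A is aggregated over the patches of one class is left implicit in the text, so this reduction is an assumption.

```python
import numpy as np

def learn_weighted_mapping(D_j, d_j, A_diag, alpha, lam=1.0):
    """Closed-form class mapping following (20).

    D_j    : (n, N_j) array of vectorized ground-truth HR depth patches of class j.
    d_j    : (m, N_j) array of the corresponding vectorized LR depth patches.
    A_diag : length-n vector of texture-gradient weights
             (|C(x,y)-C(x-1,y)| + |C(x,y)-C(x+1,y)|) / 2 at the HR patch positions.
    """
    m = d_j.shape[0]
    A = np.diag(np.asarray(A_diag, dtype=np.float64))    # diagonal weighting (assumption)
    G = d_j @ d_j.T + lam * np.eye(m)                    # regularized LR Gram matrix
    return (alpha ** 2) * (A.T @ A) @ D_j @ d_j.T @ np.linalg.inv(G)
```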

Based on the learned linear mappings \(M_j\) for each class j, the given LR depth images for testing are first divided into a set of LR patches of size \(2\times 2\). Then, using (9) and (12), the edge-orientation feature of each patch \(\mathbf {\Phi _p}\) is calculated and matched with the cluster centers \(\mathbf {\Phi _c}\). The matching procedure for the edge-orientation class can be described as searching for the minimal distance between the given LR depth patch and each cluster center, with the distance metric

$$\begin{aligned} d = \sin (|\mathbf {\Phi _p} - \mathbf {\Phi _c}|), \end{aligned}$$
(21)

which is based on the sine of the local angular distance. Finally, the corresponding linear mapping can be found. The super-resolution phase is summarized in Algorithm 2.

Algorithm 2. Super-resolution of LR depth images using the learned mappings.
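A minimal sketch of this super-resolution phase, reusing the `edge_orientation_feature` helper sketched in Sect. 2.1. The non-overlapping \(2\times 2\) patch layout, the row-major patch vectorization and the literal use of the sine distance (21) in the 2-D feature space are assumptions for illustration.

```python
import numpy as np

def super_resolve(lr_depth, mappings, centers, patch=2, scale=2, T=5.0):
    """Interpolate an LR depth image with the learned class mappings (Algorithm 2).

    mappings : list of M_j, each of shape ((patch*scale)**2, patch**2).
    centers  : K-means cluster centres in the same 2-D feature space as
               edge_orientation_feature().
    """
    H, W = lr_depth.shape
    hr = np.zeros((H * scale, W * scale), dtype=np.float64)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            p = lr_depth[y:y + patch, x:x + patch]
            feat = edge_orientation_feature(p, T)
            d = np.sin(np.linalg.norm(centers - feat, axis=1))   # distance (21)
            j = int(np.argmin(d))                                 # best-matching class
            hp = mappings[j] @ p.reshape(-1, 1)                   # LR -> HR via M_j
            hr[y * scale:(y + patch) * scale,
               x * scale:(x + patch) * scale] = hp.reshape(patch * scale, patch * scale)
    return hr
```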

3 Experimental Results

In this section, the proposed depth super-resolution method is compared with three other depth super-resolution methods: a filter-based method, the joint bilateral up-sampling algorithm (JBU) [15]; a guidance-information-assisted method, the color-based depth up-sampling method (CBU) [32]; and a learning-based method, the edge-guided depth super-resolution method (EDU) [34]. To train the linear mappings, the depth images from 17 image pairs in the Middlebury Stereo dataset [25] are used. Each image pair consists of two views (left and right views, with the corresponding texture and depth image pairs) taken under several different illuminations and exposures. For testing, realistic depth images from the MPEG Standardization Test Dataset are used to evaluate the performance of depth super-resolution, including “Newspaper”, “Balloons”, “Kendo”, “Dancer”, “Poznan\(\_\)hall2” and “Poznan\(\_\)street”. The details of the test sequences are shown in Table 1. For both training and testing, the depth images are down-sampled with a scale factor of 2 using the “Bicubic” filter. The results are evaluated in PSNR for quality assessment. To evaluate the view synthesis performance, the given depth images from two different views are first down-sampled and then up-sampled using different depth super-resolution methods, prior to view synthesis. The standard software VSRS 3.5 [26] is employed to generate the synthesized views using the interpolated depth images and the corresponding texture images. Moreover, the ground-truth depth images are used as reference.
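A sketch of the evaluation protocol just described: down-sampling by a factor of 2 with a bicubic filter and measuring PSNR against the ground truth. OpenCV's bicubic resize is assumed here as the “Bicubic” filter, and the peak value is taken as 255.

```python
import cv2
import numpy as np

def downsample_bicubic(depth, factor=2):
    """Down-sample a depth image by `factor` with bicubic interpolation."""
    h, w = depth.shape
    return cv2.resize(depth, (w // factor, h // factor), interpolation=cv2.INTER_CUBIC)

def psnr(reference, test, peak=255.0):
    """PSNR in dB between the ground truth and the up-sampled depth (or synthesized view)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```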

Table 1. Details of test dataset.

For quantitative evaluation, we first evaluate the depth super-resolution results on the test dataset. Table 2 lists the objective quality of depth super-resolution for each view in the test dataset. As reported in Table 2, the objective quality of the proposed depth super-resolution is limited, because the designed target function (19) does not aim to minimize the distortion between the up-sampled depth images and the ground-truth ones, as in [34]. However, the PSNR values of the up-sampled depth images obtained with the proposed method are still close to those of the other baselines. When evaluating the synthesized view quality, as shown in Table 3, the proposed depth super-resolution method performs much better than the other three methods. Compared with JBU and EDU, which both utilize edge information to guide depth super-resolution without employing color information, the average PSNR gain in synthesis quality is nearly 2 dB. Considering the synthesis distortion as in (18), the color information should be taken into account. Thereby, the CBU method shows good performance in synthesis quality, but its average PSNR is still about 1.2 dB lower than that of the proposed method.

Table 2. Objective quality of depth super-resolution.
Table 3. Objective quality of synthesized views by using interpolated depth images with scale factor 2.
Fig. 2. The comparison of visual results of depth images: (a) Newspaper [34]; (b) Newspaper by proposed; (c) Balloons [15]; (d) Balloons by proposed; (e) Kendo [32]; (f) Kendo by proposed; (g) Dancer [34]; (h) Dancer by proposed; (i) Poznan_hall2 [32]; (j) Poznan_hall2 by proposed; (k) Poznan_street [15]; (l) Poznan_street by proposed.

Fig. 3. The comparison of visual results of synthesized view for Newspaper: (a) EDU [34]; (b) Proposed depth super-resolution.

Fig. 4. The comparison of visual results of synthesized view for Balloons: (a) EDU [34]; (b) Proposed depth super-resolution.

Fig. 5. The comparison of visual results of synthesized view for Dancer: (a) EDU [34]; (b) Proposed depth super-resolution.

We also evaluate our proposed method visually in Figs. 2, 3, 4 and 5. The depth visual results are shown in Fig. 2. Note that not all depth images interpolated by the baseline methods are shown in Fig. 2 due to space limitations. Referring to Table 2, we select several depth images generated by the baseline methods to compare with those generated by our proposed method. Visually, the proposed method focuses on the transition regions between foreground objects and the background, which means that not all edges are preserved in the super-resolution process. In comparison, JBU [15], CBU [32] and EDU [34] introduce edge guidance information from the texture images or the depth image itself to optimize the depth super-resolution; thereby, the texture/depth edges are sharpened. Moreover, Figs. 3, 4 and 5 show the views synthesized using the interpolated depth images, and some details are shown with zoomed cropped regions. To clearly distinguish the differences among the synthesized views, we select the visual results based on Table 3. The EDU [34] method has the best objective quality, so the subjective comparison is mainly between EDU [34] and our proposed method. The red circles in Figs. 3, 4 and 5 highlight the comparison regions between the EDU method [34] and the proposed method.

4 Conclusion

In this paper, we present a depth super-resolution method based on the linear mapping relations between HR and LR depth patch pairs. Motivated by the idea that depth images are not directly watched by viewers but are used to assist different vision tasks, we convert the traditional super-resolution problem into a view-synthesis-driven depth super-resolution optimization. We design an edge-orientation-feature-based learning method to learn the possible linear mappings, and interpolate the LR depth image to its HR version by utilizing the learned mappings. On a realistic test dataset, our proposed method generates synthesized views with competitive quality in terms of PSNR compared to the other depth super-resolution methods.