
1 Introduction

In recent years, light field imaging [5] has become one of the most widely used methods for capturing the 3D appearance of a scene. It measures the spatial and angular variations in the intensity of light [18]. In the early years, light field capture required expensive hardware such as multi-camera arrays [30]. Although commercial and industrial light field cameras such as Lytro [6] and RayTrix [1] have recently been introduced, they still suffer from limited sensor resolution, which hampers their widespread appeal and adoption: light field cameras must trade off spatial resolution against angular resolution.

In this paper, we propose a highly accurate multi-view super-resolution method based on the light-field system introduced in [28]. The key idea of the proposed synthesis is to combine a CNN with a patch-based method so that the correspondence between images in different views is fully exploited. The patch-based method finds textural similarity between the high-resolution image and the low-resolution image. However, in challenging scenes such as those with large parallax, the correspondence between the high-resolution and low-resolution images is much weaker. As a result, the patch-based method cannot handle object edges and specularities well, and produces ghosting and blurring. Instead of relying on correspondence between views, VDSR (Very Deep convolutional networks for Super-Resolution [19]) super-resolves an image in a single view and fully extracts the features of the low-resolution image, so it can compensate for the shortcomings of the patch-based method. Our method combines the two and takes advantage of both: it exploits the similarity between views while also handling challenging scenes effectively.

Our key technical contributions are: (a) the proposed method can handle challenging scenes, especially those with large parallax; (b) the method is more accurate than existing methods; (c) the super-resolution process is simple and effective, resulting in a lower time cost. Experimental results demonstrate that the proposed method markedly improves the quality of the reconstructed high-resolution light field.

2 Related Work

Light fields [5] provide an additional angular dimension, enabling various visual applications such as light field display [29] and light field microscopy [21]. Many recent works try to capture or synthesize high-quality light fields from different types of input data. Light field super-resolution, which aims to improve the spatial resolution of light fields, is an active topic among these works.

2.1 Single Image Super-Resolution Method Using CNNs

Image super-resolution using deep convolutional networks was first introduced in [14]. The method differs fundamentally from existing external example-based approaches in that it does not explicitly learn dictionaries [31] or manifolds [4] for modeling the patch space. The Super-Resolution Convolutional Neural Network (SRCNN) is a representative deep-learning-based SR approach and has achieved large improvements in accuracy. Kim et al. [19] proposed a simple yet effective training procedure, called VDSR, that learns only the residuals and outperforms SRCNN in accuracy. We train our model using the algorithm described in [19].

We note that VDSR exploits contextual information spread over large image regions and outperforms patch-based algorithms at object edges and in regions of complex texture. However, VDSR operates on a single image and cannot fully use the information available in different views.

2.2 Light Field Super-Resolution

The main disadvantage of a single-camera light field is its low spatial resolution. Several methods have recently been proposed to restore the high-frequency information.

Spatial Super-Resolution. To increase the spatial resolution, Bishop et al. [7] proposed estimating both a high-resolution depth map and the light field in a Bayesian framework under a Lambertian textural prior. Patch-matching-based techniques are widely used in image processing, for example in texture synthesis [15], image completion [25], denoising [8], deblurring [12] and image super-resolution [16, 17]. Wanner and Goldluecke [28] introduced a hybrid imaging system using a patch-based algorithm. Cho et al. [11] explicitly model the calibration pipeline of Lytro cameras and propose a learning-based interpolation method to obtain higher spatial resolution. However, the quality of the recovered light field images is not as good as that of the input high-resolution images: spatial high-frequency details are lost in the super-resolved images.

Angular Super-Resolution. To reconstruct novel views from sparse angular samples, some methods require the input to follow a specific pattern, or to be captured in a carefully designed way. For example, the work by Levin and Durand [20] takes a 3D focal stack sequence and reconstructs the light field using a prior based on the dimensionality gap. Shi et al. [24] leverage sparsity in the continuous Fourier spectrum to reconstruct a dense light field from a 1D set of viewpoints. Marwah et al. [23] propose a dictionary-based approach to reconstruct light fields from a coded 2D projection.

2.3 Hybrid Imaging

The idea of hybrid imaging was proposed in the context of motion deblurring [3], where a low-resolution high-speed video camera co-located with a high-resolution still camera was used to deblur the blurred images. Building on that work, hybrid imaging has found utility in several applications. Cao et al. [9] propose a co-located hybrid imaging system consisting of an RGB video camera and a low-resolution multi-spectral camera to produce high-resolution multi-spectral video. Another example is the virtual view synthesis system proposed by Tola et al. [26], which uses four regular video cameras and a time-of-flight sensor. They show that adding the time-of-flight camera yields better-quality virtual views than a camera array of similar sparsity alone. Wang et al. [27] introduced another light-field attachment, combining a DSLR with eight surrounding low-quality cameras. They improve the accuracy of the super-resolved images, but the synthesis algorithm is complex and requires several iterations, which limits its speed.

Accordingly, our method integrates the patch-based method with VDSR, exploiting the advantages of both techniques.

3 Proposed Method

This section introduces the proposed patch-based method integrated with convolutional networks for super-resolution of the side view images. The configuration is outlined in Fig. 1. The basic idea is to use the patch-based method to correct the errors of the image super-resolved by VDSR.

We consider an input of two images: a high-resolution image (the reference image, denoted Ref) and a low-resolution image (denoted Src). The two images show the same scene from two different views, and the distance between the two views is 10 pixels in the light field.

Fig. 1. Overview of the proposed patch-based method integrated with a CNN.

3.1 Compute Initial Error

In this step, we aim to compute an error map representing the error of the image super-resolved by VDSR. The scaling factor in our experimental setup is large, and single-image super-resolution alone cannot handle such a large factor, so we bring in the reference high-resolution image.

By down-sampling Ref by a factor of N, we obtain the image \(R_{low}\), which is the same size as Src. Note that the factor N is the ratio of the size of Ref to the size of Src. We then super-resolve \(R_{low}\) by a factor of N using VDSR and denote the result \(R_{high}\). The initial error map is obtained by subtracting \(R_{high}\) from Ref:

$$\begin{aligned} R_{error}=Ref-R_{high} \end{aligned}$$
(1)

The very deep convolutional network (VDSR) is inspired by [13]. This residual-learning network converges much faster than a standard CNN and gives a significant boost in performance.

We note that the residual learning inside the network does not conflict with our error map between the high-resolution and low-resolution images: residual learning is not the means by which we obtain the target results, but simply a more efficient way to train the network.
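To make this step concrete, here is a minimal Python sketch, assuming a `vdsr_super_resolve(img, scale)` wrapper around a trained VDSR model (a hypothetical name, not from the authors' code) and bicubic resampling via OpenCV.

```python
import cv2
import numpy as np

def compute_initial_error(ref, vdsr_super_resolve, n):
    """Compute R_error = Ref - R_high (Eq. 1) at the reference view."""
    h, w = ref.shape[:2]
    # Down-sample Ref by the factor N to the resolution of Src.
    r_low = cv2.resize(ref, (w // n, h // n), interpolation=cv2.INTER_CUBIC)
    # Super-resolve R_low back to the original size (hypothetical VDSR wrapper).
    r_high = vdsr_super_resolve(r_low, scale=n)
    # The error map captures the high-frequency detail VDSR fails to recover.
    return ref.astype(np.float32) - r_high.astype(np.float32)
```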

3.2 Patch-Based Estimation

From the first step we have the error map between \(R_{high}\) and Ref. In this step, we apply the patch-based method to this error map (denoted \(R_{error}\)) at the view of Ref. We adopt a patch-match-based super-resolution method that improves on the algorithm in [28]. We first build the dictionary \(D_{error}\) from patches extracted from the error map \(R_{error}\). We then extract patches from \(R_{high}\) to build the dictionary \(D_{high}\). Low-resolution features are computed from each patch in \(D_{high}\) by down-sampling by a factor of N and applying first- and second-order derivative filters. These low-resolution features are stored in the dictionary \(D_{low}\).

Gradient information can be incorporated into patch-matching algorithms to improve accuracy when searching for similar patches. Chang et al. [10] use first- and second-order derivatives as features to facilitate matching. Our PatchMatch-based method likewise uses first- and second-order gradients as the features extracted from the low-resolution patches. The four 1-D gradient filters used to extract the features are:

$$\begin{aligned} {{g}_{1}}=\left[ -1,0,1 \right] , g_2=g_1^T \end{aligned}$$
(2)
$$\begin{aligned} {{g}_{3}}=\left[ 1,0,-2,0,1 \right] , g_4=g_3^T \end{aligned}$$
(3)

where the superscript “T” denotes transpose. For a low-resolution patch l, the filters \(\left\{ {{g}_{1}},{{g}_{2}},{{g}_{3}},{{g}_{4}} \right\} \) are applied and the feature \(f_l\) is represented as the concatenation of the vectorized filter outputs.
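As an illustration, the feature extraction of Eqs. (2) and (3) can be sketched as follows; the function name is ours, and SciPy's `correlate1d` stands in for whatever filtering routine the original implementation uses.

```python
import numpy as np
from scipy.ndimage import correlate1d

G1 = np.array([-1.0, 0.0, 1.0])            # g1: first-order derivative
G3 = np.array([1.0, 0.0, -2.0, 0.0, 1.0])  # g3: second-order derivative

def patch_feature(patch):
    """Concatenate the four vectorized gradient responses into f_l."""
    patch = patch.astype(np.float32)
    responses = [
        correlate1d(patch, G1, axis=1),  # g1 applied along rows
        correlate1d(patch, G1, axis=0),  # g2 = g1^T, along columns
        correlate1d(patch, G3, axis=1),  # g3 along rows
        correlate1d(patch, G3, axis=0),  # g4 = g3^T, along columns
    ]
    return np.concatenate([r.ravel() for r in responses])
```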

To super-resolve Src, the features \(f_{j}\), computed from each patch \(l_j\) of Src, are used for matching. The 9 nearest neighbors in \(D_{low}\) with the smallest \(L_2\) distance from \(f_{j}\) are found. These 9 nearest neighbors in \(D_{low}\) (denoted {\(f_{ref,k}^{j}\)}\(_{k=1}^{9}\)) correspond to 9 HR patches in \(D_{high}\), which in turn map to 9 error patches in \(D_{error}\) (denoted {\(e_{ref,k}^{j}\)}\(_{k=1}^{9}\)). The reconstruction weights, motivated by [28], are then calculated, and the estimated error patch \(\hat{e}_j\) corresponding to \(l_{j}\) is given by:

$$\begin{aligned} \hat{e}_j=\frac{\sum _{k=1}^9 w_{k}\,e_{ref,k}^{j}}{\sum _{k=1}^9 w_{k}},\quad w_{k}=\exp \left( \frac{-{\Vert f_j-f_{ref,k}^j\Vert }^2}{2{\sigma }^2}\right) \end{aligned}$$
(4)

We thus obtain an error image (denoted \(S_{error}\)) at the view of Src as the sum of similarity-weighted error patches from the dictionary \(D_{error}\). We follow the same parameter settings as [28]: the high-resolution patch size is \(64\times 64\), and the low-resolution patch size is determined by the factor N. \(S_{error}\), which indicates the error of the VDSR method at the view of Src, has the same size as the high-resolution image Ref.
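A brute-force sketch of this estimation step is given below, assuming the dictionaries are stored as NumPy arrays (rows of `d_low` are feature vectors and `d_error[i]` is the matching error patch); the names and the exhaustive nearest-neighbor search are illustrative, not the authors' implementation.

```python
import numpy as np

def estimate_error_patch(f_j, d_low, d_error, sigma, k=9):
    """Blend the k nearest error patches with the Gaussian weights of Eq. (4)."""
    # L2 distance from the query feature to every dictionary feature.
    dists = np.linalg.norm(d_low - f_j, axis=1)
    nn = np.argsort(dists)[:k]                        # 9 nearest neighbors
    w = np.exp(-dists[nn] ** 2 / (2.0 * sigma ** 2))  # Eq. (4) weights
    # Similarity-weighted average of the corresponding error patches.
    return np.tensordot(w, d_error[nn], axes=1) / w.sum()
```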

3.3 Integrated Super-Resolution

We now have the error map \(S_{error}\) at the view of Src, which indicates the error of the VDSR method. In this step, we integrate the two methods to fully use the correspondence between the images from different views. First, Src is super-resolved by VDSR; we denote the result \(S_{cnn}\). We then add \(S_{cnn}\) and \(S_{error}\). In this way, the deficiencies of the VDSR super-resolved image are compensated by \(S_{error}\), yielding the final result.
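The integration itself reduces to a single addition, sketched below with the same hypothetical `vdsr_super_resolve` wrapper as before; the clipping range assumes 8-bit images.

```python
import numpy as np

def integrated_super_resolve(src, s_error, vdsr_super_resolve, n):
    """Final result: VDSR output at the Src view plus the estimated error map."""
    s_cnn = vdsr_super_resolve(src, scale=n).astype(np.float32)
    # S_error restores the detail that single-view VDSR cannot recover.
    return np.clip(s_cnn + s_error, 0.0, 255.0)
```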

Here we explain why we compute \(S_{error}\) during the synthesis. The patch-based method finds textural similarity between the HR image and the LR image, and the super-resolved image is a sum of HR patches. In scenes with large parallax, the edges of objects differ considerably between views, so the patch-based method produces blur; specularities also cannot be restored well. VDSR can compensate for these shortcomings using single-view information. We combine the two methods to take advantage of both.

4 Experimental Results

We evaluate the performance of our proposed method for side views and dense light field rendering on the Stanford light field dataset [2] in several different scenes, including challenging ones with complex textures, specularity and large parallax.

Table 1. PSNR results at super-resolution scale \({\times }4\).
Table 2. PSNR results at super-resolution scale \(\times 8\).
Fig. 2. Super-resolution comparison between the three methods at scale \(\times \)4. From top to bottom: (a) ground truth, (b) VDSR, (c) patch-based method, (d) our method.

4.1 Experiment Setup

For the Stanford dataset, we select 9 views from each light field with a layout similar to the light-field attachment. To make the scene challenging, we select the side view image at distance \(d =10\) (in 8-adjacency) from the central view. We evaluate our method at two different scales, \(\times \)4 and \(\times \)8. The input low-resolution side view images are obtained by down-sampling each image by these two factors, and the original high-resolution images serve as ground truth. For patch-based super-resolution, we follow the same setup as [28]. For VDSR, we set the initial training parameters as in [13]. Finally, we also test on several microscope light field datasets, e.g. the one provided by Lin et al. [22].

Fig. 3. Super-resolution comparison between the three methods at scale \(\times \)8. From top to bottom: (a) ground truth, (b) VDSR, (c) patch-based method, (d) our method.

4.2 Super-Resolution Results

We evaluate our method on all light fields in the dataset [2]. The PSNR values of the patch-based super-resolution images, the VDSR images and our super-resolution images for several scenes are listed in Tables 1 and 2. The PSNRs of our method are higher than those of the patch-based method and VDSR at both scales. This is because our method fully uses the correspondence between the images in different views and takes advantage of both kinds of synthesis.
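For reference, the PSNR used in Tables 1, 2 and 3 can be computed with the standard definition for 8-bit images; this is a generic sketch, not code from the paper.

```python
import numpy as np

def psnr(result, ground_truth, peak=255.0):
    """Peak signal-to-noise ratio in dB between a result and its ground truth."""
    diff = result.astype(np.float64) - ground_truth.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```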

Figures 2 and 3 illustrate some super-resolution patches cropped from the simulations. The patches from our method clearly contain better high-frequency details than those of the patch-based method: the patch-based method produces blurring, which our method alleviates.

Results on the microscope light fields are presented in Fig. 4. The top row shows the Cells dataset, which is highly cluttered; the bottom row shows the fly compound eye. These two typical microscope light fields are unstructured and dim, and the super-resolved results of our method are more similar to the ground truth (Table 3).

Fig. 4. Super-resolution comparison between the three methods on the microscope light fields (Cells and Eye) at scale \(\times \)4. From left to right: (a) ground truth, (b) VDSR, (c) patch-based method, (d) our method.

Table 3. PSNR results at super-resolution scale \(\times 4\) on the microscope light field datasets.

The run-time of our proposed algorithm is about 3 min per picture. The algorithm was implemented in C++ without optimization on a fourth-generation Intel i7 processor with 32 GB of RAM. Compared to the synthesis in [27], this is much faster.

5 Conclusion

In this work, we proposed a highly accurate multi-view super-resolution method for super-resolving images captured by a light field system. The core of our method is a combination of a patch-based algorithm and a convolutional neural network. Our method is more accurate than existing methods on challenging scenes containing complex texture, specularity and large parallax, while costing less time. Experimental results demonstrate that the proposed method markedly improves the quality of the reconstructed high-resolution light field.

In the future, we would like to exploit the natural properties of the light field to reach better super-resolved results. We would also like to extend the method to further applications, such as depth estimation and image sequence interpolation.