Abstract
Light field angular super-resolution (LFASR) aims to reconstruct a densely sampled light field from sparsely sampled inputs. Recently, convolutional neural network-based methods have achieved encouraging results. However, most of these approaches exploit the view-specific characteristics of the target dense light field (i.e., contents and view locations) separately, so the geometric structure of the target light field is not fully explored. To this end, we propose view-specific queries that integrate the view location information of the dense light field into a Transformer (dubbed ViewFormer) for LFASR. In particular, we first leverage a Transformer encoder to process the input sparsely sampled light field. Then, a view interpolation operation processes the extracted sub-aperture features along the horizontal and vertical angular directions of the target light field, generating new sub-aperture representations dubbed view-specific queries. These view-specific queries carry the view coordinate information of the target light field and are dynamically enhanced, layer by layer, by a Transformer decoder. The enhanced view-specific queries are then fed to a reconstruction module for final light field synthesis. Additionally, to mine more information from the input sparsely sampled light field, we apply a channel attention scheme to the building blocks of the Transformer. Extensive experiments are performed on commonly used LFASR benchmarks covering both real-world and synthetic data, on which ViewFormer achieves new state-of-the-art results compared with other methods.
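To make the pipeline described above concrete, the following is a minimal PyTorch sketch of the encoder / view-interpolation / decoder / reconstruction flow. It is an illustration under stated assumptions, not the authors' implementation: the ViewFormerSketch name, module sizes, view counts, and the 1D angular axis are all simplifications (the paper interpolates along both horizontal and vertical angular directions and uses window attention with channel attention rather than vanilla Transformer layers).

```python
# Illustrative sketch of the ViewFormer pipeline from the abstract.
# All names, shapes, and hyper-parameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewFormerSketch(nn.Module):
    def __init__(self, dim=32, out_views=7, n_layers=2):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, 3, padding=1)   # per-view feature extraction
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, n_layers)
        dec = nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, n_layers)
        self.reconstruct = nn.Conv2d(dim, 1, 3, padding=1)
        self.out_views = out_views

    def forward(self, sparse_views):                   # (B, V_in, H, W)
        b, v, h, w = sparse_views.shape
        feats = self.embed(sparse_views.reshape(b * v, 1, h, w))
        c = feats.shape[1]
        # One token per (pixel, view); the encoder mixes the sparse views.
        tokens = feats.reshape(b, v, c, h * w).permute(0, 3, 1, 2).reshape(b * h * w, v, c)
        tokens = self.encoder(tokens)
        # "View interpolation": resample encoded features along the angular
        # axis to the target view positions, giving one query per novel view.
        queries = F.interpolate(tokens.transpose(1, 2), size=self.out_views,
                                mode="linear", align_corners=True).transpose(1, 2)
        # The decoder refines the view-specific queries layer by layer,
        # attending to the encoded sparse-view tokens.
        refined = self.decoder(queries, tokens)        # (B*H*W, V_out, C)
        refined = refined.reshape(b, h * w, self.out_views, c) \
                         .permute(0, 2, 3, 1).reshape(b * self.out_views, c, h, w)
        return self.reconstruct(refined).reshape(b, self.out_views, h, w)

lf = torch.randn(1, 2, 32, 32)       # 2 sparse input views of a 32x32 scene
dense = ViewFormerSketch()(lf)       # -> (1, 7, 32, 32) synthesized views
```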






Data availability
The datasets generated or analysed during the current study are available from the corresponding author on reasonable request.
Notes
Note that the order of view interpolation does not affect the final LFASR results.
We attempt to replace WMSA with the self-attention layer from ViT [39], but it consumes so much computation that ViewFormer cannot be run on our experiment server. We also find that varying the window size in WMSA has minimal impact on ViewFormer's final performance but significantly affects the number of parameters and FLOPs. Considering the computational complexity, we ultimately set the window size in WMSA to 4.
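As a back-of-the-envelope illustration of this complexity argument, the sketch below compares approximate self-attention FLOPs for global (ViT-style) attention and window attention with window size 4. The feature-map size and channel dimension are assumed values for illustration, not taken from the paper.

```python
# Rough cost comparison: global self-attention vs. window attention (WMSA).
# H, W, and C below are illustrative assumptions.
def attention_flops(num_tokens: int, dim: int) -> int:
    """Approximate FLOPs of one self-attention: Q @ K^T plus attn @ V."""
    return 2 * num_tokens * num_tokens * dim

H = W = 64          # sub-aperture feature map size (assumed)
C = 60              # channel dimension (assumed)

global_cost = attention_flops(H * W, C)                 # all H*W tokens attend to each other
M = 4                                                   # WMSA window size used in the paper
num_windows = (H // M) * (W // M)
window_cost = num_windows * attention_flops(M * M, C)   # attention restricted to MxM windows

print(f"global : {global_cost:.3e} FLOPs")
print(f"window : {window_cost:.3e} FLOPs ({global_cost / window_cost:.0f}x cheaper)")
```

For these assumed sizes the windowed variant is (H*W)/M^2 = 256 times cheaper, which is why global attention was infeasible while the window size itself barely changes accuracy.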
References
Kim, C., Zimmer, H., Pritch, Y., Sorkine-Hornung, A., Gross, M.H.: Scene reconstruction from high spatio-angular resolution light fields. ACM Trans. Graph. 32(4), 73:1–73:12 (2013)
Fiss, J., Curless, B., Szeliski, R.: Refocusing plenoptic images using depth-adaptive splatting. In: IEEE International Conference on Computational Photography, pp. 1–9 (2014)
Wang, X., Chao, W., Wang, L., Duan, F.: Light field depth estimation using occlusion-aware consistency analysis. Vis. Comput. 39(8), 3441–3454 (2023)
Jia, C., Shi, F., Zhao, M., Zhang, Y., Cheng, X., Wang, M., Chen, S.: Semantic segmentation with light field imaging and convolutional neural networks. IEEE Trans. Instrum. Meas. 70, 1–14 (2021)
Gao, W., Fan, S., Li, G., Lin, W.: A thorough benchmark and a new model for light field saliency detection. IEEE Trans. Pattern Anal. Mach. Intell. 45(7), 8003–8019 (2023)
Liao, G., Gao, W.: Rethinking feature mining for light field salient object detection. ACM Trans. Multim. Comput. Commun. Appl. (2024)
Yu, J.: A light-field journey to virtual reality. IEEE Multim. 24(2), 104–112 (2017)
Wang, S., Sheng, H., Yang, D., Cui, Z., Cong, R., Ke, W.: Mfsrnet: spatial-angular correlation retaining for light field super-resolution. Appl. Intell. 1–19 (2023)
Yang, J., Wang, L., Ren, L., Cao, Y., Cao, Y.: Light field angular super-resolution based on structure and scene information. Appl. Intell. 53(4), 4767–4783 (2023)
Yoon, Y., Jeon, H.-G., Yoo, D., Lee, J.-Y., So Kweon, I.: Learning a deep convolutional network for light-field image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision Workshops, pp. 24–32 (2015)
Yeung, H.W.F., Hou, J., Chen, J., Chung, Y.Y., Chen, X.: Fast light field reconstruction with deep coarse-to-fine modeling of spatial-angular clues. In: European Conference on Computer Vision, pp. 137–152 (2018)
Gao, W., Zhou, L., Tao, L.: A fast view synthesis implementation method for light field applications. ACM Trans. Multim. Comput. Commun. Appl. 17(4), 1–20 (2021)
Kalantari, N.K., Wang, T.-C., Ramamoorthi, R.: Learning-based view synthesis for light field cameras. ACM Trans. Graph. 35(6), 1–10 (2016)
Jin, J., Hou, J., Yuan, H., Kwong, S.: Learning light field angular super-resolution via a geometry-aware network. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 11141–11148 (2020)
Jin, J., Hou, J., Chen, J., Zeng, H., Kwong, S., Yu, J.: Deep coarse-to-fine dense light field reconstruction with flexible sampling and geometry-aware fusion. IEEE Trans. Pattern Anal. Mach. Intell. 44(04), 1819–1836 (2022)
Liu, X., Wang, M., Wang, A., Hua, X., Liu, S.: Depth-guided learning light field angular super-resolution with edge-aware inpainting. Vis. Comput. 38(8), 2839–2851 (2022)
Guo, M., Jin, J., Liu, H., Hou, J.: Learning dynamic interpolation for extremely sparse light fields with wide baselines. In: Proceedings of the IEEE Conference on Computer Vision, pp. 2450–2459 (2021)
Guo, M., Hou, J., Jin, J., Liu, H., Zeng, H., Lu, J.: Content-aware warping for view synthesis. IEEE Trans. Pattern Anal. Mach. Intell. (2023). https://doi.org/10.1109/TPAMI.2023.3242709
Wu, G., Zhao, M., Wang, L., Dai, Q., Chai, T., Liu, Y.: Light field reconstruction using deep convolutional network on epi. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6319–6327 (2017)
Gul, M.S.K., Gunturk, B.K.: Spatial and angular resolution enhancement of light fields using convolutional neural networks. IEEE Trans. Image Process. 27(5), 2146–2159 (2018)
Wang, Y., Liu, F., Wang, Z., Hou, G., Sun, Z., Tan, T.: End-to-end view synthesis for light field imaging with pseudo 4dcnn. In: European Conference on Computer Vision, pp. 333–348 (2018)
Zhu, M., Alperovich, A., Johannsen, O., Sulc, A., Goldluecke, B.: An epipolar volume autoencoder with adversarial loss for deep light field super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1853–1861 (2019)
Wu, G., Liu, Y., Dai, Q., Chai, T.: Learning sheared epi structure for light field reconstruction. IEEE Trans. Image Process. 28(7), 3261–3273 (2019)
Jin, J., Hou, J., Chen, J., Kwong, S.: Light field spatial super-resolution via deep combinatorial geometry embedding and structural consistency regularization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2260–2269 (2020)
Meng, N., Li, K., Liu, J., Lam, E.Y.: Light field view synthesis via aperture disparity and warping confidence map. IEEE Trans. Image Process. 30, 3908–3921 (2021)
Wang, S., Zhou, T., Lu, Y., Di, H.: Detail-preserving transformer for light field image super-resolution. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2522–2530 (2022)
Liang, Z., Wang, Y., Wang, L., Yang, J., Zhou, S.: Light field image super-resolution with transformers. IEEE Signal Process. Lett. 29, 563–567 (2022)
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE Conference on Computer Vision, pp. 10012–10022 (2021)
Rombach, R., Esser, P., Ommer, B.: Geometry-free view synthesis: Transformers and no 3d priors. In: Proceedings of the IEEE Conference on Computer Vision, pp. 14356–14366 (2021)
Sajjadi, M.S., Meyer, H., Pot, E., Bergmann, U., Greff, K., Radwan, N., Vora, S., Lučić, M., Duckworth, D., Dosovitskiy, A., et al.: Scene representation transformer: Geometry-free novel view synthesis through set-latent scene representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6229–6238 (2022)
Chen, X., Wang, X., Zhou, J., Dong, C.: Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2023)
Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: European Conference on Computer Vision, pp. 286–301 (2018)
Mo, Y., Wang, Y., Xiao, C., Yang, J., An, W.: Dense dual-attention network for light field image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 32(7), 4431–4443 (2021)
Mo, Y., Wang, Y., Wang, L., Yang, J., An, W.: Light field angular super-resolution via dense correspondence field reconstruction. In: European Conference on Computer Vision Workshops, pp. 412–428 (2022)
Liu, D., Mao, Y., Zhou, X., An, P., Fang, Y.: Learning a multilevel cooperative view reconstruction network for light field angular super-resolution. In: IEEE International Conference on Multimedia and Expo, pp. 1271–1276 (2023)
Cao, Y., Wang, L., Ren, L., Yang, J., Cao, Y.: View position prior-supervised light field angular super-resolution network with asymmetric feature extraction and spatial-angular interaction. Neurocomputing 518, 206–218 (2023)
Wang, L., Ren, L., Wei, X., Yang, J., Cao, Y., Cao, Y.: Light field angular super-resolution based on intrinsic and geometric information. Knowl.-Based Syst. 270, 110553 (2023)
Liu, D., Mao, Y., Huang, Y., Cao, L., Wang, Y., Fang, Y.: Optical flow-assisted multi-level fusion network for light field image angular reconstruction. Signal Process.: Image Commun. 119, 117031 (2023)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
Chen, Z., Zhou, Y., Li, R., Li, P., Sheng, B.: Scpa-net: self-calibrated pyramid aggregation for image dehazing. Comput. Animat. Virtual Worlds 33(3–4), 2061–2073 (2022)
Lin, X., Sun, S., Huang, W., Sheng, B., Li, P., Feng, D.D.: Eapt: efficient attention pyramid transformer for image processing. IEEE Trans. Multim. 25, 50–61 (2023)
Chen, Z., Qiu, G., Li, P., Zhu, L., Yang, X., Sheng, B.: Mngnas: distilling adaptive combination of multiple searched networks for one-shot neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 45(11), 13489–13508 (2023)
Jiang, N., Sheng, B., Li, P., Lee, T.-Y.: Photohelper: portrait photographing guidance via deep feature retrieval and fusion. IEEE Trans. Multim. 25, 2226–2238 (2023)
Sheng, B., Li, P., Ali, R., Chen, C.P.: Improving video temporal consistency via broad learning system. IEEE Trans. Cybern. 52(7), 6662–6675 (2022)
Li, J., Chen, J., Sheng, B., Li, P., Yang, P., Feng, D.D., Qi, J.: Automatic detection and classification system of domestic waste via multimodel cascaded convolutional neural network. IEEE Trans. Ind. Inf. 18(1), 163–173 (2022)
Xie, Z., Zhang, W., Sheng, B., Li, P., Chen, C.P.: Bagfn: broad attentive graph fusion network for high-order feature interactions. IEEE Trans. Neural Netw. Learn. Syst. 34(8), 4499–4513 (2023)
Li, H., Sheng, B., Li, P., Ali, R., Chen, C.P.: Globally and locally semantic colorization via exemplar-based broad-gan. IEEE Trans. Image Process. 30, 8526–8539 (2021)
Li, P., Sheng, B., Chen, C.P.: Face sketch synthesis using regularized broad learning system. IEEE Trans. Neural Netw. Learn. Syst. 33(10), 5346–5360 (2021)
Wen, Y., Chen, J., Sheng, B., Chen, Z., Li, P., Tan, P., Lee, T.-Y.: Structure-aware motion deblurring using multi-adversarial optimized cyclegan. IEEE Trans. Image Process. 30, 6142–6155 (2021)
Jin, Y., Sheng, B., Li, P., Chen, C.P.: Broad colorization. IEEE Trans. Neural Netw. Learn. Syst. 32(6), 2330–2343 (2020)
Zhou, Y., Chen, Z., Li, P., Song, H., Chen, C.P., Sheng, B.: Fsad-net: feedback spatial attention dehazing network. IEEE Trans. Neural Netw. Learn. Syst. 34(10), 7719–7733 (2023)
Dai, L., Wu, L., Li, H., Cai, C., Wu, Q., Kong, H., Liu, R., Wang, X., Hou, X., Liu, Y., et al.: A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 12(1), 3242 (2021)
Guo, H., Sheng, B., Li, P., Chen, C.P.: Multiview high dynamic range image synthesis using fuzzy broad learning system. IEEE Trans. Cybern. 51(5), 2735–2747 (2019)
Sheng, B., Li, P., Fang, X., Tan, P., Wu, E.: Depth-aware motion deblurring using loopy belief propagation. IEEE Trans. Circuits Syst. Video Technol. 30(4), 955–969 (2019)
Sheng, B., Li, P., Jin, Y., Tan, P., Lee, T.-Y.: Intrinsic image decomposition with step and drift shading separation. IEEE Trans. Visual Comput. Graph. 26(2), 1332–1346 (2018)
Chen, H., Wang, Y., Guo, T., Xu, C., Deng, Y., Liu, Z., Ma, S., Xu, C., Xu, C., Gao, W.: Pre-trained image processing transformer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12299–12310 (2021)
Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE Conference on Computer Vision Workshops, pp. 1833–1844 (2021)
Cao, J., Liang, J., Zhang, K., Li, Y., Zhang, Y., Wang, W., Van Goo, L.: Reference-based image super-resolution with deformable attention transformer. In: European Conference on Computer Vision, pp. 325–342 (2022)
Liang, J., Cao, J., Fan, Y., Zhang, K., Ranjan, R., Li, Y., Timofte, R., Van Gool, L.: Vrt: A video restoration transformer (2022). arXiv preprint arXiv:2201.12288
Liang, J., Fan, Y., Xiang, X., Ranjan, R., Ilg, E., Green, S., Cao, J., Zhang, K., Timofte, R., Gool, L.V.: Recurrent video restoration transformer with guided deformable attention. Adv. Neural. Inf. Process. Syst. 35, 378–393 (2022)
Geng, Z., Liang, L., Ding, T., Zharkov, I.: Rstt: Real-time spatial temporal transformer for space-time video super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 17441–17451 (2022)
Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., Li, H.: Uformer: A general u-shaped transformer for image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 17683–17693 (2022)
Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H.: Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5728–5739 (2022)
Wang, Y., Lu, Y., Wang, S., Zhang, W., Wang, Z.: Local-global feature aggregation for light field image super-resolution. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2160–2164 (2022)
Wang, Y., Wang, L., Liang, Z., Yang, J., Timofte, R., Guo, Y., Jin, K., Wei, Z., Yang, A., Guo, S., et al.: Ntire 2023 challenge on light field image super-resolution: Dataset, methods and results. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1320–1335 (2023)
Xu, W., Xu, Y., Chang, T., Tu, Z.: Co-scale conv-attentional image transformers. In: Proceedings of the IEEE Conference on Computer Vision, pp. 9981–9990 (2021)
Li, K., Wang, Y., Zhang, J., Gao, P., Song, G., Liu, Y., Li, H., Qiao, Y.: Uniformer: Unifying convolution and self-attention for visual recognition (2022). arXiv preprint arXiv:2201.09450
Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., Zhang, L.: Cvt: Introducing convolutions to vision transformers. In: Proceedings of the IEEE Conference on Computer Vision, pp. 22–31 (2021)
Mehta, S., Rastegari, M.: Separable self-attention for mobile vision transformers (2022). arXiv preprint arXiv:2206.02680
Yuan, K., Guo, S., Liu, Z., Zhou, A., Yu, F., Wu, W.: Incorporating convolution designs into visual transformers. In: Proceedings of the IEEE Conference on Computer Vision, pp. 579–588 (2021)
Xiao, T., Singh, M., Mintun, E., Darrell, T., Dollár, P., Girshick, R.: Early convolutions help transformers see better. Adv. Neural. Inf. Process. Syst. 34, 30392–30400 (2021)
Mehta, S., Rastegari, M.: Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. In: International Conference on Learning Representations (2021)
Guo, J., Han, K., Wu, H., Tang, Y., Chen, X., Wang, Y., Xu, C.: Cmt: Convolutional neural networks meet vision transformers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12175–12185 (2022)
Peng, Z., Huang, W., Gu, S., Xie, L., Wang, Y., Jiao, J., Ye, Q.: Conformer: Local features coupling global representations for visual recognition. In: Proceedings of the IEEE Conference on Computer Vision, pp. 367–376 (2021)
Chen, Y., Dai, X., Chen, D., Liu, M., Dong, X., Yuan, L., Liu, Z.: Mobile-former: Bridging mobilenet and transformer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5270–5279 (2022)
Chen, Q., Wu, Q., Wang, J., Hu, Q., Hu, T., Ding, E., Cheng, J., Wang, J.: Mixformer: Mixing features across windows and dimensions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5249–5259 (2022)
Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention, pp. 234–241 (2015)
Wang, Y., Wang, L., Wu, G., Yang, J., An, W., Yu, J., Guo, Y.: Disentangling light fields for super-resolution and disparity estimation. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 425–443 (2022)
Honauer, K., Johannsen, O., Kondermann, D., Goldluecke, B.: A dataset and evaluation methodology for depth estimation on 4d light fields. In: Asian Conference on Computer Vision, pp. 19–34 (2016)
Wanner, S., Meister, S., Goldluecke, B.: Datasets and benchmarks for densely sampled 4d light fields. In: Vision, Modelling and Visualization, vol. 13, pp. 225–226 (2013)
Raj, A.S., Lowney, M., Shah, R., Wetzstein, G.: Stanford Lytro Light Field Archive. http://lightfields.stanford.edu/LF2016.html
Liu, G., Yue, H., Wu, J., Yang, J.: Efficient light field angular super-resolution with sub-aperture feature learning and macro-pixel upsampling. IEEE Trans. Multim. 25, 6588–6600 (2023)
Zhang, S., Sheng, H., Li, C., Zhang, J., Xiong, Z.: Robust depth estimation for light field via spinning parallelogram operator. Comput. Vis. Image Underst. 145, 148–159 (2016)
Wang, Y., Liang, Z., Wang, L., Yang, J., An, W., Guo, Y.: Real-world light field image super-resolution via degradation modulation. IEEE Trans. Neural Netw. Learn. Syst. (2024)
Xiao, Z., Shi, J., Jiang, X., Guillemot, C.: A learning-based view extrapolation method for axial super-resolution. Neurocomputing 455, 229–241 (2021)
Zhou, T., Tucker, R., Flynn, J., Fyffe, G., Snavely, N.: Stereo magnification: learning view synthesis using multiplane images. ACM Trans. Graph. 37(4), 1–12 (2018)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Acknowledgements
This work is supported by the special projects in key areas of Guangdong Province (2022ZDZX1036) and Shenzhen Peacock Plan.
Author information
Authors and Affiliations
Contributions
Shunzhou Wang: Conceptualization of this study, Methodology, Software, Writing. Yao Lu: Discussion, Review, Editing, Supervision. Wang Xia: Visualization, Validation, Review. Peiqi Xia: Visualization, Validation, Review. Ziqi Wang: Visualization, Validation, Review. Wei Gao: Discussion, Review, Editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical and informed consent statement for data used
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, S., Lu, Y., Xia, W. et al. Light field angular super-resolution by view-specific queries. Vis Comput 41, 3565–3580 (2025). https://doi.org/10.1007/s00371-024-03620-y