Abstract
Novel view synthesis has achieved remarkable quality and efficiency under the paradigm of 3D Gaussian Splatting (3D-GS), but two challenges remain: 1) significant performance degradation when trained with only few-shot samples, due to the lack of geometry constraints, and 2) the inability to render at resolutions beyond that of the training samples. In this paper, we propose Dual-Lens 3D-GS (DL-GS) to achieve high-resolution (HR) and few-shot view synthesis by leveraging the characteristics of the asymmetric dual-lens system commonly equipped on mobile devices. This kind of system captures the same scene with different focal lengths (i.e., wide-angle and telephoto) under an asymmetric stereo configuration, which naturally provides geometric hints for few-shot training and HR guidance for resolution improvement. Nevertheless, two major technical problems remain in achieving this goal. First, how can the geometry information in the asymmetric stereo configuration be effectively exploited? To this end, we propose a consistency-aware training strategy, which integrates a dual-lens-consistent loss to regularize the 3D-GS optimization. Second, how can the dual-lens training samples be best used to improve the resolution of newly synthesized views? To this end, we design a multi-reference-guided refinement module that selects suitable telephoto and wide-angle guidance images from the training samples based on camera pose distances, and then exploits their information for high-frequency detail enhancement. Extensive experiments on simulated and real-captured datasets validate the distinct superiority of our DL-GS over various competitors on the task of HR and few-shot view synthesis. The implementation code is available at https://github.com/XrKang/DL-GS.
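As a rough illustration of the reference-selection step mentioned in the abstract (choosing telephoto and wide-angle guidance images "based on the camera pose distances"), the sketch below picks the nearest telephoto and wide-angle training views for a novel camera using a simple pose-distance metric: the gap between camera centers plus a weighted relative rotation angle. This metric and all names (pose_distance, select_references, rot_weight) are assumptions made for illustration only, not the authors' implementation; the released code at the GitHub link above contains the actual selection criterion.

# Hypothetical sketch of reference-view selection by camera pose distance.
# The distance metric and function names are illustrative assumptions.
import numpy as np

def pose_distance(pose_a: np.ndarray, pose_b: np.ndarray, rot_weight: float = 1.0) -> float:
    """Distance between two 4x4 camera-to-world poses: Euclidean gap between
    camera centers plus a weighted relative rotation angle (in radians)."""
    t_dist = np.linalg.norm(pose_a[:3, 3] - pose_b[:3, 3])
    # Relative rotation angle from the trace of R_a^T R_b.
    cos_angle = (np.trace(pose_a[:3, :3].T @ pose_b[:3, :3]) - 1.0) / 2.0
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return t_dist + rot_weight * angle

def select_references(novel_pose, tele_poses, wide_poses, k=2):
    """Return indices of the k nearest telephoto and k nearest wide-angle views."""
    tele_idx = np.argsort([pose_distance(novel_pose, p) for p in tele_poses])[:k]
    wide_idx = np.argsort([pose_distance(novel_pose, p) for p in wide_poses])[:k]
    return tele_idx.tolist(), wide_idx.tolist()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    def random_pose():
        # Random rotation via QR decomposition (sign-corrected to det=+1) plus a random translation.
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        pose = np.eye(4)
        pose[:3, :3] = q * np.sign(np.linalg.det(q))
        pose[:3, 3] = rng.normal(size=3)
        return pose
    tele_poses = [random_pose() for _ in range(6)]
    wide_poses = [random_pose() for _ in range(6)]
    print(select_references(random_pose(), tele_poses, wide_poses, k=2))

In a pipeline of this kind, the selected telephoto and wide-angle views would then serve as guidance images for the refinement stage; how their high-frequency details are fused is specific to the paper's refinement module and is not reproduced here.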
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 62131003 and 62021001.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, R., Yao, M., Li, Y., Zhang, Y., Xiong, Z. (2025). High-Resolution and Few-Shot View Synthesis from Asymmetric Dual-Lens Inputs. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_13
DOI: https://doi.org/10.1007/978-3-031-72646-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72645-3
Online ISBN: 978-3-031-72646-0
eBook Packages: Computer Science, Computer Science (R0)