
Visual Localization Through Virtual Views

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13606)


Abstract

This paper addresses the problem of camera localization, i.e. 6-DoF pose estimation, with respect to a given 3D reconstruction. Current methods often use a coarse-to-fine image registration framework that integrates image retrieval and visual keypoint matching. However, localization accuracy is limited by the restricted invariance of feature descriptors. For example, when the query image is acquired under illumination (day/night) that differs from the model images, or from a viewpoint that the model images do not cover, retrieval and feature matching may fail, leading to incorrect pose estimates. In this paper, we propose to increase the diversity of the model images, in terms of both viewpoints and visual appearances, by synthesizing novel images with neural rendering methods. Specifically, we build the 3D model on Neural Radiance Fields (NeRF) and use appearance embeddings to encode variations in illumination. We then propose an efficient strategy that interpolates appearance embeddings and places virtual cameras in the scene to generate virtual model images. To facilitate model image management, the appearance embeddings are associated with image acquisition conditions such as time of day, season, and weather. The pose of a query image is then estimated against virtual views rendered under similar conditions, using the conventional hierarchical localization framework. We demonstrate the approach by localizing single smartphone images in a large-scale 3D urban model, showing improved pose estimation accuracy.
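
The abstract describes the virtual-view generation step only at a high level. Below is a minimal Python sketch (not the authors' code) of what that step could look like: two learned appearance embeddings, e.g. for a day and a night condition, are linearly interpolated, and virtual cameras are placed by perturbing an existing model-image pose. The NeRF rendering call and the downstream hierarchical localization (retrieval, keypoint matching, PnP with RANSAC) are only indicated in comments. All function names, embedding dimensions, and perturbation values are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of virtual-view generation with interpolated
# appearance embeddings and perturbed camera poses (assumption-based).
import numpy as np

def interpolate_appearance(emb_a, emb_b, num_steps):
    """Linearly blend two learned appearance embeddings."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - t) * emb_a + t * emb_b for t in ts]

def perturb_pose(R, t, yaw_deg=10.0, offset=(0.5, 0.0, 0.0)):
    """Place a virtual camera near an existing model camera by rotating
    the original pose about its y-axis and shifting it sideways."""
    a = np.deg2rad(yaw_deg)
    R_yaw = np.array([[np.cos(a), 0.0, np.sin(a)],
                      [0.0,       1.0, 0.0],
                      [-np.sin(a), 0.0, np.cos(a)]])
    return R @ R_yaw, t + R @ np.asarray(offset)

# Toy inputs standing in for trained appearance embeddings and a model-image pose.
day_emb, night_emb = np.zeros(48), np.ones(48)
R0, t0 = np.eye(3), np.zeros(3)

virtual_views = []
for emb in interpolate_appearance(day_emb, night_emb, num_steps=5):
    R_v, t_v = perturb_pose(R0, t0)
    # A NeRF renderer conditioned on (pose, appearance embedding) would be
    # called here to synthesize the virtual model image; omitted in this sketch.
    virtual_views.append({"R": R_v, "t": t_v, "appearance": emb})

print(len(virtual_views), "virtual views prepared")
```

In the framework described by the abstract, each rendered virtual view would then be tagged with its acquisition condition (time of day, season, weather), so that a query image is matched only against views rendered under a similar condition before the standard hierarchical pipeline of retrieval, keypoint matching, and pose estimation is applied.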

This work was completed while Zhenbo Song was an intern at ByteDance, and was supported in part by the Jiangsu Funding Program for Excellent Postdoctoral Talent under Grant 2022ZB268.

Author information

Corresponding author

Correspondence to Xi Sun.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Song, Z., Sun, X., Xue, Z., Xie, D., Wen, C. (2022). Visual Localization Through Virtual Views. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13606. Springer, Cham. https://doi.org/10.1007/978-3-031-20503-3_52

  • DOI: https://doi.org/10.1007/978-3-031-20503-3_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20502-6

  • Online ISBN: 978-3-031-20503-3

  • eBook Packages: Computer Science (R0)
