
Visual Localization Through Virtual Views

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13606)


Abstract

This paper addresses the problem of camera localization, i.e. 6-DoF pose estimation, with respect to a given 3D reconstruction. Current methods often use a coarse-to-fine image registration framework that integrates image retrieval and visual keypoint matching. However, localization accuracy is limited by the restricted invariance of feature descriptors. For example, when the query image is acquired under illumination (day/night) that differs from the model images, or from a viewpoint that the model images do not cover, retrieval and feature matching may fail, leading to incorrect pose estimates. In this paper, we propose to increase the diversity of the model images, in terms of both viewpoints and visual appearances, by synthesizing novel images with neural rendering methods. Specifically, we build the 3D model on Neural Radiance Fields (NeRF) and use appearance embeddings to encode variations in illumination. We then propose an efficient strategy that interpolates appearance embeddings and places virtual cameras in the scene to generate virtual model images. To facilitate model image management, the appearance embeddings are associated with image acquisition conditions such as time of day, season, and weather. The pose of a query image is then estimated against virtual views rendered under similar conditions, using the conventional hierarchical localization framework. We demonstrate the approach by localizing single smartphone images in a large-scale 3D urban model, showing improved pose estimation accuracy.
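
The abstract describes the virtual-view generation step only at a high level. Below is a minimal Python sketch (not the authors' code) of what that step could look like: two learned appearance embeddings, e.g. for a day and a night condition, are linearly interpolated, and virtual cameras are placed by perturbing an existing model-image pose. The NeRF rendering call and the downstream hierarchical localization (retrieval, keypoint matching, PnP with RANSAC) are only indicated in comments. All function names, embedding dimensions, and perturbation values are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of virtual-view generation with interpolated
# appearance embeddings and perturbed camera poses (assumption-based).
import numpy as np

def interpolate_appearance(emb_a, emb_b, num_steps):
    """Linearly blend two learned appearance embeddings."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - t) * emb_a + t * emb_b for t in ts]

def perturb_pose(R, t, yaw_deg=10.0, offset=(0.5, 0.0, 0.0)):
    """Place a virtual camera near an existing model camera by rotating
    the original pose about its y-axis and shifting it sideways."""
    a = np.deg2rad(yaw_deg)
    R_yaw = np.array([[np.cos(a), 0.0, np.sin(a)],
                      [0.0,       1.0, 0.0],
                      [-np.sin(a), 0.0, np.cos(a)]])
    return R @ R_yaw, t + R @ np.asarray(offset)

# Toy inputs standing in for trained appearance embeddings and a model-image pose.
day_emb, night_emb = np.zeros(48), np.ones(48)
R0, t0 = np.eye(3), np.zeros(3)

virtual_views = []
for emb in interpolate_appearance(day_emb, night_emb, num_steps=5):
    R_v, t_v = perturb_pose(R0, t0)
    # A NeRF renderer conditioned on (pose, appearance embedding) would be
    # called here to synthesize the virtual model image; omitted in this sketch.
    virtual_views.append({"R": R_v, "t": t_v, "appearance": emb})

print(len(virtual_views), "virtual views prepared")
```

In the framework described by the abstract, each rendered virtual view would then be tagged with its acquisition condition (time of day, season, weather), so that a query image is matched only against views rendered under a similar condition before the standard hierarchical pipeline of retrieval, keypoint matching, and pose estimation is applied.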

This work was completed while Zhenbo Song was an intern at ByteDance, and was supported in part by the Jiangsu Funding Program for Excellent Postdoctoral Talent under Grant 2022ZB268.

Author information

Corresponding author

Correspondence to Xi Sun.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Song, Z., Sun, X., Xue, Z., Xie, D., Wen, C. (2022). Visual Localization Through Virtual Views. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13606. Springer, Cham. https://doi.org/10.1007/978-3-031-20503-3_52

  • DOI: https://doi.org/10.1007/978-3-031-20503-3_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20502-6

  • Online ISBN: 978-3-031-20503-3

  • eBook Packages: Computer Science (R0)
