Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization Using Geometrical Information

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Accurate localization in diverse environments is a fundamental challenge in computer vision and robotics. The task involves determining the precise position and orientation of a sensor, typically a camera, within a given space. Traditional localization methods often rely on passive sensing, which may struggle in scenarios with limited features or dynamic environments. In response, this paper explores active localization, emphasizing the importance of viewpoint selection for improving localization accuracy. Our contributions are a data-driven approach with a simple architecture designed for real-time operation, a self-supervised training method, and the ability to consistently integrate our map into a planning framework tailored for real-world robotics applications. Our results demonstrate that our method outperforms the existing method targeting similar problems and generalizes to both synthetic and real data. We also release an open-source implementation to benefit the community at www.github.com/rvp-group/learning-where-to-look.

Acknowledgments

This work has been partially supported by Sapienza University of Rome as part of the work for project H&M: Hyperspectral and Multispectral Fruit Sugar Content Estimation for Robot Harvesting Operations in Difficult Environments, Del. SA n.36/2022, by the Hasler Stiftung Research Grant via the ETH Zurich Foundation and an ETH Zurich Career Seed Award.

Author information

Corresponding author

Correspondence to Luca Di Giammarino.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Di Giammarino, L., Sun, B., Grisetti, G., Pollefeys, M., Blum, H., Barath, D. (2025). Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization Using Geometrical Information. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15144. Springer, Cham. https://doi.org/10.1007/978-3-031-73016-0_12

  • DOI: https://doi.org/10.1007/978-3-031-73016-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73015-3

  • Online ISBN: 978-3-031-73016-0

  • eBook Packages: Computer Science, Computer Science (R0)
