Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization Using Geometrical Information

Di Giammarino, Luca; Sun, Boyang; Grisetti, Giorgio; Pollefeys, Marc; Blum, Hermann; Barath, Daniel

doi:10.1007/978-3-031-73016-0_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15144)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Accurate localization in diverse environments is a fundamental challenge in computer vision and robotics. The task involves determining the precise position and orientation of a sensor, typically a camera, within a given space. Traditional localization methods often rely on passive sensing, which may struggle in scenarios with limited features or in dynamic environments. In response, this paper explores active localization, emphasizing the importance of viewpoint selection for improving localization accuracy. Our contributions are a data-driven approach with a simple architecture designed for real-time operation, a self-supervised training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications. Our results demonstrate that our method outperforms the existing approach targeting similar problems and generalizes across synthetic and real data. We also release an open-source implementation to benefit the community at www.github.com/rvp-group/learning-where-to-look.

Acknowledgments

This work has been partially supported by Sapienza University of Rome as part of the work for project H&M: Hyperspectral and Multispectral Fruit Sugar Content Estimation for Robot Harvesting Operations in Difficult Environments, Del. SA n.36/2022, by the Hasler Stiftung Research Grant via the ETH Zurich Foundation and an ETH Zurich Career Seed Award.

Author information

Authors and Affiliations

Sapienza University of Rome, Rome, Italy
Luca Di Giammarino & Giorgio Grisetti
ETH Zürich, Zürich, Switzerland
Boyang Sun, Marc Pollefeys, Hermann Blum & Daniel Barath
Microsoft, Redmond, USA
Marc Pollefeys
Uni Bonn, Bonn, Germany
Hermann Blum

Corresponding author

Correspondence to Luca Di Giammarino.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

About this paper

Cite this paper

Di Giammarino, L., Sun, B., Grisetti, G., Pollefeys, M., Blum, H., Barath, D. (2025). Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization Using Geometrical Information. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15144. Springer, Cham. https://doi.org/10.1007/978-3-031-73016-0_12

DOI: https://doi.org/10.1007/978-3-031-73016-0_12
Published: 26 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73015-3
Online ISBN: 978-3-031-73016-0
eBook Packages: Computer Science, Computer Science (R0)
