Abstract
Accurate localization in diverse environments is a fundamental challenge in computer vision and robotics. The task consists of determining the precise position and orientation of a sensor, typically a camera, within a given space. Traditional localization methods often rely on passive sensing, which can struggle in scenarios with few distinctive features or in dynamic environments. In response, this paper explores the domain of active localization, emphasizing the importance of viewpoint selection for enhancing localization accuracy. Our contributions include a data-driven approach with a simple architecture designed for real-time operation, a self-supervised training method, and the capability to consistently integrate our map into a planning framework tailored to real-world robotics applications. Our results demonstrate that our method outperforms existing approaches that target similar problems and generalizes to both synthetic and real data. We also release an open-source implementation to benefit the community at www.github.com/rvp-group/learning-where-to-look.
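To make the viewpoint-selection idea concrete, the following is a minimal illustrative sketch, not the released implementation linked above: it assumes candidate viewing directions sampled on a Fibonacci sphere lattice and a hypothetical lightweight MLP (here called ViewpointScorer) that maps per-viewpoint geometric features to a localizability score, from which the best viewpoint is picked. All names, feature dimensions, and inputs are assumptions made for illustration.

```python
# Illustrative sketch only (assumptions, not the authors' implementation):
# score a set of candidate viewing directions with a small learned model
# and select the most promising one for localization.
import torch
import torch.nn as nn


def fibonacci_sphere(n: int) -> torch.Tensor:
    """Sample n roughly uniform unit directions on the sphere (Fibonacci lattice)."""
    i = torch.arange(n, dtype=torch.float32)
    phi = torch.pi * (3.0 - 5.0 ** 0.5) * i       # golden-angle azimuth increment
    z = 1.0 - 2.0 * (i + 0.5) / n                 # uniformly spaced heights in [-1, 1]
    r = torch.sqrt(1.0 - z * z)
    return torch.stack((r * torch.cos(phi), r * torch.sin(phi), z), dim=1)


class ViewpointScorer(nn.Module):
    """Hypothetical lightweight MLP: maps a per-viewpoint geometric feature
    vector (e.g. visible-landmark statistics) to a scalar localizability score."""
    def __init__(self, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats).squeeze(-1)


if __name__ == "__main__":
    directions = fibonacci_sphere(256)            # candidate viewing directions
    feats = torch.randn(256, 16)                  # placeholder geometric features
    scorer = ViewpointScorer(feat_dim=16)
    with torch.no_grad():
        best = torch.argmax(scorer(feats))        # pick the highest-scoring viewpoint
    print("best viewing direction:", directions[best])
```

In a real active-localization pipeline, the placeholder features would be replaced by quantities computed from the map geometry, and the selected viewpoint would be handed to the planner.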
Acknowledgments
This work has been partially supported by Sapienza University of Rome under the project H&M: Hyperspectral and Multispectral Fruit Sugar Content Estimation for Robot Harvesting Operations in Difficult Environments (Del. SA n.36/2022), by the Hasler Stiftung Research Grant via the ETH Zurich Foundation, and by an ETH Zurich Career Seed Award.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Di Giammarino, L., Sun, B., Grisetti, G., Pollefeys, M., Blum, H., Barath, D. (2025). Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization Using Geometrical Information. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15144. Springer, Cham. https://doi.org/10.1007/978-3-031-73016-0_12
DOI: https://doi.org/10.1007/978-3-031-73016-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73015-3
Online ISBN: 978-3-031-73016-0
eBook Packages: Computer Science, Computer Science (R0)