Abstract
Localization and mapping is the foundational technology for augmented reality (AR) that enables sharing and persistence of digital content in the real world. While significant progress has been made, researchers are still mostly driven by unrealistic benchmarks not representative of real-world AR scenarios. In particular, benchmarks are often based on small-scale datasets with low scene diversity, captured from stationary cameras, and lacking other sensor inputs like inertial, radio, or depth data. Furthermore, ground-truth (GT) accuracy is mostly insufficient to satisfy AR requirements. To close this gap, we introduce a new benchmark with a comprehensive capture and GT pipeline, which allows us to co-register realistic AR trajectories in diverse scenes and from heterogeneous devices at scale. To establish accurate GT, our pipeline robustly aligns the captured trajectories against laser scans in a fully automatic manner. Based on this pipeline, we publish a benchmark dataset of diverse and large-scale scenes recorded with head-mounted and hand-held AR devices. We extend several state-of-the-art methods to take advantage of the AR-specific setup and evaluate them on our benchmark. Based on the results, we present novel insights on current research gaps to provide avenues for future work in the community.
P.-E. Sarlin and M. Dusmanu—Equal contribution.
V. Larsson—Now at Lund University, Sweden.
Acknowledgements
This paper would not have been possible without the hard work and contributions of Gabriela Evrova, Silvano Galliani, Michael Baumgartner, Cedric Cagniart, Jeffrey Delmerico, Jonas Hein, Dawid Jeczmionek, Mirlan Karimov, Maximilian Mews, Patrick Misteli, Juan Nieto, Sònia Batllori Pallarès, Rémi Pautrat, Songyou Peng, Iago Suarez, Rui Wang, Jeremy Wanner, Silvan Weder and our colleagues in CVG at ETH Zürich and the wider Microsoft Mixed Reality & AI team.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sarlin, PE. et al. (2022). LaMAR: Benchmarking Localization and Mapping for Augmented Reality. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_40
DOI: https://doi.org/10.1007/978-3-031-20071-7_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20070-0
Online ISBN: 978-3-031-20071-7
eBook Packages: Computer Science (R0)