Abstract
We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they alternate between triangulating sparse 3D points and registering additional camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as the iterated application and refinement of a visual relocalizer, that is, a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Unlike other learning-based reconstruction methods, we require neither pose priors nor sequential inputs, and we optimize efficiently over thousands of images. In many cases, our method, ACE0, estimates camera poses with an accuracy close to feature-based SfM, as demonstrated by novel view synthesis.
Project page: https://nianticlabs.github.io/acezero/.
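The abstract describes an incremental loop: seed the reconstruction, train a scene coordinate regressor on the currently posed views, use it as a relocalizer to register further views, and repeat. The following is a minimal, hypothetical Python sketch of that control flow only; the helper functions (train_regressor, relocalize), the confidence threshold, and the placeholder pose representation are illustrative assumptions, not the authors' implementation.

```python
"""Hypothetical sketch of an incremental reconstruction loop driven by a
relocalizer, as outlined in the abstract. All helpers are placeholders."""
from dataclasses import dataclass
import random


@dataclass
class Pose:
    # 6-DoF camera pose; a real system would store a rotation and translation.
    params: tuple


def train_regressor(posed):
    """Placeholder for training a scene coordinate regression network
    on the currently posed images (the 'refinement' step)."""
    return {"trained_on": list(posed)}


def relocalize(model, image):
    """Placeholder for relocalization: predict scene coordinates for the
    image and solve for the pose. Returns (pose, confidence)."""
    return Pose(params=(0, 0, 0, 0, 0, 0)), random.random()


def reconstruct(images, conf_thresh=0.5, max_rounds=10):
    # Seed: pose one image arbitrarily; all other images start unposed.
    posed = {images[0]: Pose(params=(0, 0, 0, 0, 0, 0))}
    unposed = set(images[1:])
    for _ in range(max_rounds):
        model = train_regressor(posed)       # refine the relocalizer
        newly_posed = {}
        for image in list(unposed):
            pose, conf = relocalize(model, image)
            if conf >= conf_thresh:          # accept confident registrations
                newly_posed[image] = pose
        if not newly_posed:                  # no progress: stop iterating
            break
        posed.update(newly_posed)
        unposed -= set(newly_posed)
    return posed


if __name__ == "__main__":
    poses = reconstruct([f"img_{i:04d}.jpg" for i in range(100)])
    print(f"registered {len(poses)} of 100 images")
```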
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Brachmann, E. et al. (2025). Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15114. Springer, Cham. https://doi.org/10.1007/978-3-031-72992-8_24
DOI: https://doi.org/10.1007/978-3-031-72992-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72991-1
Online ISBN: 978-3-031-72992-8
eBook Packages: Computer Science, Computer Science (R0)