Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15114)

Abstract

We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they alternate between triangulating sparse 3D points and registering additional camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of a visual relocalizer, that is, of a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Unlike other learning-based reconstruction methods, we require neither pose priors nor sequential inputs, and we optimize efficiently over thousands of images. In many cases, our method, ACE0, estimates camera poses with an accuracy close to that of feature-based SfM, as demonstrated by novel view synthesis.
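The abstract treats incremental reconstruction as repeatedly registering new views with a relocalizer. A scene coordinate regression network predicts a 3D scene point for each pixel, and a pose is then typically recovered from these 2D-3D correspondences with a PnP solver inside RANSAC. The following is a minimal sketch of that registration step under stated assumptions, not ACE0's implementation: it uses a simple linear DLT pose solver as a stand-in, assumes pixels are already normalized by known intrinsics, and the function names (`dlt_pose`, `ransac_pose`, `register_view`), the inlier threshold, and the `min_inliers` acceptance rule are all hypothetical.

```python
import numpy as np


def dlt_pose(x_norm, X):
    """Linear (DLT) estimate of a camera pose [R|t] from >= 6 2D-3D matches.

    x_norm: (N, 2) normalized image coordinates (pixels with K^-1 applied).
    X:      (N, 3) scene coordinates, e.g. predicted by a regression network.
    """
    n = X.shape[0]
    Xh = np.hstack([X, np.ones((n, 1))])
    A = np.zeros((2 * n, 12))
    # Each match gives two linear constraints on the rows p1, p2, p3 of P = [R|t]:
    # p1.X - u * p3.X = 0 and p2.X - v * p3.X = 0.
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -x_norm[:, :1] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -x_norm[:, 1:2] * Xh
    # Solution is the right singular vector with the smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # P is only known up to scale and sign; project its left block onto a rotation.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R, t = U @ Vt, P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:  # negative overall scale: flip both R and t
        R, t = -R, -t
    return R, t


def reprojection_errors(R, t, x_norm, X):
    """Distance between observed and reprojected points (normalized units).

    Points behind the camera get an infinite error.
    """
    Xc = X @ R.T + t
    z = Xc[:, 2]
    err = np.full(len(X), np.inf)
    front = z > 1e-9
    err[front] = np.linalg.norm(Xc[front, :2] / z[front, None] - x_norm[front], axis=1)
    return err


def ransac_pose(x_norm, X, iters=200, thresh=0.01, seed=0):
    """Robust pose estimate that rejects outlier scene coordinate predictions."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(X), size=6, replace=False)
        R, t = dlt_pose(x_norm[sample], X[sample])
        inliers = reprojection_errors(R, t, x_norm, X) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        return None, None, best
    R, t = dlt_pose(x_norm[best], X[best])  # refit on all inliers of the best hypothesis
    return R, t, best


def register_view(x_norm, X, min_inliers=50):
    """Hypothetical registration rule: accept a view only if enough predicted
    scene coordinates agree with a single camera pose."""
    R, t, inliers = ransac_pose(x_norm, X)
    if inliers.sum() < min_inliers:
        return None
    return R, t
```

In an incremental loop, one would run such a check over every not-yet-registered image, add the confidently registered ones, and retrain the relocalizer on the enlarged set before the next round; the network and its retraining are omitted here.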

Project page: https://nianticlabs.github.io/acezero/.

Author information

Corresponding author

Correspondence to Eric Brachmann.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Brachmann, E. et al. (2025). Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15114. Springer, Cham. https://doi.org/10.1007/978-3-031-72992-8_24

  • DOI: https://doi.org/10.1007/978-3-031-72992-8_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72991-1

  • Online ISBN: 978-3-031-72992-8

  • eBook Packages: Computer Science, Computer Science (R0)