Global Structure-from-Motion Revisited

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15098)

Abstract

Recovering 3D structure and camera motion from images has been a long-standing focus of computer vision research and is known as Structure-from-Motion (SfM). Solutions to this problem are categorized into incremental and global approaches. Until now, the most popular systems have followed the incremental paradigm because of its superior accuracy and robustness, while global approaches are drastically more scalable and efficient. With this work, we revisit the problem of global SfM and propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on par with or superior to COLMAP, the most widely used incremental SfM system, while being orders of magnitude faster. We share our system as an open-source implementation at https://github.com/colmap/glomap.

Acknowledgment

The authors thank Philipp Lindenberger for the thoughtful discussions and comments on the text. This work was partially funded by the Hasler Stiftung Research Grant via the ETH Zurich Foundation and the ETH Zurich Career Seed Award. Linfei Pan was supported by gift funding from Microsoft.

Author information

Corresponding author

Correspondence to Linfei Pan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 33415 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pan, L., Baráth, D., Pollefeys, M., Schönberger, J.L. (2025). Global Structure-from-Motion Revisited. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15098. Springer, Cham. https://doi.org/10.1007/978-3-031-73661-2_4

  • DOI: https://doi.org/10.1007/978-3-031-73661-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73660-5

  • Online ISBN: 978-3-031-73661-2

  • eBook Packages: Computer Science, Computer Science (R0)
