Abstract
Structure-from-Motion (SfM) has become a ubiquitous tool for camera calibration and scene reconstruction, with many downstream applications in computer vision and beyond. While state-of-the-art SfM pipelines have reached a high level of maturity in well-textured and well-configured scenes over the last decades, they still fall short of robustly solving the SfM problem in challenging scenarios. In particular, weakly textured scenes and poorly constrained configurations often cause catastrophic failures or large errors in pipelines that rely primarily on keypoints. In these scenarios, line segments are often abundant and can offer complementary geometric constraints. Their large spatial extent and typically structured configurations lead to stronger geometric constraints than those provided by keypoints alone. In this work, we introduce an incremental SfM system that, in addition to points, leverages lines and their structured geometric relations. Our technical contributions span the entire pipeline (mapping, triangulation, registration), and we integrate them into a comprehensive end-to-end SfM system that we share with the community as open-source software. We also present the first analytical method to propagate uncertainties for optimized 3D lines via sensitivity analysis. Experiments show that our system is consistently more robust and accurate than the widely used point-based state of the art in SfM, achieving richer maps and more precise camera registrations, especially under challenging conditions. In addition, our uncertainty-aware localization module alone consistently improves over the state of the art under both point-only and hybrid setups.
S. Liu, Y. Gao and T. Zhang—Equal contribution.
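The abstract mentions propagating uncertainties through an optimization via sensitivity analysis. As a rough illustration of that general idea only (not the authors' method for 3D lines), the sketch below applies the implicit function theorem to a toy least-squares problem: fitting a 2D line to noisy observations and pushing the observation covariance through to the fitted parameters. The function name `propagate_covariance`, the toy data, and the Gauss-Newton approximation of the Hessian are all assumptions made for this example.

```python
import numpy as np


def propagate_covariance(J_x, J_z, cov_z):
    """First-order covariance of x* = argmin_x 0.5 * ||r(x, z)||^2.

    J_x = dr/dx and J_z = dr/dz, both evaluated at the optimum.
    Uses the Gauss-Newton approximation of the second derivatives, which
    is exact when r is linear in x and z or when the residual vanishes
    at the optimum. (Hypothetical helper, for illustration only.)
    """
    H = J_x.T @ J_x                  # approx. d(gradient)/dx
    G = J_x.T @ J_z                  # approx. d(gradient)/dz
    dx_dz = -np.linalg.solve(H, G)   # implicit function theorem
    return dx_dz @ cov_z @ dx_dz.T   # first-order covariance propagation


# Toy problem: fit y = a * t + b; the observations z are the y-values.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = np.column_stack([t, np.ones_like(t)])  # residual r(x, z) = A @ x - z
J_x = A
J_z = -np.eye(len(t))
sigma = 0.1                                # assumed observation noise
cov_z = sigma**2 * np.eye(len(t))

cov_x = propagate_covariance(J_x, J_z, cov_z)
print(cov_x)
```

For this linear toy problem the result coincides with the textbook least-squares covariance, sigma^2 (A^T A)^{-1}, which makes the sketch easy to sanity-check before adapting the same pattern to nonlinear residuals.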
Acknowledgements
This work has been supported by Innosuisse funding (Grant No. 100.567 IP-ICT). V. Larsson was supported by ELLIIT and the Swedish Research Council (Grant No. 2023-05424).
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, S. et al. (2025). Robust Incremental Structure-from-Motion with Hybrid Features. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15094. Springer, Cham. https://doi.org/10.1007/978-3-031-72764-1_15
DOI: https://doi.org/10.1007/978-3-031-72764-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72763-4
Online ISBN: 978-3-031-72764-1
eBook Packages: Computer Science, Computer Science (R0)