
Accurate structure from motion using consistent cluster merging


Abstract

The incremental Structure-from-Motion (SfM) approach is widely used for scene reconstruction because it is robust to outliers. However, it suffers from two major limitations: error accumulation and heavy time consumption. To alleviate these problems, we propose an effective and efficient redundant cluster merging approach. Unlike previous clustering methods, in which each cluster has only one overlapping adjacent cluster, each sub-cluster produced by our approach has several overlapping adjacent cluster candidates. Each candidate is then verified to determine whether it is suitable for merging; by selecting only correctly estimated clusters, cluster merging achieves more accurate results. The verification rests on the observation that correctly estimated clusters of the same scene produce a consistent point cloud and consistent extrinsic camera parameters for each shared image, which we formulate as two constraints. In addition, we introduce a feature matching consistency constraint to eliminate falsely matched feature pairs; the resulting gain in matching accuracy leads to better estimates within each cluster. Experiments on three public datasets show that our method outperforms state-of-the-art SfM approaches in terms of both efficiency and accuracy.




Acknowledgements

This research was supported in part by the Natural Science Foundation of Hunan Province (No. 2017JJ2252).

Author information


Corresponding author

Correspondence to Luming Liang.


Appendix A: Coordinate transformation

A.1 Method 1: Coordinate transformation between two clusters

Given two adjacent clusters that share a common image, denote this shared image as \( {I_{f}^{k}}\) in the first cluster and as \({I_{s}^{k}}\) in the second cluster. The 3D points estimated in the first cluster, transformed into the local coordinate system of image \( {I_{f}^{k}}\), satisfy

$$ {\textbf{X}}_{cam1} = {\textbf{R}}_{1} {\textbf{X}} + {\textbf{t}}_{1} , $$
(7)

where X is the 3D global coordinate of a point obtained in the first cluster, R1, t1 are the estimated extrinsic camera parameters corresponding to image \( {I_{f}^{k}}\), and Xcam1 is the transformed coordinate in the local coordinate system of image \( {I_{f}^{k}}\). Equation (7) can be rewritten as

$$ {\textbf{X}} = {\textbf{R}}_{1}^{- 1} ({\textbf{X}}_{cam1} - {\textbf{t}}_{1} ) = {\textbf{R}}_{1}^{- 1} {\textbf{X}}_{cam1} - {\textbf{R}}_{1}^{- 1} {\textbf{t}}_{1}. $$
(8)

Assume the local coordinate system of image \( {I_{s}^{k}}\) is set as the global coordinate system of the second cluster, and denote the 3D coordinates of points estimated in the second cluster as X2. Since image \( {I_{f}^{k}}\) and image \( {I_{s}^{k}}\) are the same image, X2 = Xcam1. From (8), we obtain

$$ {\textbf{X}} = {\textbf{R}}_{1}^{- 1} {\textbf{X}}_{2} - {\textbf{R}}_{1}^{- 1} {\textbf{t}}_{1} . $$
(9)

Given an image \( {I_{s}^{i}}\) in the second cluster, we now explain how to estimate the parameters R, t that transform the extrinsic camera parameters corresponding to \( {I_{s}^{i}}\) from the second cluster coordinate system into the first cluster coordinate system.

The 3D global coordinate of a point obtained in the first cluster, X, and its coordinate transformed into the \( {I_{s}^{i}}\) local coordinate system, Xcam2, satisfy

$$ {\textbf{X}}_{cam2} = {\textbf{RX}} + {\textbf{t}}, $$
(10)

where X is the 3D global coordinate of the point as defined in (9), and R, t are the extrinsic camera parameters corresponding to image \( {I_{s}^{i}}\) in the first cluster coordinate system.

Substituting (9) into (10), we obtain

$$ {\textbf{X}}_{cam2} = {\textbf{R}}({\textbf{R}}_{1}^{- 1} {\textbf{X}}_{2} - {\textbf{R}}_{1}^{- 1} {\textbf{t}}_{1} ) + {\textbf{t}} = {\textbf{RR}}_{1}^{- 1} {\textbf{X}}_{2} - {\textbf{RR}}_{1}^{- 1} {\textbf{t}}_{1} + {\textbf{t}}. $$
(11)

The 3D coordinate of this point transformed into the \( {I_{s}^{i}}\) local coordinate system can also be written as

$$ {\textbf{X}}_{cam2} = {\textbf{R}}_{2} {\textbf{X}}_{2} + {\textbf{t}}_{2} , $$
(12)

where R2,t2 are the estimated extrinsic camera parameters corresponding to \( {I_{s}^{i}}\) in the second cluster, and X2 is the estimated 3D global coordinate of the point in the second cluster.

Since the coordinates of the point transformed into the \( {I_{s}^{i}}\) local coordinate system are unchanged, (12) equals (11), which can be formulated as

$$ {\textbf{RR}}_{1}^{- 1} {\textbf{X}}_{2} - {\textbf{RR}}_{1}^{- 1} {\textbf{t}}_{1} + {\textbf{t}} = {\textbf{R}}_{2} {\textbf{X}}_{2} + {\textbf{t}}_{2} . $$
(13)

From (13), we obtain

$$ {\textbf{RR}}_{1}^{- 1} = {\textbf{R}}_{2} \Rightarrow {\textbf{R}} = {\textbf{R}}_{2} {\textbf{R}}_{1}. $$
(14)
$$ - {\textbf{RR}}_{1}^{- 1} {\textbf{t}}_{1} + {\textbf{t}} = {\textbf{t}}_{2} \Rightarrow {\textbf{t}} = {\textbf{RR}}_{1}^{- 1} {\textbf{t}}_{1} + {\textbf{t}}_{2} = {\textbf{R}}_{2} {\textbf{t}}_{1} + {\textbf{t}}_{2}. $$
(15)
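
To make the merging step concrete, the following Python sketch is a minimal illustration under stated assumptions, not the authors' implementation: it assumes numpy, world-to-camera extrinsics of the form X_cam = R X + t with 3×3 orthonormal rotation matrices, and that the global frame of the second cluster is the local frame of the shared image. It applies (14) and (15) to bring the second cluster's camera poses, and (9) to bring its points, into the first cluster's coordinate system. All function and variable names here are our own choices for illustration.

```python
import numpy as np

def merge_cluster_into_first(R1, t1, cluster2_poses, cluster2_points):
    """Transform the second cluster into the first cluster's frame.

    R1, t1          -- pose of the shared image in the first cluster
                       (world-to-camera: X_cam = R1 @ X + t1)
    cluster2_poses  -- list of (R2, t2) camera extrinsics estimated in the
                       second cluster, whose global frame is the shared
                       image's local frame
    cluster2_points -- (N, 3) array of 3D points X2 from the second cluster
    """
    # Cameras, per (14) and (15): R = R2 R1, t = R2 t1 + t2.
    merged_poses = [(R2 @ R1, R2 @ t1 + t2) for R2, t2 in cluster2_poses]
    # Points, per (9): X = R1^{-1} X2 - R1^{-1} t1. For an orthonormal
    # rotation the inverse is the transpose, so for row-stacked points
    # this becomes X = (X2 - t1) @ R1.
    merged_points = (cluster2_points - t1) @ R1
    return merged_poses, merged_points
```

Because extrinsic rotations are orthonormal, the matrix inverses in (8) and (9) reduce to transposes, which is both faster and numerically safer than a general matrix inverse.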

A.2 Method 2: Coordinate transformation in the new global coordinate system

According to the previous derivation, if the \( {I_{f}^{k}}\) local coordinate system is set as the global coordinate system of the first cluster, then R1 = I and t1 = 0. From equations (14) and (15), R and t reduce to R = R2 and t = t2. In other words, if the global coordinate system of each cluster is set to the local coordinate system of the same shared image, the extrinsic parameters of a camera estimated in the second cluster can be used directly as its extrinsic parameters in the first cluster's global coordinate system.

Given an image Ig in a cluster, we now describe how the estimated extrinsic parameters of any camera in the cluster are transformed into the new global coordinate system defined by the local coordinate system of Ig.

We denote the extrinsic camera parameters corresponding to image Ig as Rg and tg. The coordinate of a point X transformed into the Ig local coordinate system, denoted Xcam, satisfies

$$ {\textbf{X}}_{cam} = {\textbf{R}}_{g} {\textbf{X}} + {\textbf{t}}_{g}. $$
(16)

Let R, t be the extrinsic camera parameters of the Nth image after the transformation. Since the local coordinate system of image Ig is set as the global coordinate system after the transformation, the coordinate of the point in the Nth image local coordinate system satisfies

$$ {\textbf{RX}}_{cam} + {\textbf{t}} = {\textbf{R}}\left( {{\textbf{R}}_{g} {\textbf{X}} + {\textbf{t}}_{g} } \right) + {\textbf{t}} = {\textbf{RR}}_{g} {\textbf{X}} + {\textbf{Rt}}_{g} + {\textbf{t}}. $$
(17)

Since the coordinate of the point transformed into the Nth image local coordinate system is unchanged, we have

$$ {\textbf{RR}}_{g} {\textbf{X}} + {\textbf{Rt}}_{g} + {\textbf{t}} = {\textbf{R}}_{n} {\textbf{X}} + {\textbf{t}}_{n} , $$
(18)

where Rn and tn are the estimated extrinsic camera parameters corresponding to the Nth image before transformation.

According to (18), we have

$$ {\textbf{RR}}_{g} = {\textbf{R}}_{n} ,{\textbf{Rt}}_{g} + {\textbf{t}} = {\textbf{t}}_{n}. $$
(19)

From (19), we conclude that the extrinsic camera parameters corresponding to the Nth image, transformed into the new global coordinate system defined by the local coordinate system of Ig, satisfy

$$ {\textbf{R}} = {\textbf{R}}_{n} {\textbf{R}}_{g}^{- 1} ,{\textbf{t}} = {\textbf{t}}_{n} - {\textbf{Rt}}_{g}. $$
(20)

Thus, the coordinates of a point in the new global coordinate system are estimated as

$$ {\textbf{X}}^{\prime} = {\textbf{R}}_{g} {\textbf{X}} + {\textbf{t}}_{g}. $$
(21)
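
The re-anchoring in (20) and (21) can be sketched the same way. Again, this is a hypothetical numpy illustration under the same world-to-camera convention, with names of our own choosing: every camera pose and every point of a cluster is re-expressed in the local frame of a chosen image Ig.

```python
import numpy as np

def reanchor_to_image(Rg, tg, poses, points):
    """Re-express a cluster in the local frame of image I_g.

    Rg, tg -- extrinsics of I_g in the old global frame
    poses  -- list of (Rn, tn) camera extrinsics in the old global frame
    points -- (N, 3) array of 3D points in the old global frame
    """
    # Cameras, per (20): R = Rn Rg^{-1} = Rn Rg^T, t = tn - R tg.
    new_poses = []
    for Rn, tn in poses:
        R = Rn @ Rg.T
        new_poses.append((R, tn - R @ tg))
    # Points, per (21): X' = Rg X + tg, in row-stacked form.
    new_points = points @ Rg.T + tg
    return new_poses, new_points
```

A quick sanity check: feeding the pose (Rg, tg) itself through (20) yields R = I and t = 0, i.e. the new pose of Ig is the identity, which is exactly the condition R1 = I, t1 = 0 exploited at the start of this subsection.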


Cite this article

Chen, S., Liang, L. & Ouyang, J. Accurate structure from motion using consistent cluster merging. Multimed Tools Appl 81, 24913–24935 (2022). https://doi.org/10.1007/s11042-022-12202-w
