Abstract
We propose a new hierarchical structure-from-motion system for closed-loop sequences. Our system includes a novel approach for clustering cameras into multiple sets, whose camera poses are first reconstructed separately and later globally registered with respect to a single coordinate frame. Each set is robustly reconstructed by a novel guarded least-median-of-squares protocol. Our method is accelerated by reducing the parameter space of bundle adjustment in the local reconstruction optimizations. We also propose a new synthetic dataset that could be useful in 3D object reconstruction problems. Extensive experiments with both synthetic and real data validate our method. Our system achieves lower rotation and translation errors in camera pose estimation than ACTS and GPE, and its accuracy is close to that of COLMAP while being much faster.
Acknowledgements
This project was supported in part by the ITRC/IITP program (IITP-2020-0-01460) in South Korea, in part by the Ministry of Science, Innovation and Universities of the Spanish Government and the European Union through the research project RTI2018-099638-B-I00, and in part by the NRF (2017R1A2B3012701, 2018R1A6A3A11049832) in South Korea.
A Comparing against ground truth
Let \(\mathbf {G}\) be the ground-truth camera network poses \(\mathbf {G}=\left[ {{I},\bar{{H}}_{2},\bar{{H}}_{3},\cdots ,\bar{{H}}_{{n}}}\right] \) of a dataset, and \(\mathbf {Q}\) be the estimated camera network poses \(\mathbf {Q}={\left[ {I},\hat{{H}}_{2},\hat{{H}}_{3},\cdots ,\hat{{H}}_{{n}}\right] }\) from the same dataset. Here, I is the \(4\times 4\) identity matrix and \({H}_{k}\) is the \(4\times 4\) homogeneous matrix of the kth camera pose. Note that \(\mathbf {G}\) is equal to \(\mathbf {Q}\) up to a similarity transform. That is, \(\bar{{H}}_{k}={H}_{s}\,\hat{{H}}_{k}\) for some similarity transform \({H}_{s}\) and all \(k\).
Therefore, we need to find the similarity transformation \({H_{s}}\) that minimizes
\(\sum _{k=1}^{n}\left\| \bar{{H}}_{k}-{H}_{s}\,\hat{{H}}_{k}\right\| _{F}^{2}. \quad (11)\)
An approximate solution to (11) can be computed in the least-squares sense under relaxed orthonormality and determinant constraints on the rotation parts of the camera matrices, i.e., by multiplying \(\mathbf {G}\) with the pseudo-inverse of \(\mathbf {Q}\). Let \({H_{s,\mathrm{init}}}\) denote this linear least-squares approximation of (11).
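The linear initialization can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes the poses are given as lists of NumPy 4x4 matrices, and the function name is ours. Stacking the poses of each network side by side into 4x4n matrices turns \(\mathbf{G}\approx H_s\,\mathbf{Q}\) into an ordinary linear least-squares problem solved by the pseudo-inverse.

```python
import numpy as np


def linear_similarity_init(G_list, Q_list):
    """Linear least-squares estimate of the 4x4 similarity H_s with
    G ~= H_s * Q, relaxing the orthonormality and determinant
    constraints on the rotation parts of the camera matrices.

    G_list, Q_list: lists of 4x4 homogeneous camera pose matrices,
    ground truth and estimated, in corresponding order."""
    G = np.hstack(G_list)          # 4 x 4n: [I, H_bar_2, ..., H_bar_n]
    Q = np.hstack(Q_list)          # 4 x 4n: [I, H_hat_2, ..., H_hat_n]
    return G @ np.linalg.pinv(Q)   # 4 x 4 initial similarity H_s_init
```

When the estimated network is an exact similarity transform of the ground truth, this recovers \(H_{s}\) exactly, since \(\mathbf{G}\,\mathbf{Q}^{+}=H_{s}\,\mathbf{Q}\,\mathbf{Q}^{+}=H_{s}\) for a full-row-rank \(\mathbf{Q}\).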
Then, we let \({\bar{{H}}}_{k}\equiv \{\bar{{q}}_{k},{\bar{{t}}}_{k}\}\) and \({{H}_{s}}\,\hat{{H}}_{k}\equiv \{{q}_{k},{t}_{k}\}\), where \({\bar{{t}}}_{k}\) and \({t}_{k}\) are the translation vectors of \({\bar{{H}}}_{k}\) and \({{H}_{s}}\hat{{H}}_{k}\), respectively, and \({\bar{{q}}}_{k}\) and \({q}_{k}\) are the unit quaternions representing the orientations of \({\bar{{H}}}_{k}\) and \({{H}_{s}}\hat{{H}}_{k}\), respectively. A nonlinear optimization of \({H}_{s}\), initialized at \({{H}_{{s,\mathrm{init}}}}\), finalizes the alignment of the two camera networks \(\mathbf {G}\) and \({{H}_{s}}\cdot \mathbf {Q}\). This optimization is done by
\({H}_{s}^{*}=\arg \min _{{H}_{s}}\sum _{k=1}^{n}\left( \left\| \bar{{t}}_{k}-{t}_{k}\right\| ^{2}+\left\| \bar{{q}}_{k}-{q}_{k}\right\| ^{2}\right) .\)
We employed the Levenberg–Marquardt algorithm to minimize the translation and orientation residuals simultaneously. This procedure optimizes the similarity transform that aligns the two camera networks. The camera poses are then compared individually in terms of their orientations and locations.
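The nonlinear refinement can be sketched as follows, assuming SciPy's `least_squares` (Levenberg–Marquardt) and `Rotation` utilities. The parameterization of \(H_s\) (log-scale, rotation vector, translation) and the function names are our choices for illustration, not specified by the paper; the residuals are the translation and sign-aligned unit-quaternion differences described above.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def _rot_part(H):
    """Remove the uniform scale from the upper-left 3x3 block,
    returning a unit rotation matrix."""
    s = np.cbrt(np.linalg.det(H[:3, :3]))
    return H[:3, :3] / s


def refine_similarity(G_list, Q_list, H_init):
    """Refine the similarity H_s aligning estimated poses Q_k to
    ground-truth poses G_k, jointly minimizing translation and
    unit-quaternion residuals with Levenberg-Marquardt."""
    def unpack(x):
        # x = [log-scale, rotation vector (3), translation (3)]
        H = np.eye(4)
        H[:3, :3] = np.exp(x[0]) * Rotation.from_rotvec(x[1:4]).as_matrix()
        H[:3, 3] = x[4:7]
        return H

    def pack(H):
        s = np.cbrt(np.linalg.det(H[:3, :3]))
        r = Rotation.from_matrix(H[:3, :3] / s).as_rotvec()
        return np.concatenate([[np.log(s)], r, H[:3, 3]])

    def residuals(x):
        Hs = unpack(x)
        res = []
        for Gk, Qk in zip(G_list, Q_list):
            Ak = Hs @ Qk
            res.append(Gk[:3, 3] - Ak[:3, 3])              # translation residual
            q_bar = Rotation.from_matrix(_rot_part(Gk)).as_quat()
            q = Rotation.from_matrix(_rot_part(Ak)).as_quat()
            if np.dot(q_bar, q) < 0:                       # resolve q / -q ambiguity
                q = -q
            res.append(q_bar - q)                          # orientation residual
        return np.concatenate(res)

    sol = least_squares(residuals, pack(H_init), method="lm")
    return unpack(sol.x)
```

The quaternion sign check is needed because \(q\) and \(-q\) encode the same rotation, and without it the residual can jump between the two representatives during optimization.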
About this article
Cite this article
Han, K.M., Rueda, A.J. Robust and efficient object reconstructions from closed loop sequences. Machine Vision and Applications 32, 70 (2021). https://doi.org/10.1007/s00138-021-01193-7