
Robust and efficient object reconstructions from closed loop sequences

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

We propose a new hierarchical structure-from-motion system for closed-loop sequences. Our system includes a novel approach for clustering cameras into multiple sets, whose camera poses are first reconstructed separately and later globally registered w.r.t. a single coordinate frame. Each set is robustly reconstructed by a novel guarded least median of squares protocol. Our method is accelerated by reducing the parameter space of bundle adjustment in the local reconstruction optimizations. We also propose a new synthetic dataset that could be useful for 3D object reconstruction problems. Extensive experiments with both synthetic and real data were carried out to validate our method. Our system yields smaller rotation and translation errors in the estimated camera poses than ACTS and GPE, and its accuracy is close to that of COLMAP while our method is much faster.





Acknowledgements

This project was supported in part by the ITRC/IITP program (IITP-2020-0-01460) in South Korea, in part by the Ministry of Science, Innovation and Universities of the Spanish Government and the European Union through the research project RTI2018-099638-B-I00, and in part by the NRF (2017R1A2B3012701, 2018R1A6A3A11049832) in South Korea.

Author information


Corresponding author

Correspondence to Kyung Min Han.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 23378 KB)

A Comparing against ground truth

Let \(\mathbf {G}\) be the ground truth camera network poses \(\mathbf {G}=\left[ {{I},\bar{{H}}_{2},\bar{{H}}_{3},\cdots ,\bar{{H}}_{{n}}}\right] \) of a dataset, and \(\mathbf {Q}\) be the estimated camera network poses \(\mathbf {Q}={\left[ {I},\hat{{H}}_{2},\hat{{H}}_{3},\cdots ,\hat{{H}}_{{n}}\right] }\) from the same dataset. Here, I is the \(4\times 4\) identity matrix and \({H}_{k}\) is the \(4\times 4\) homogeneous matrix of the kth camera pose. It is straightforward to see that \(\mathbf {G}\) is equal to \(\mathbf {Q}\) up to a similarity transform. That is,

$$\begin{aligned} \mathbf {G}={H_{s}}\cdot \mathbf {Q}. \end{aligned}$$
(10)

Therefore, we need to find a similarity transformation \({H_{s}}\) that minimizes

$$\begin{aligned} \underset{H_{s}}{\mathrm{argmin}}\left\| \mathbf {G}-({H_{s}}\cdot \mathbf {Q})\right\| ^{2}. \end{aligned}$$
(11)

An approximate solution to (11) is obtained by relaxing the orthonormality and determinant constraints on the rotation parts of the camera matrices and solving (11) in the linear least-squares sense, i.e., multiplying \(\mathbf {G}\) by the pseudo-inverse of \(\mathbf {Q}\). Let \({H_{s,\mathrm{init}}}\) denote this linear least-squares approximation of (11).
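The linear initialization above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes NumPy, stacks the \(4\times 4\) pose matrices side by side, and recovers \({H_{s,\mathrm{init}}}\) by multiplying \(\mathbf {G}\) with the pseudo-inverse of \(\mathbf {Q}\); the function name `init_similarity` is ours.

```python
import numpy as np

def init_similarity(G_list, Q_list):
    """Relaxed linear least-squares solution of Eq. (11).

    G_list, Q_list: lists of 4x4 homogeneous camera pose matrices
    (ground truth and estimated, respectively). The orthonormality and
    determinant constraints on the rotation blocks are ignored here.
    """
    G = np.hstack(G_list)          # 4 x 4n stacked ground-truth poses
    Q = np.hstack(Q_list)          # 4 x 4n stacked estimated poses
    # G = H_s . Q  =>  H_s ~= G . pinv(Q)  in the least-squares sense
    return G @ np.linalg.pinv(Q)   # 4 x 4 estimate of H_s,init
```

Since \(\mathbf {Q}\) has full row rank whenever it contains at least one invertible pose, \(\mathbf {Q}\mathbf {Q}^{+}=I\) and the recovery is exact in the noise-free case.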

Then, we let \({\bar{{H}}}_{k}=\{\bar{{q}}_{k},\bar{{t}}_{k}\}\) and \({{H}_{{s,\mathrm{init}}}}\cdot \hat{{H}}_{k}\equiv \{{q}_{k},{t}_{k}\}\), where \(\bar{{t}}_{k}\) and \({t}_{k}\) are the translation vectors of \(\bar{{H}}_{k}\) and \({{H}_{{s,\mathrm{init}}}}\cdot \hat{{H}}_{k}\), respectively, and \(\bar{{q}}_{k}\) and \({q}_{k}\) are the unit quaternions representing the orientations of \(\bar{{H}}_{k}\) and \({{H}_{{s,\mathrm{init}}}}\cdot \hat{{H}}_{k}\), respectively. A nonlinear optimization of \({H}_{s}\) finalizes the alignment of the two camera networks \(\mathbf {G}\) and \({{H}_{{s,\mathrm{init}}}}\cdot \mathbf {Q}\). This optimization is done by

$$\begin{aligned} \underset{H_{s}}{\mathrm{argmin}}\,\sum _{k}\left( \left\| {t}_{k}-\bar{{t}}_{k}\right\| _{2}+\min \left( \left\| {q}_{k}-\bar{{q}}_{k}\right\| _{2},\,\left\| {q}_{k}+\bar{{q}}_{k}\right\| _{2}\right) \right) . \end{aligned}$$
(12)

We employed the Levenberg–Marquardt algorithm to minimize the translation and orientation residuals simultaneously; the \(\min \) over \({q}_{k}-\bar{{q}}_{k}\) and \({q}_{k}+\bar{{q}}_{k}\) in (12) accounts for the fact that \(q\) and \(-q\) represent the same rotation. This procedure optimizes the similarity transform that aligns the two camera networks on top of each other. The camera poses are then compared individually in terms of their orientations and locations.
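The refinement step can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: it assumes NumPy and SciPy, parameterizes \(H_s\) by a rotation vector, a scale, and a translation (the names `split_pose`, `pose_residuals`, and `refine_similarity` are ours), and minimizes the residuals of (12) with SciPy's Levenberg–Marquardt solver.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def split_pose(H):
    """Decompose a (possibly scaled) 4x4 pose into (unit quaternion, translation)."""
    s = np.cbrt(np.linalg.det(H[:3, :3]))          # similarity scale factor
    q = Rotation.from_matrix(H[:3, :3] / s).as_quat()
    return q, H[:3, 3]

def pose_residuals(x, G_list, Q_list):
    """Stacked residuals of Eq. (12) for one candidate similarity transform.

    x = [rotation vector (3), scale (1), translation (3)].  Choosing the
    smaller of q - q_bar and q + q_bar handles the quaternion sign
    ambiguity: q and -q encode the same rotation.
    """
    H_s = np.eye(4)
    H_s[:3, :3] = x[3] * Rotation.from_rotvec(x[:3]).as_matrix()
    H_s[:3, 3] = x[4:7]
    res = []
    for G_k, Q_k in zip(G_list, Q_list):
        q_bar, t_bar = split_pose(G_k)
        q_k, t_k = split_pose(H_s @ Q_k)
        dq_minus, dq_plus = q_k - q_bar, q_k + q_bar
        dq = dq_minus if np.linalg.norm(dq_minus) <= np.linalg.norm(dq_plus) else dq_plus
        res.extend(t_k - t_bar)                    # translation residual
        res.extend(dq)                             # orientation residual
    return np.asarray(res)

def refine_similarity(G_list, Q_list, x0=None):
    """Levenberg-Marquardt refinement of H_s, starting from x0."""
    if x0 is None:
        x0 = np.array([0, 0, 0, 1, 0, 0, 0], dtype=float)  # identity transform
    return least_squares(pose_residuals, x0, args=(G_list, Q_list), method='lm')
```

In practice the refinement would be started from the linear estimate \({H_{s,\mathrm{init}}}\) rather than the identity, so the nonlinear step is only a local polish of an already good alignment.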


About this article


Cite this article

Han, K.M., Rueda, A.J. Robust and efficient object reconstructions from closed loop sequences. Machine Vision and Applications 32, 70 (2021). https://doi.org/10.1007/s00138-021-01193-7
