Abstract
We propose a new hierarchical structure-from-motion system for closed-loop sequences. Our system includes a novel approach for clustering cameras into multiple sets, whose camera poses are first reconstructed separately and later globally registered with respect to a single coordinate frame. Each set is robustly reconstructed by a novel guarded least-median-of-squares protocol. Our method is accelerated by reducing the parameter space of bundle adjustment in the local reconstruction optimizations. We also propose a new synthetic dataset that could be useful in 3D object reconstruction problems. Extensive experiments with both synthetic and real data validate our method. Our system achieves lower rotation and translation errors in camera pose estimation than ACTS and GPE, and its accuracy is close to that of COLMAP while being much faster.
Acknowledgements
This project was supported in part by the ITRC/IITP program (IITP-2020-0-01460) in South Korea, in part by the Ministry of Science, Innovation and Universities of the Spanish Government and the European Union through the research project RTI2018-099638-B-I00, and in part by the NRF (2017R1A2B3012701, 2018R1A6A3A11049832) in South Korea.
A Comparing against ground truth
Let \(\mathbf {G}\) be the ground-truth camera network poses \(\mathbf {G}=\left[ {{I},\bar{{H}}_{2},\bar{{H}}_{3},\cdots ,\bar{{H}}_{{n}}}\right] \) of a dataset, and \(\mathbf {Q}\) be the estimated camera network poses \(\mathbf {Q}={\left[ {I},\hat{{H}}_{2},\hat{{H}}_{3},\cdots ,\hat{{H}}_{{n}}\right] }\) from the same dataset. Here, I is the \(4\times 4\) identity matrix and \({H}_{k}\) is the \(4\times 4\) homogeneous matrix of the kth camera pose. Note that \(\mathbf {G}\) is equal to \(\mathbf {Q}\) up to a similarity transform. That is, \(\bar{{H}}_{k}={H}_{s}\,\hat{{H}}_{k}\) for some similarity transform \({H}_{s}\) and all \(k\).
Therefore, we need to find the similarity transformation \({H_{s}}\) that minimizes
\(\sum _{k=1}^{n}\left\| \bar{{H}}_{k}-{H}_{s}\,\hat{{H}}_{k}\right\| _{F}^{2}. \quad (11)\)
An approximate solution to (11) can be computed in the least-squares sense under relaxed orthonormality and determinant constraints on the rotation parts of the camera matrices, i.e., by multiplying \(\mathbf {G}\) with the pseudo-inverse of \(\mathbf {Q}\). Let \({H_{s,\mathrm{init}}}\) denote this linear least-squares approximation of (11).
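The linear initialization can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes the poses are given as lists of NumPy 4x4 matrices, and the function name is ours. Stacking the poses of each network side by side into 4x4n matrices turns \(\mathbf{G}\approx H_s\,\mathbf{Q}\) into an ordinary linear least-squares problem solved by the pseudo-inverse.

```python
import numpy as np


def linear_similarity_init(G_list, Q_list):
    """Linear least-squares estimate of the 4x4 similarity H_s with
    G ~= H_s * Q, relaxing the orthonormality and determinant
    constraints on the rotation parts of the camera matrices.

    G_list, Q_list: lists of 4x4 homogeneous camera pose matrices,
    ground truth and estimated, in corresponding order."""
    G = np.hstack(G_list)          # 4 x 4n: [I, H_bar_2, ..., H_bar_n]
    Q = np.hstack(Q_list)          # 4 x 4n: [I, H_hat_2, ..., H_hat_n]
    return G @ np.linalg.pinv(Q)   # 4 x 4 initial similarity H_s_init
```

When the estimated network is an exact similarity transform of the ground truth, this recovers \(H_{s}\) exactly, since \(\mathbf{G}\,\mathbf{Q}^{+}=H_{s}\,\mathbf{Q}\,\mathbf{Q}^{+}=H_{s}\) for a full-row-rank \(\mathbf{Q}\).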
Then, we let \({\bar{{H}}}_{k}\equiv \{\bar{{q}}_{k},{\bar{{t}}}_{k}\}\) and \({{H}_{s}}\,\hat{{H}}_{k}\equiv \{{q}_{k},{t}_{k}\}\), where \({\bar{{t}}}_{k}\) and \({t}_{k}\) are the translation vectors of \({\bar{{H}}}_{k}\) and \({{H}_{s}}\hat{{H}}_{k}\), respectively, and \({\bar{{q}}}_{k}\) and \({q}_{k}\) are the unit quaternions representing the orientations of \({\bar{{H}}}_{k}\) and \({{H}_{s}}\hat{{H}}_{k}\), respectively. A nonlinear optimization of \({H}_{s}\), initialized at \({{H}_{{s,\mathrm{init}}}}\), finalizes the alignment of the two camera networks \(\mathbf {G}\) and \({{H}_{s}}\cdot \mathbf {Q}\). This optimization is done by
\({H}_{s}^{*}=\arg \min _{{H}_{s}}\sum _{k=1}^{n}\left( \left\| \bar{{t}}_{k}-{t}_{k}\right\| ^{2}+\left\| \bar{{q}}_{k}-{q}_{k}\right\| ^{2}\right) .\)
We employed the Levenberg–Marquardt algorithm to minimize the translation and orientation residuals simultaneously. This procedure optimizes the similarity transform that aligns the two camera networks. The camera poses are then compared individually in terms of their orientations and locations.
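The nonlinear refinement can be sketched as follows, assuming SciPy's `least_squares` (Levenberg–Marquardt) and `Rotation` utilities. The parameterization of \(H_s\) (log-scale, rotation vector, translation) and the function names are our choices for illustration, not specified by the paper; the residuals are the translation and sign-aligned unit-quaternion differences described above.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def _rot_part(H):
    """Remove the uniform scale from the upper-left 3x3 block,
    returning a unit rotation matrix."""
    s = np.cbrt(np.linalg.det(H[:3, :3]))
    return H[:3, :3] / s


def refine_similarity(G_list, Q_list, H_init):
    """Refine the similarity H_s aligning estimated poses Q_k to
    ground-truth poses G_k, jointly minimizing translation and
    unit-quaternion residuals with Levenberg-Marquardt."""
    def unpack(x):
        # x = [log-scale, rotation vector (3), translation (3)]
        H = np.eye(4)
        H[:3, :3] = np.exp(x[0]) * Rotation.from_rotvec(x[1:4]).as_matrix()
        H[:3, 3] = x[4:7]
        return H

    def pack(H):
        s = np.cbrt(np.linalg.det(H[:3, :3]))
        r = Rotation.from_matrix(H[:3, :3] / s).as_rotvec()
        return np.concatenate([[np.log(s)], r, H[:3, 3]])

    def residuals(x):
        Hs = unpack(x)
        res = []
        for Gk, Qk in zip(G_list, Q_list):
            Ak = Hs @ Qk
            res.append(Gk[:3, 3] - Ak[:3, 3])              # translation residual
            q_bar = Rotation.from_matrix(_rot_part(Gk)).as_quat()
            q = Rotation.from_matrix(_rot_part(Ak)).as_quat()
            if np.dot(q_bar, q) < 0:                       # resolve q / -q ambiguity
                q = -q
            res.append(q_bar - q)                          # orientation residual
        return np.concatenate(res)

    sol = least_squares(residuals, pack(H_init), method="lm")
    return unpack(sol.x)
```

The quaternion sign check is needed because \(q\) and \(-q\) encode the same rotation, and without it the residual can jump between the two representatives during optimization.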
About this article
Cite this article
Han, K.M., Rueda, A.J. Robust and efficient object reconstructions from closed loop sequences. Machine Vision and Applications 32, 70 (2021). https://doi.org/10.1007/s00138-021-01193-7