Parameter-adaptive multi-frame joint pose optimization method

  • Original article
  • The Visual Computer

Abstract

Camera pose optimization underlies geometric vision tasks such as 3D reconstruction, structure from motion, and visual odometry. We propose a multi-frame pose optimization method based on the inverse compositional algorithm. Neural networks are incorporated into the optimization model to address the problems of hyperparameter selection and loss function design. Multiple frames are optimized jointly to make full use of the constraints between images in a sequence. A multi-layer stepwise scheme applies a scale factor to the loss of each layer to improve the convergence of the network. Simulation results show that the proposed method achieves higher pose estimation accuracy than state-of-the-art methods.
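
For readers unfamiliar with the inverse compositional algorithm named in the abstract, the following minimal sketch illustrates its classical form on a toy problem: estimating a 1-D shift between two signals. It is not the authors' method. It contains no neural network, no multi-frame joint term, and no 6-DoF camera pose, and the names `inverse_compositional_shift`, `template`, and `image` are hypothetical. The point it shows is that the gradient and the Gauss-Newton Hessian are computed once on the template, and each increment is composed in inverted form with the current estimate.

```python
import numpy as np


def inverse_compositional_shift(template, image, xs, p0=0.0, iters=50, tol=1e-10):
    """Estimate a shift p such that image(x + p) ~= template(x).

    Toy 1-D illustration of the classical inverse compositional scheme;
    `template` and `image` are callables evaluated on the sample points `xs`.
    """
    t_vals = template(xs)
    # Gradient and Hessian depend only on the template, so they are
    # computed once, outside the iteration loop.
    grad_t = np.gradient(t_vals, xs)
    hessian = np.sum(grad_t ** 2)

    p = p0
    for _ in range(iters):
        # Residual between the image warped by the current estimate and the template.
        residual = image(xs + p) - t_vals
        # Gauss-Newton increment, expressed on the template side.
        dp = np.sum(grad_t * residual) / hessian
        # Compose the current warp with the inverse of the increment:
        # for a pure shift this is simply p <- p - dp.
        p -= dp
        if abs(dp) < tol:
            break
    return p


def template(x):
    # Assumed test signal: a Gaussian bump.
    return np.exp(-x ** 2)


def image(x):
    # Same bump displaced by 0.3, so the shift to recover is 0.3.
    return template(x - 0.3)


xs = np.linspace(-5.0, 5.0, 201)
p_hat = inverse_compositional_shift(template, image, xs)
```

In a pose optimization setting the scalar shift would be replaced by a camera pose and the scalar Hessian by a matrix, but the structure of the update stays the same: the precomputed template-side quantities are reused at every iteration.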

Funding

This study was funded by the National Natural Science Foundation of China (grant number: 62103432).

Author information

Contributions

Tao Zhang designed the research. Wei Wu and Bangjie Li processed the data. Shaopeng Li drafted the manuscript. Yong Xian helped organize the manuscript.

Corresponding author

Correspondence to Shaopeng Li.

Ethics declarations

Conflict of Interest

Shaopeng Li and Tao Zhang declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, S., Xian, Y., Wu, W. et al. Parameter-adaptive multi-frame joint pose optimization method. Vis Comput 39, 2529–2541 (2023). https://doi.org/10.1007/s00371-022-02476-4

  • DOI: https://doi.org/10.1007/s00371-022-02476-4