Keypoint Heatmap Guided Self-Supervised Monocular Visual Odometry

  • Short Paper
  • Published in: Journal of Intelligent & Robotic Systems (2022)

Abstract

Visual odometry is an important component of a visual simultaneous localization and mapping (SLAM) system. In recent years, with the development of deep learning techniques, combining visual odometry with deep learning has attracted increasing attention from researchers. Existing deep learning-based monocular visual odometry methods perform a large number of computations on redundant pixels, and they consider only the pose transformation between two adjacent frames, resulting in error accumulation. To solve these problems, this paper proposes an end-to-end self-supervised monocular visual odometry method guided by keypoint heatmaps. During network training, the keypoint heatmap guides the learning process so as to reduce the influence of redundant pixels. In addition, a photometric error based on a pose consistency constraint over the image sequence is computed to reduce the accumulated error in the pose estimation of video sequences. Extensive experimental results on the KITTI visual odometry dataset validate the effectiveness of the proposed method.
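To make the two ideas concrete, the sketch below illustrates, in PyTorch-style Python, how a keypoint heatmap can re-weight a per-pixel photometric error, and how a pose consistency term can tie together poses estimated over a short sequence. This is a minimal sketch under our own assumptions, not the paper's implementation: the L1 photometric term, the heatmap normalization, the function names, and the tensor shapes are all illustrative choices, and the authors' actual losses may differ (see their repository linked below).

```python
# Illustrative sketch only, NOT the paper's code: a heatmap-weighted
# photometric loss and a sequence pose-consistency loss.
import torch

def heatmap_weighted_photometric_loss(target, warped, heatmap, eps=1e-7):
    """L1 photometric error between the target frame and the view
    synthesized from a source frame, re-weighted by a keypoint heatmap
    so that keypoint-rich regions dominate the loss and redundant
    pixels contribute less.

    target, warped: (B, 3, H, W) images; heatmap: (B, 1, H, W), >= 0.
    """
    l1 = (target - warped).abs().mean(dim=1, keepdim=True)       # (B, 1, H, W)
    w = heatmap / (heatmap.sum(dim=(2, 3), keepdim=True) + eps)  # normalize per image
    return (w * l1).sum(dim=(2, 3)).mean()

def pose_consistency_loss(pose_01, pose_12, pose_02):
    """Penalize disagreement between the directly estimated pose from
    frame 0 to frame 2 and the composition of the two adjacent-frame
    poses, discouraging drift from accumulating over the sequence.

    Poses are (B, 4, 4) homogeneous transformation matrices.
    """
    composed = pose_12 @ pose_01  # T_{0->2} obtained via frame 1
    return (composed - pose_02).abs().mean()
```

In a full training loop these two terms would typically be combined with the usual self-supervised depth and pose objectives (for example an SSIM component and a depth smoothness prior), with the relative weights tuned on a validation split.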

Data Availability

The code and data for the proposed method are available at https://github.com/kaixjl/kphm-vo

Funding

This work was supported by the National Key R&D Program of China (2020YFB1313002), the National Natural Science Foundation of China (Grant No. 61973029), and the Scientific and Technological Innovation Foundation of Foshan (BK21BF004).

Author information

Contributions

Haixin Xiu: Conceptualization, Methodology, Software, Formal Analysis, Writing – Original Draft. Yiyou Liang: Investigation, Formal Analysis, Writing – Original Draft. Hui Zeng: Supervision, Writing – Review and Editing.

Corresponding author

Correspondence to Hui Zeng.

Ethics declarations

Ethical Approval

Not applicable.

Consent to Participate

Not applicable.

Consent to Publish

Not applicable.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xiu, H., Liang, Y. & Zeng, H. Keypoint Heatmap Guided Self-Supervised Monocular Visual Odometry. J Intell Robot Syst 105, 78 (2022). https://doi.org/10.1007/s10846-022-01685-2

Keywords

Navigation