Lightweight spatial attentive network for vehicular visual odometry estimation in urban environments

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Visual odometry is the process of estimating the motion between two consecutive images. Traditional visual odometry algorithms require the careful design of geometry-based building blocks; they are highly sensitive to noise, and the degradation of a single subprocess compromises the performance of the entire system. Learning-based methods, on the other hand, automatically learn the features required for motion estimation. However, current learning-based methods are computationally expensive and require a significant amount of time to estimate the pose from a video sequence. This paper proposes a lightweight deep neural network architecture that estimates odometry from features refined through spatial attention. Three different training and test splits of the KITTI benchmark are used to evaluate the proposed approach. The execution time of the proposed approach is \(\sim\)1 ms, a 47-fold speed-up over [1]. The experiments performed demonstrate the promising performance of the proposed method relative to the methods used in the comparison.
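To illustrate the kind of feature refinement the abstract refers to, the sketch below shows a CBAM-style spatial attention block [38] feeding a small pose-regression head, written in PyTorch. This is a minimal illustration under assumptions, not the authors' implementation: the class names (SpatialAttention, PoseHead), the channel count, the input feature-map shape, and the 6-DoF output head are all hypothetical choices made for the example.

```python
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: re-weight each spatial location of a feature map."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Two pooled maps (average and max over channels) are fused into one attention mask.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_pool = x.mean(dim=1, keepdim=True)        # (B, 1, H, W)
        max_pool, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        mask = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * mask                               # spatially refined features


class PoseHead(nn.Module):
    """Toy regressor from attended features to a 6-DoF relative pose (translation + Euler angles)."""

    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.attention = SpatialAttention()
        self.regressor = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, 6),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.regressor(self.attention(features))


if __name__ == "__main__":
    # Hypothetical feature map extracted from a pair of consecutive frames; shape is illustrative only.
    features = torch.randn(1, 64, 48, 156)
    pose = PoseHead(in_channels=64)(features)
    print(pose.shape)  # torch.Size([1, 6])
```

In a full pipeline, such a head would sit on top of a feature extractor (e.g., flow-based features as in [24, 32, 33]); only the attention-then-regress pattern is meant to mirror the idea described in the abstract.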

References

  1. Wang S, Clark R, Wen H, Trigoni A (2018) End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks. Int J Rob Res 37:513–542. https://doi.org/10.1177/0278364917734298

  2. Yousif K, Bab-Hadiashar A, Hoseinnezhad R (2015) An overview to visual odometry and visual slam: applications to mobile robotics. Intell Indus Syst 1(4):289–311. https://doi.org/10.1007/s40903-015-0032-7

  3. Zhai M, Xiang X (2021) Geometry understanding from autonomous driving scenarios based on feature refinement. Neural Comput Appl 33(8):3209–3220. https://doi.org/10.1007/s00521-020-05192-z

  4. Liu K, Li Q, Qiu G (2020) Posegan: a pose-to-image translation framework for camera localization. ISPRS J Photogramm Remote Sens 166:308–315. https://doi.org/10.1016/j.isprsjprs.2020.06.010

  5. Klein, G., Murray, D.: Parallel tracking and mapping for small ar workspaces. In: Proceedings of the IEEE and ACM International symposium on mixed and augmented reality, pp. 225–234 (2007)

  6. Davison AJ, Reid ID, Molton ND, Stasse O (2007) Monoslam: real-time single camera slam. IEEE Trans Pattern Anal Mach Intell 29(6):1052–1067. https://doi.org/10.1109/TPAMI.2007.1049

  7. Mur-Artal R, Montiel JMM, Tardos JD (2015) ORB-SLAM: a versatile and accurate monocular slam system. IEEE Trans Robot 31(5):1147–1163. https://doi.org/10.1109/TRO.2015.2463671

  8. Cao MW, Jia W, Zhao Y, Li SJ, Liu XP (2018) Fast and robust absolute camera pose estimation with known focal length. Neural Comput Appl 29(5):1383–1398. https://doi.org/10.1007/s00521-017-3032-6

  9. Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: Dtam: Dense tracking and mapping in real-time. In: Proceedings of the IEEE International conference on computer vision (ICCV), pp. 2320–2327 (2011)

  10. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 3354–3361 (2012)

  11. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (surf). Comput Vis Image Underst 110(3):346–359. https://doi.org/10.1016/j.cviu.2007.09.014

  12. Muja, M., Lowe, D.G.: Fast matching of binary features. In: Proceedings of the IEEE Conference on Computer and Robot Vision, pp. 404–410 (2012)

  13. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: An efficient alternative to sift or surf. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2564–2571 (2011)

  14. Pumarola, A., Vakhitov, A., Agudo, A., Sanfeliu, A., Moreno-Noguer, F.: Pl-slam: Real-time monocular visual slam with points and lines. In: Proceedings of the IEEE International Conference on robotics and automation (ICRA), pp. 4503–4508 (2017)

  15. McCormac, J., Clark, R., Bloesch, M., Davison, A., Leutenegger, S.: Fusion++: Volumetric object-level slam. In: Proceedings of the IEEE International Conference on 3D Vision (3DV), pp. 32–41 (2018)

  16. Herrera, D.C., Kim, K., Kannala, J., Pulli, K., Heikkilä, J.: Dt-slam: Deferred triangulation for robust slam. In: Proceedings of the IEEE International Conference on 3D Vision (3DV), vol. 1, pp. 609–616 (2014)

  17. Engel, J., Schöps, T., Cremers, D.: Lsd-slam: Large-scale direct monocular slam. In: Proceedings of the European Conference on computer vision (ECCV), pp. 834–849 (2014)

  18. Forster, C., Pizzoli, M., Scaramuzza, D.: Svo: Fast semi-direct monocular visual odometry. In: Proceedings of the IEEE International Conference on robotics and automation (ICRA), pp. 15–22 (2014)

  19. Engel J, Koltun V, Cremers D (2017) Direct sparse odometry. IEEE Trans Pattern Anal Mach Intell 40(3):611–625. https://doi.org/10.1109/TPAMI.2017.2658577

  20. Zubizarreta J, Aguinaga I, Montiel JMM (2020) Direct sparse mapping. IEEE Trans Robot 36(4):1363–1370. https://doi.org/10.1109/TRO.2020.2991614

  21. Roberts, R., Nguyen, H., Krishnamurthi, N., Balch, T.: Memory-based learning for visual odometry. In: Proceedings of the IEEE International Conference on robotics and automation (ICRA), pp. 47–52 (2008)

  22. Guizilini V, Ramos F (2013) Semi-parametric learning for visual odometry. Int J Rob Res 32(5):526–546. https://doi.org/10.1177/0278364912472245

  23. Kendall, A., Grimes, M., Cipolla, R.: Posenet: A convolutional network for real-time 6-dof camera relocalization. In: Proceedings of the IEEE International Conference on computer vision (ICCV), pp. 2938–2946 (2015)

  24. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: Flownet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on computer vision and pattern Recognition (CVPR), pp. 2462–2470 (2017)

  25. CS Kumar, A., Bhandarkar, S.M., Prasad, M.: Depthnet: A recurrent neural network architecture for monocular depth prediction. In: Proceedings of the IEEE Conference on computer vision and pattern recognition workshops (CVPRW), pp. 283–291 (2018)

  26. Costante G, Mancini M, Valigi P, Ciarfuglia TA (2015) Exploring representation learning with cnns for frame-to-frame ego-motion estimation. IEEE Robot Autom Lett 1(1):18–25. https://doi.org/10.1109/LRA.2015.2505717

  27. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Proceedings of the European Conference on computer vision (ECCV), pp. 25–36 (2004)

  28. Li X, Hou Y, Wang P, Gao Z, Xu M, Li W (2021) Transformer guided geometry model for flow-based unsupervised visual odometry. Neural Comput Appl 1–12. https://doi.org/10.1007/s00521-020-05545-8

  29. Muller, P., Savakis, A.: Flowdometry: An optical flow and deep learning based approach to visual odometry. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 624–631 (2017)

  30. Zhao B, Huang Y, Wei H, Hu X (2021) Ego-motion estimation using recurrent convolutional neural networks through optical flow learning. Electronics 10(3):222. https://doi.org/10.3390/electronics10030222

  31. Pandey T, Pena D, Byrne J, Moloney D (2021) Leveraging deep learning for visual odometry using optical flow. Sensors 21(4):1313. https://doi.org/10.3390/s21041313

  32. Sun, D., Yang, X., Liu, M.-Y., Kautz, J.: Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 8934–8943 (2018)

  33. Hui, T.-W., Tang, X., Loy, C.C.: Liteflownet: A lightweight convolutional neural network for optical flow estimation. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 8981–8989 (2018)

  34. Saputra, M.R.U., Gusmão, P.P.B.D., Almalioglu, Y., Markham, A., Trigoni, A.: Distilling knowledge from a deep pose regressor network. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 263–272 (2019)

  35. Wang X, Zhang H (2020) Deep monocular visual odometry for ground vehicle. IEEE Access 8:175220–175229. https://doi.org/10.1109/ACCESS.2020.3025557

  36. Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? In: Proceedings of the Neural Information Processing Systems (NIPS) (2017)

  37. Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learning. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 6555–6564 (2017)

  38. Woo, S., Park, J., Lee, J.-Y., Kweon, I.-S.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on computer vision (ECCV) (2018)

  39. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  40. Geiger, A., Ziegler, J., Stiller, C.: Stereoscan: Dense 3d reconstruction in real-time. In: Proceedings of the Intelligent Vehicles Symposium (IV), pp. 963–968 (2011)

  41. Saputra, M.R.U., Gusmão, P.P.B.D., Wang, S., Markham, A., Trigoni, A.: Learning monocular visual odometry through geometry-aware curriculum learning. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 3549–3555 (2019)

  42. Liu, Y., Wang, H., Wang, J., Wang, X.: Unsupervised monocular visual odometry based on confidence evaluation. IEEE Trans Intell Transp Syst, 1–10 (2021). https://doi.org/10.1109/TITS.2021.3053412

  43. Zhou, T., Brown, M.A., Snavely, N., Lowe, D.: Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6612–6619 (2017)

  44. Yin, Z., Shi, J.: GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose. In: Proceedings of the IEEE Conference on Computer vision and pattern recognition (CVPR), pp. 1983–1992 (2018)

  45. Bian, J.-W., Zhan, H., Wang, N., Li, Z., Zhang, L., Shen, C., Cheng, M.-M., Reid, I.: Unsupervised scale-consistent depth learning from video. Int J Comput Vis., 1–17 (2021). https://doi.org/10.1007/s11263-021-01484-6

  46. Blanco-Claraco J-L, Moreno-Dueñas F-A, González-Jiménez J (2014) The Málaga urban dataset: high-rate stereo and LiDAR in a realistic urban scenario. Int J Rob Res 33(2):207–214. https://doi.org/10.1177/0278364913507326

Acknowledgements

The authors are grateful to the sponsors who provided the YUTP Grant (015LC0-243) for this project.

Author information

Corresponding author

Correspondence to Irraivan Elamvazuthi.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gadipudi, N., Elamvazuthi, I., Lu, CK. et al. Lightweight spatial attentive network for vehicular visual odometry estimation in urban environments. Neural Comput & Applic 34, 18823–18836 (2022). https://doi.org/10.1007/s00521-022-07484-y

Keywords

Navigation