Abstract
Visual odometry is the process of estimating the motion between two consecutive images. Traditional visual odometry pipelines require the careful engineering of geometry-based building blocks; they are highly sensitive to noise, and the degradation of a single subprocess compromises the performance of the entire system. Learning-based methods, in contrast, automatically learn the features required for motion mapping, but current approaches are computationally expensive and need a significant amount of time to estimate the pose from a video sequence. This paper proposes a lightweight deep neural network architecture that estimates odometry from features refined through spatial attention. Three different training and test splits of the KITTI benchmark are used to evaluate the proposed approach. The execution time of the proposed approach is \(\sim\)1 ms, a 47\(\times\) speedup over [1]. The experiments demonstrate the promising performance of the proposed method relative to the methods used in the comparison.
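The feature refinement mentioned in the abstract follows the spatial-attention idea of CBAM [42]: channel-wise average and max pooling produce a two-channel descriptor, a small convolution turns that descriptor into a per-pixel attention map, and the feature map is rescaled by it. Below is a minimal NumPy sketch of this mechanism; the function name, tensor shapes, and the externally supplied `kernel` are illustrative assumptions, not the paper's implementation (in a trained network the kernel weights are learned).

```python
import numpy as np

def spatial_attention(feat, kernel):
    """CBAM-style spatial attention (a sketch, not the paper's exact module).

    feat:   (C, H, W) feature map from a convolutional backbone.
    kernel: (2, k, k) weights of the attention convolution.
    """
    # Channel-wise average- and max-pooling give two (H, W) descriptors.
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)

    # 'Same'-padded 2D convolution of the stacked descriptors.
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))
    H, W = desc.shape[1:]
    logits = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)

    attn = 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> values in (0, 1)
    return feat * attn                    # rescale every channel by the map
```

With an all-zero kernel the attention map is uniformly 0.5 and the features are simply halved; a trained kernel instead emphasizes spatially informative regions before the pose regression head.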
References
Wang S, Clark R, Wen H, Trigoni A (2018) End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks. Int J Rob Res 37:513–542. https://doi.org/10.1177/0278364917734298
Yousif K, Bab-Hadiashar A, Hoseinnezhad R (2015) An overview to visual odometry and visual slam: applications to mobile robotics. Intell Indus Syst 1(4):289–311. https://doi.org/10.1007/s40903-015-0032-7
Zhai M, Xiang X (2021) Geometry understanding from autonomous driving scenarios based on feature refinement. Neural Comput Appl 33(8):3209–3220. https://doi.org/10.1007/s00521-020-05192-z
Liu K, Li Q, Qiu G (2020) Posegan: a pose-to-image translation framework for camera localization. ISPRS J Photogramm Remote Sens 166:308–315. https://doi.org/10.1016/j.isprsjprs.2020.06.010
Klein, G., Murray, D.: Parallel tracking and mapping for small ar workspaces. In: Proceedings of the IEEE and ACM International symposium on mixed and augmented reality, pp. 225–234 (2007)
Davison AJ, Reid ID, Molton ND, Stasse O (2007) Monoslam: real-time single camera slam. IEEE Trans Pattern Anal Mach Intell 29(6):1052–1067. https://doi.org/10.1109/TPAMI.2007.1049
Mur-Artal R, Montiel JMM, Tardos JD (2015) ORB-SLAM: a versatile and accurate monocular slam system. IEEE Trans Robot 31(5):1147–1163. https://doi.org/10.1109/TRO.2015.2463671
Cao MW, Jia W, Zhao Y, Li SJ, Liu XP (2018) Fast and robust absolute camera pose estimation with known focal length. Neural Comput Appl 29(5):1383–1398. https://doi.org/10.1007/s00521-017-3032-6
Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: Dtam: Dense tracking and mapping in real-time. In: Proceedings of the IEEE International conference on computer vision (ICCV), pp. 2320–2327 (2011)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 3354–3361 (2012)
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (surf). Comput Vis Image Underst 110(3):346–359. https://doi.org/10.1016/j.cviu.2007.09.014
Muja, M., Lowe, D.G.: Fast matching of binary features. In: Proceedings of the IEEE Conference on Computer and Robot Vision, pp. 404–410 (2012)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: An efficient alternative to sift or surf. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2564–2571 (2011)
Pumarola, A., Vakhitov, A., Agudo, A., Sanfeliu, A., Moreno-Noguer, F.: Pl-slam: Real-time monocular visual slam with points and lines. In: Proceedings of the IEEE International Conference on robotics and automation (ICRA), pp. 4503–4508 (2017)
McCormac, J., Clark, R., Bloesch, M., Davison, A., Leutenegger, S.: Fusion++: Volumetric object-level slam. In: Proceedings of the IEEE International Conference on 3D Vision (3DV), pp. 32–41 (2018)
Herrera, D.C., Kim, K., Kannala, J., Pulli, K., Heikkilä, J.: Dt-slam: Deferred triangulation for robust slam. In: Proceedings of the IEEE International Conference on 3D Vision (3DV), vol. 1, pp. 609–616 (2014)
Engel, J., Schöps, T., Cremers, D.: Lsd-slam: Large-scale direct monocular slam. In: Proceedings of the European Conference on computer vision (ECCV), pp. 834–849 (2014)
Forster, C., Pizzoli, M., Scaramuzza, D.: Svo: Fast semi-direct monocular visual odometry. In: Proceedings of the IEEE International Conference on robotics and automation (ICRA), pp. 15–22 (2014)
Engel J, Koltun V, Cremers D (2017) Direct sparse odometry. IEEE Trans Pattern Anal Mach Intell 40(3):611–625. https://doi.org/10.1109/TPAMI.2017.2658577
Zubizarreta J, Aguinaga I, Montiel JMM (2020) Direct sparse mapping. IEEE Trans Robot 36(4):1363–1370. https://doi.org/10.1109/TRO.2020.2991614
Roberts, R., Nguyen, H., Krishnamurthi, N., Balch, T.: Memory-based learning for visual odometry. In: Proceedings of the IEEE International Conference on robotics and automation (ICRA), pp. 47–52 (2008)
Guizilini V, Ramos F (2013) Semi-parametric learning for visual odometry. Int J Rob Res 32(5):526–546. https://doi.org/10.1177/0278364912472245
Kendall, A., Grimes, M., Cipolla, R.: Posenet: A convolutional network for real-time 6-dof camera relocalization. In: Proceedings of the IEEE International Conference on computer vision (ICCV), pp. 2938–2946 (2015)
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: Flownet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on computer vision and pattern Recognition (CVPR), pp. 2462–2470 (2017)
CS Kumar, A., Bhandarkar, S.M., Prasad, M.: Depthnet: A recurrent neural network architecture for monocular depth prediction. In: Proceedings of the IEEE Conference on computer vision and pattern recognition workshops (CVPRW), pp. 283–291 (2018)
Costante G, Mancini M, Valigi P, Ciarfuglia TA (2015) Exploring representation learning with cnns for frame-to-frame ego-motion estimation. IEEE Robot Autom Lett 1(1):18–25. https://doi.org/10.1109/LRA.2015.2505717
Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Proceedings of the European Conference on computer vision (ECCV), pp. 25–36 (2004)
Li X, Hou Y, Wang P, Gao Z, Xu M, Li W (2021) Transformer guided geometry model for flow-based unsupervised visual odometry. Neural Comput Appl 1–12. https://doi.org/10.1007/s00521-020-05545-8
Muller, P., Savakis, A.: Flowdometry: An optical flow and deep learning based approach to visual odometry. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 624–631 (2017)
Zhao B, Huang Y, Wei H, Hu X (2021) Ego-motion estimation using recurrent convolutional neural networks through optical flow learning. Electronics 10(3):222. https://doi.org/10.3390/electronics10030222
Pandey T, Pena D, Byrne J, Moloney D (2021) Leveraging deep learning for visual odometry using optical flow. Sensors 21(4):1313. https://doi.org/10.3390/s21041313
Sun, D., Yang, X., Liu, M.-Y., Kautz, J.: Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 8934–8943 (2018)
Hui, T.-W., Tang, X., Loy, C.C.: Liteflownet: A lightweight convolutional neural network for optical flow estimation. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 8981–8989 (2018)
Saputra, M.R.U., Gusmão, P.P.B.D., Almalioglu, Y., Markham, A., Trigoni, A.: Distilling knowledge from a deep pose regressor network. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 263–272 (2019)
Wang X, Zhang H (2020) Deep monocular visual odometry for ground vehicle. IEEE Access 8:175220–175229. https://doi.org/10.1109/ACCESS.2020.3025557
Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? In: Proceedings of the Neural Information Processing Systems (NIPS) (2017)
Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learning. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp. 6555–6564 (2017)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.-S.: Cbam: Convolutional block attention module. In: Proceedings of the European Conference on computer vision (ECCV) (2018)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Geiger, A., Ziegler, J., Stiller, C.: Stereoscan: Dense 3d reconstruction in real-time. In: Proceedings of the Intelligent Vehicles Symposium (IV), pp. 963–968 (2011)
Saputra, M.R.U., Gusmão, P.P.B.D., Wang, S., Markham, A., Trigoni, A.: Learning monocular visual odometry through geometry-aware curriculum learning. In: Proceedings of the IEEE International Conference on robotics and automation (ICRA), pp. 3549–3555 (2019)
Liu Y, Wang H, Wang J, Wang X (2021) Unsupervised monocular visual odometry based on confidence evaluation. IEEE Trans Intell Transp Syst 1–10. https://doi.org/10.1109/TITS.2021.3053412
Zhou, T., Brown, M.A., Snavely, N., Lowe, D.: Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6612–6619 (2017)
Yin, Z., Shi, J.: GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose. In: Proceedings of the IEEE Conference on Computer vision and pattern recognition (CVPR), pp. 1983–1992 (2018)
Bian J-W, Zhan H, Wang N, Li Z, Zhang L, Shen C, Cheng M-M, Reid I (2021) Unsupervised scale-consistent depth learning from video. Int J Comput Vis 1–17. https://doi.org/10.1007/s11263-021-01484-6
Blanco-Claraco J-L, Moreno-Duenas F-A, González-Jiménez J (2014) The málaga urban dataset: High-rate stereo and lidar in a realistic urban scenario. Int J Rob Res 33(2):207–214. https://doi.org/10.1177/0278364913507326
Acknowledgements
The authors are grateful to the sponsors who provided YUTP Grant (015LC0-243) for this project.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Gadipudi, N., Elamvazuthi, I., Lu, CK. et al. Lightweight spatial attentive network for vehicular visual odometry estimation in urban environments. Neural Comput & Applic 34, 18823–18836 (2022). https://doi.org/10.1007/s00521-022-07484-y