Abstract
Robot localization, the task of determining a robot's current pose, is a crucial problem in mobile robotics. Visual-based robot localization, which uses only cameras as exteroceptive sensors, has become extremely popular due to the relatively low cost of cameras. However, current approaches such as Bayes-filter-based methods and visual odometry require knowledge of the prior location and also rely on feature points in images. This paper presents a novel semi-supervised learning method based on the Variational Autoencoder (VAE) for visual-based robot localization that relies on neither the prior location nor feature points. Because our method needs no prior knowledge, it can also be used to correct dead reckoning. We adopt a VAE as an unsupervised learning method to preprocess the environment images, followed by a supervised learning model that learns the mapping between the processed images and the robot's location. One merit of the proposed approach is therefore that it can adopt any state-of-the-art supervised learning model. Furthermore, this semi-supervised learning scheme is also suitable for improving other supervised learning problems: extra unlabeled data can be added to the training set and the problem solved in a semi-supervised manner. We show that this scheme achieves high accuracy for pose prediction using a surprisingly small number of labeled images compared to other machine learning methods.
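The two-stage scheme described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the VAE encoder is stood in for by random linear maps (in the actual method it would be trained on unlabeled images), the poses are placeholder labels, and the supervised stage is a simple ridge regression from latent codes to pose, chosen only because any supervised model can be plugged in here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a VAE encoder trained on unlabeled images:
# maps a flattened image to the mean and log-variance of a latent Gaussian.
def vae_encode(images, W_mu, W_logvar):
    mu = images @ W_mu          # latent means
    logvar = images @ W_logvar  # latent log-variances
    return mu, logvar

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps, the standard VAE reparameterization trick
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy data: 200 unlabeled "images", of which only 20 carry pose labels.
n_unlabeled, n_labeled, img_dim, latent_dim = 200, 20, 64, 8
images = rng.standard_normal((n_unlabeled, img_dim))
W_mu = rng.standard_normal((img_dim, latent_dim)) * 0.1
W_logvar = rng.standard_normal((img_dim, latent_dim)) * 0.01

# Stage 1 (unsupervised): encode all images into the latent space.
mu, logvar = vae_encode(images, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)

# Stage 2 (supervised): fit a regressor from latent codes to (x, y, theta)
# poses using only the small labeled subset. Here: ridge regression.
poses = rng.standard_normal((n_labeled, 3))   # placeholder pose labels
Z_l = z[:n_labeled]
lam = 1e-3
W_pose = np.linalg.solve(Z_l.T @ Z_l + lam * np.eye(latent_dim),
                         Z_l.T @ poses)

# Predict poses from the encoder mean (no sampling needed at test time).
pred = mu[:n_labeled] @ W_pose
print(pred.shape)
```

The key point the sketch illustrates is the division of labor: the encoder is trained without pose labels, so only the final, low-dimensional regression consumes the scarce labeled images.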
Copyright information
© 2022 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liang, K., He, F., Zhu, Y., Gao, X. (2022). A Semi-supervised Learning Based on Variational Autoencoder for Visual-Based Robot Localization. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2021. Communications in Computer and Information Science, vol 1491. Springer, Singapore. https://doi.org/10.1007/978-981-19-4546-5_48
DOI: https://doi.org/10.1007/978-981-19-4546-5_48
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4545-8
Online ISBN: 978-981-19-4546-5