Unsupervised learning to detect loops using deep neural networks for visual SLAM system

Gao, Xiang; Zhang, Tao

doi:10.1007/s10514-015-9516-2

Published: 11 December 2015

Volume 41, pages 1–18, (2017)

Autonomous Robots


Abstract

This paper addresses the loop closure detection problem for visual simultaneous localization and mapping (SLAM) systems. We propose a novel approach based on the stacked denoising auto-encoder (SDA), a multi-layer neural network that autonomously learns a compressed representation from raw input data in an unsupervised way. Unlike traditional bag-of-words based methods, the deep network can learn the complex inner structures in image data and no longer requires manually designed visual features. Our approach employs the characteristics of the SDA to solve the loop detection problem. We present the workflow of training the network, utilizing the features, and computing the similarity score. The performance of the SDA is evaluated in a comparison study with Fab-map 2.0 using data from open datasets and physical robots. The results show that the SDA can detect loops with satisfactory precision and can therefore provide an alternative approach for visual SLAM systems.
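
The workflow named in the abstract (train the network, extract features, compute a similarity score) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, since the abstract does not specify these details: masking noise as the corruption, tied weights, a squared reconstruction error, full-batch gradient descent, greedy layer-wise stacking, cosine similarity as the score, and all function and parameter names. It is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_layer(X, n_hidden, drop_prob=0.3, lr=0.5, epochs=200, seed=0):
    """Train one tied-weight denoising auto-encoder layer (illustrative only).

    X: (n_samples, n_inputs) array with values in [0, 1].
    The input is corrupted with masking noise, and the layer is trained to
    reconstruct the clean input under a squared-error loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, n_hidden))  # shared by encoder and decoder
    b = np.zeros(n_hidden)                   # encoder bias
    c = np.zeros(d)                          # decoder bias
    for _ in range(epochs):
        # Masking noise: zero each entry independently with prob. drop_prob.
        Xc = X * (rng.random(X.shape) >= drop_prob)
        H = sigmoid(Xc @ W + b)              # hidden code
        R = sigmoid(H @ W.T + c)             # reconstruction of the clean input
        dR = (R - X) * R * (1.0 - R)         # gradient at decoder pre-activation
        dH = (dR @ W) * H * (1.0 - H)        # gradient at encoder pre-activation
        W -= lr * (Xc.T @ dH + dR.T @ H) / n
        b -= lr * dH.mean(axis=0)
        c -= lr * dR.mean(axis=0)
    return W, b, c


def train_stack(X, hidden_sizes, **kwargs):
    """Greedy layer-wise training: each layer learns from the previous codes."""
    layers, H = [], X
    for n_hidden in hidden_sizes:
        W, b, c = train_denoising_layer(H, n_hidden, **kwargs)
        layers.append((W, b, c))
        H = sigmoid(H @ W + b)               # clean (uncorrupted) encoding
    return layers


def encode(X, layers):
    """Map inputs to the top-layer feature vectors (no corruption applied)."""
    H = X
    for W, b, _ in layers:
        H = sigmoid(H @ W + b)
    return H


def similarity_matrix(F):
    """Cosine similarity between every pair of feature vectors (rows of F)."""
    norms = np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)
    Fn = F / norms
    return Fn @ Fn.T


if __name__ == "__main__":
    # Synthetic stand-in for vectorized image frames; not data from the paper.
    frames = np.random.default_rng(42).random((30, 64))
    layers = train_stack(frames, hidden_sizes=[32, 16], epochs=50)
    scores = similarity_matrix(encode(frames, layers))
    print(scores.shape)
```

In a sketch like this, a pair of frames whose score exceeds a chosen threshold would be treated as a loop candidate; the threshold and any further verification step are left open here because the abstract does not describe them.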

Author information

Authors and Affiliations

Department of Automation, Tsinghua University, Beijing, 100084, China
Xiang Gao & Tao Zhang

Corresponding author

Correspondence to Tao Zhang.

About this article

Cite this article

Gao, X., Zhang, T. Unsupervised learning to detect loops using deep neural networks for visual SLAM system. Auton Robot 41, 1–18 (2017). https://doi.org/10.1007/s10514-015-9516-2

Received: 22 May 2015
Accepted: 15 October 2015
Published: 11 December 2015
Issue Date: January 2017
DOI: https://doi.org/10.1007/s10514-015-9516-2
