
Unsupervised learning to detect loops using deep neural networks for visual SLAM system

Autonomous Robots

Abstract

This paper addresses the loop closure detection problem for visual simultaneous localization and mapping (SLAM) systems. We propose a novel approach based on the stacked denoising auto-encoder (SDA), a multi-layer neural network that autonomously learns a compressed representation from raw input data in an unsupervised way. Unlike traditional bag-of-words based methods, the deep network can learn the complex inner structures in image data and no longer requires manually designed visual features. Our approach exploits these characteristics of the SDA to solve the loop detection problem. We present the workflow of training the network, utilizing the learned features, and computing the similarity score. The performance of the SDA is evaluated in a comparison study with FAB-MAP 2.0 using data from open datasets and physical robots. The results show that the SDA can detect loops with satisfactory precision and can therefore provide an alternative approach for visual SLAM systems.
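The workflow named in the abstract (train the network, extract features, compute a similarity score) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the NumPy code, the tied-weight layers, masking noise, cross-entropy reconstruction loss, the layer sizes, the random stand-in data, and the choice of cosine similarity as the score. All class and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """One layer with tied weights (assumption): encode with W, decode with W.T."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
        self.b_hid = np.zeros(n_hidden)
        self.b_vis = np.zeros(n_visible)
        self.rng = rng

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_hid)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def train_step(self, x, corruption=0.3, lr=0.1):
        # Masking noise: zero a random subset of the inputs, then try to
        # reconstruct the clean input from the corrupted one.
        mask = self.rng.random(x.shape) > corruption
        x_tilde = x * mask
        h = self.encode(x_tilde)
        x_hat = self.decode(h)
        # Gradient of the batch-averaged cross-entropy w.r.t. the output
        # pre-activation, then back-propagated to the hidden pre-activation.
        d_out = (x_hat - x) / x.shape[0]
        d_hid = (d_out @ self.W) * h * (1.0 - h)
        # W receives gradient from both the encoder and the decoder.
        grad_W = x_tilde.T @ d_hid + d_out.T @ h
        self.W -= lr * grad_W
        self.b_hid -= lr * d_hid.sum(axis=0)
        self.b_vis -= lr * d_out.sum(axis=0)

def train_stack(data, layer_sizes, epochs=200, seed=0):
    """Greedy layer-wise training: each layer learns on the previous layer's codes."""
    rng = np.random.default_rng(seed)
    layers, inp = [], data
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(inp.shape[1], n_hidden, rng)
        for _ in range(epochs):
            dae.train_step(inp)
        layers.append(dae)
        inp = dae.encode(inp)  # uncorrupted codes feed the next layer
    return layers

def extract_features(layers, x):
    for dae in layers:
        x = dae.encode(x)
    return x

def similarity_matrix(feats):
    """Pairwise cosine similarity between frame features (assumed score)."""
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return unit @ unit.T

# Random values in [0, 1] stand in for flattened image patches of 20 frames.
frames = np.random.default_rng(1).random((20, 64))
layers = train_stack(frames, layer_sizes=[32, 16])
feats = extract_features(layers, frames)
S = similarity_matrix(feats)
```

In a loop detector built along these lines, a pair of frames `(i, j)` whose score `S[i, j]` exceeds a chosen threshold would be reported as a loop candidate; the threshold and any further verification step are left open here.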






Corresponding author

Correspondence to Tao Zhang.


Cite this article

Gao, X., Zhang, T. Unsupervised learning to detect loops using deep neural networks for visual SLAM system. Auton Robot 41, 1–18 (2017). https://doi.org/10.1007/s10514-015-9516-2


  • DOI: https://doi.org/10.1007/s10514-015-9516-2
