
Real-Time Visual Place Recognition Based on Analyzing Distribution of Multi-scale CNN Landmarks

Published in: Journal of Intelligent & Robotic Systems

Abstract

Visual place recognition is difficult because real-world places vary in appearance and viewpoint. In this work, an effective similarity measurement for visual place recognition in changing environments is proposed, based on Convolutional Neural Networks (CNNs) and content-based multi-scale landmarks. The image is first segmented into multi-scale landmarks using content information, in order to adapt to viewpoint variations; highly representative features of these landmarks are then derived from CNNs, which are robust against appearance variations. In the similarity measurement, the similarity between images is determined by analyzing both the spatial and the scale distributions of matched landmarks. Moreover, an efficient feature extraction and reduction strategy is proposed that generates all landmark features in a single pass. The efficiency of the proposed method makes it suitable for real-time applications. The method is evaluated on two widely used datasets with varied viewpoint and appearance conditions, and it outperforms four other state-of-the-art methods, including the bag-of-words model DBoW3 and the CNN-based Edge Boxes landmarks. Extensive experiments demonstrate that integrating global and local information provides more invariance under severe appearance changes, and that considering the spatial distribution of landmarks improves robustness against viewpoint changes.
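The abstract does not give the measurement in full, but the pipeline it describes — match landmarks between two images by CNN-feature similarity, then score the match set by how coherent its spatial and scale distributions are — can be sketched in outline. The following is a minimal illustrative sketch, not the paper's actual algorithm: the landmark representation (`x`, `y`, `scale`, `feat`), the greedy matching, the 0.8 similarity threshold, and the penalty form are all assumptions made for illustration.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors (e.g. CNN descriptors).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

def _std(vals):
    # Population standard deviation of a list of numbers.
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

def match_landmarks(lms_a, lms_b, thresh=0.8):
    # Greedily pair landmarks across the two images by feature similarity.
    matches, used = [], set()
    for i, la in enumerate(lms_a):
        best_j, best_s = -1, thresh
        for j, lb in enumerate(lms_b):
            if j in used:
                continue
            s = cosine_sim(la["feat"], lb["feat"])
            if s > best_s:
                best_j, best_s = j, s
        if best_j >= 0:
            used.add(best_j)
            matches.append((i, best_j, best_s))
    return matches

def image_similarity(lms_a, lms_b):
    # Combine mean feature similarity with spatial/scale consistency:
    # matched landmarks should shift coherently and keep similar scale ratios.
    matches = match_landmarks(lms_a, lms_b)
    if not matches:
        return 0.0
    feat_score = sum(s for _, _, s in matches) / len(matches)
    dx = [lms_b[j]["x"] - lms_a[i]["x"] for i, j, _ in matches]
    dy = [lms_b[j]["y"] - lms_a[i]["y"] for i, j, _ in matches]
    ds = [lms_b[j]["scale"] / lms_a[i]["scale"] for i, j, _ in matches]
    spatial_pen = _std(dx) + _std(dy)
    scale_pen = _std(ds)
    return feat_score / (1.0 + spatial_pen + scale_pen)
```

Under this scoring, a uniform shift of every landmark leaves the score unchanged (the displacement distribution stays coherent), while scattering individual landmarks raises the spatial penalty and lowers it — mirroring the abstract's argument that the spatial distribution of matches improves robustness to viewpoint changes.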



Author information


Correspondence to Zhe Xin.


About this article


Cite this article

Xin, Z., Cui, X., Zhang, J. et al. Real-Time Visual Place Recognition Based on Analyzing Distribution of Multi-scale CNN Landmarks. J Intell Robot Syst 94, 777–792 (2019). https://doi.org/10.1007/s10846-018-0804-x

