Abstract
What makes visual place recognition difficult is the variation of real-world places over time and viewpoint. In this work, an effective similarity measurement is proposed for visual place recognition in changing environments, based on Convolutional Neural Networks (CNNs) and content-based multi-scale landmarks. Each image is first segmented into multi-scale landmarks guided by content information, which adapts to viewpoint variations; highly representative features of these landmarks are then derived from CNNs, making them robust against appearance variations. In the similarity measurement, the similarity between two images is determined by analyzing both the spatial and the scale distributions of matched landmarks. Moreover, an efficient feature extraction and reduction strategy is proposed to generate the features of all landmarks at one time. The efficiency of the proposed method makes it suitable for real-time applications. The proposed method is evaluated on two widely used datasets with varied viewpoint and appearance conditions and achieves superior performance against four other state-of-the-art methods, including the bag-of-words model DBoW3 and the CNN-based Edge Boxes landmarks. Extensive experimentation demonstrates that integrating global and local information provides greater invariance under severe appearance changes, and that considering the spatial distribution of landmarks improves robustness against viewpoint changes.
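The core idea of the similarity measurement — matching landmark CNN features between two images and then weighting the match score by how consistently the matched landmarks are distributed spatially — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the function names, the cosine-similarity matching threshold, and the spatial-consistency weighting used here are all hypothetical simplifications.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def image_similarity(feats_a, feats_b, centers_a, centers_b, thresh=0.7):
    """Illustrative image-pair score: match landmarks by CNN-feature
    cosine similarity, then down-weight the score when the matched
    landmarks' spatial layout is inconsistent between the two images.

    feats_*:   list of 1-D landmark feature vectors (e.g. CNN activations)
    centers_*: list of 2-D landmark center coordinates (x, y)
    """
    matches = []
    for i, fa in enumerate(feats_a):
        sims = [cosine_sim(fa, fb) for fb in feats_b]
        j = int(np.argmax(sims))
        if sims[j] >= thresh:          # accept only confident matches
            matches.append((i, j, sims[j]))
    if not matches:
        return 0.0
    # Spatial-distribution check: if the pair shows only a viewpoint
    # shift, matched landmarks move by roughly the same offset, so the
    # spread of the offsets is small and the weight stays near 1.
    offsets = np.array([centers_b[j] - centers_a[i] for i, j, _ in matches])
    spread = float(np.mean(np.linalg.norm(offsets - offsets.mean(axis=0), axis=1)))
    spatial_weight = 1.0 / (1.0 + spread)
    return float(np.mean([s for _, _, s in matches])) * spatial_weight
```

For example, comparing an image with itself yields a score of 1.0 (perfect feature matches, zero offset spread), while a pair with no landmark match above the threshold scores 0.0.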
Xin, Z., Cui, X., Zhang, J. et al. Real-Time Visual Place Recognition Based on Analyzing Distribution of Multi-scale CNN Landmarks. J Intell Robot Syst 94, 777–792 (2019). https://doi.org/10.1007/s10846-018-0804-x
Keywords
- Visual place recognition
- Localization
- Convolutional neural networks
- Changing environments
- Landmark distribution