Abstract
What makes visual place recognition difficult is the variation of real-world places over time and viewpoint. In this work, an effective similarity measurement is proposed for visual place recognition in changing environments, based on Convolutional Neural Networks (CNNs) and content-based multi-scale landmarks. Each image is first segmented into multi-scale landmarks guided by content information, which adapts to viewpoint variations; highly representative features of these landmarks are then derived from CNNs, making them robust against appearance variations. In the similarity measurement, the similarity between two images is determined by analyzing both the spatial and the scale distributions of matched landmarks. Moreover, an efficient feature extraction and reduction strategy is proposed to generate the features of all landmarks at one time. The efficiency of the proposed method makes it suitable for real-time applications. The proposed method is evaluated on two widely used datasets with varied viewpoint and appearance conditions and achieves superior performance against four other state-of-the-art methods, including the bag-of-words model DBoW3 and the CNN-based Edge Boxes landmarks. Extensive experimentation demonstrates that integrating global and local information provides greater invariance under severe appearance changes, and that considering the spatial distribution of landmarks improves robustness against viewpoint changes.
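The core idea of the similarity measurement — matching landmark CNN features between two images and then weighting the match score by how consistently the matched landmarks are distributed spatially — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the function names, the cosine-similarity matching threshold, and the spatial-consistency weighting used here are all hypothetical simplifications.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def image_similarity(feats_a, feats_b, centers_a, centers_b, thresh=0.7):
    """Illustrative image-pair score: match landmarks by CNN-feature
    cosine similarity, then down-weight the score when the matched
    landmarks' spatial layout is inconsistent between the two images.

    feats_*:   list of 1-D landmark feature vectors (e.g. CNN activations)
    centers_*: list of 2-D landmark center coordinates (x, y)
    """
    matches = []
    for i, fa in enumerate(feats_a):
        sims = [cosine_sim(fa, fb) for fb in feats_b]
        j = int(np.argmax(sims))
        if sims[j] >= thresh:          # accept only confident matches
            matches.append((i, j, sims[j]))
    if not matches:
        return 0.0
    # Spatial-distribution check: if the pair shows only a viewpoint
    # shift, matched landmarks move by roughly the same offset, so the
    # spread of the offsets is small and the weight stays near 1.
    offsets = np.array([centers_b[j] - centers_a[i] for i, j, _ in matches])
    spread = float(np.mean(np.linalg.norm(offsets - offsets.mean(axis=0), axis=1)))
    spatial_weight = 1.0 / (1.0 + spread)
    return float(np.mean([s for _, _, s in matches])) * spatial_weight
```

For example, comparing an image with itself yields a score of 1.0 (perfect feature matches, zero offset spread), while a pair with no landmark match above the threshold scores 0.0.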
Xin, Z., Cui, X., Zhang, J. et al. Real-Time Visual Place Recognition Based on Analyzing Distribution of Multi-scale CNN Landmarks. J Intell Robot Syst 94, 777–792 (2019). https://doi.org/10.1007/s10846-018-0804-x
Keywords
- Visual place recognition
- Localization
- Convolutional neural networks
- Changing environments
- Landmark distribution