Abstract
Visual place recognition is a critical and challenging problem in both the robotics and computer vision communities. In this paper, we focus on place recognition for visual Simultaneous Localization and Mapping (vSLAM) systems. These systems have long been limited to paradigms based on handcrafted features, which typically use only local visual information and are not sufficiently robust to variations in the images. In this work, we address place recognition with features learned automatically from data. First, we propose a graph-based visual place recognition method, in which the graph is constructed by combining visual features extracted from convolutional neural networks (CNNs) with the temporal information of the images in a sequence. Second, we employ a diffusion process to enhance data association in the graph and thereby achieve more accurate recognition. Finally, we evaluate the proposed method on four commonly used datasets. Experimental results indicate that the proposed method performs significantly better (e.g., 95.37% recall at 100% precision) than FAB-MAP (47.16% recall at 100% precision), a commonly used place recognition method based on handcrafted features, especially on challenging datasets.
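To make the two steps above concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the edge weights or the diffusion variant, so everything here is an assumption for illustration: cosine similarity between per-frame feature vectors stands in for the CNN features, a fixed bonus `temporal_weight` links consecutive frames, and the diffusion is a standard manifold-ranking style update F <- alpha * S * F + (1 - alpha) * I on the symmetrically normalized graph. All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def build_affinity(features, temporal_weight=0.5):
    """Affinity matrix: visual similarity plus a bonus for consecutive frames.

    The bonus is an assumed way to encode temporal information, not the paper's.
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            W[i][j] = max(cosine(features[i], features[j]), 0.0)
            if abs(i - j) == 1:
                W[i][j] += temporal_weight
    return W


def normalize(W):
    """Symmetric normalization S = D^(-1/2) W D^(-1/2), with D the degree matrix."""
    n = len(W)
    d = [sum(row) for row in W]
    return [
        [W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def diffuse(W, alpha=0.8, iterations=50):
    """Iterate F <- alpha * S * F + (1 - alpha) * I, starting from F = I."""
    S = normalize(W)
    n = len(S)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    F = [row[:] for row in identity]
    for _ in range(iterations):
        F = [
            [alpha * sum(S[i][k] * F[k][j] for k in range(n))
             + (1 - alpha) * identity[i][j]
             for j in range(n)]
            for i in range(n)
        ]
    return F


# Toy sequence of five frames; frame 4 revisits the place seen in frame 0.
# These 2-D vectors are made-up stand-ins for CNN descriptors.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
scores = diffuse(build_affinity(features))
```

In this toy sequence, the diffused score between frames 0 and 4 is higher than the score between frames 0 and 2, which is the behaviour a loop-closure detector would threshold on. The dense pure-Python loops are chosen for readability only; a practical system would use sparse matrices or a closed-form solve.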
Additional information
The work in this paper was conducted during Xiwu Zhang’s visit to University of Wollongong, Australia.
Cite this article
Zhang, X., Wang, L., Zhao, Y. et al. Graph-Based Place Recognition in Image Sequences with CNN Features. J Intell Robot Syst 95, 389–403 (2019). https://doi.org/10.1007/s10846-018-0917-2
DOI: https://doi.org/10.1007/s10846-018-0917-2