Graph-Based Place Recognition in Image Sequences with CNN Features

Zhang, Xiwu; Wang, Lei; Zhao, Yan; Su, Yan

doi:10.1007/s10846-018-0917-2

Published: 15 August 2018

Journal of Intelligent & Robotic Systems, Volume 95, pages 389–403 (2019)

Abstract

Visual place recognition is a critical and challenging problem in both the robotics and computer vision communities. In this paper, we focus on place recognition for visual Simultaneous Localization and Mapping (vSLAM) systems. These systems have long been limited to handcrafted-feature-based paradigms, which normally use local visual information of images and are not sufficiently robust to appearance variations in the images. In this work, we address place recognition with features learned automatically from data. First, we propose a graph-based visual place recognition method. The graph is constructed by combining visual features extracted from convolutional neural networks (CNNs) with the temporal information of the images in a sequence. Second, we propose to employ a diffusion process to enhance the data association in the graph and thereby obtain more accurate recognition results. Finally, to evaluate the proposed method, we conduct experiments on four commonly used datasets. Experimental results indicate that the proposed method performs significantly better than FAB-MAP, a commonly used place recognition method based on handcrafted features, especially on some challenging datasets (e.g., 95.37% recall at 100% precision, versus 47.16% recall at 100% precision for FAB-MAP).
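The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the general idea rather than the authors' exact formulation. It assumes CNN descriptors have already been extracted for every frame (toy 3-D vectors stand in for them). The graph links each frame to its `k` most cosine-similar frames and to its temporal neighbours within `temporal_radius`. For the diffusion step it substitutes the manifold-ranking iteration of Zhou et al. (2004, listed in the references), f ← αSf + (1 − α)y with S = D^(-1/2) W D^(-1/2). All function names and parameters (`build_graph`, `diffuse`, `best_match`, `recall_at_full_precision`, `k`, `temporal_radius`, `temporal_weight`, `alpha`, `exclude`) are hypothetical. The last function computes recall at 100% precision, the metric quoted in the abstract: matches are accepted in descending score order until the first false positive.

```python
import math

# Hypothetical sketch of graph-based place recognition; not the paper's exact method.


def cosine(a, b):
    """Cosine similarity of two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def build_graph(features, k=2, temporal_radius=1, temporal_weight=1.0):
    """Symmetric affinity matrix W from visual k-NN edges plus temporal edges."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Visual edges: link frame i to its k most similar frames (similarity clipped at 0).
        sims = sorted(((max(0.0, cosine(features[i], features[j])), j)
                       for j in range(n) if j != i), reverse=True)
        for s, j in sims[:k]:
            W[i][j] = W[j][i] = max(W[i][j], s)
        # Temporal edges: frames close in the sequence are assumed to show the same place.
        for j in range(max(0, i - temporal_radius), min(n, i + temporal_radius + 1)):
            if j != i:
                W[i][j] = W[j][i] = max(W[i][j], temporal_weight)
    return W


def normalize(W):
    """Symmetric normalization S = D^-1/2 W D^-1/2, with D the degree matrix."""
    n = len(W)
    d = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def diffuse(S, query, alpha=0.5, iters=50):
    """Manifold-ranking diffusion f <- alpha*S*f + (1 - alpha)*y from a one-hot query."""
    n = len(S)
    y = [1.0 if i == query else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def best_match(scores, query, exclude=1):
    """Best earlier frame, skipping the `exclude` frames just before the query."""
    candidates = range(max(0, query - exclude))
    if not candidates:
        return None
    return max(candidates, key=lambda j: scores[j])


def recall_at_full_precision(predictions, ground_truth):
    """Recall at 100% precision.

    predictions: (query, match, score) tuples; ground_truth: query -> set of true matches.
    Matches are accepted in descending score order until the first false positive.
    """
    total = sum(1 for matches in ground_truth.values() if matches)
    tp = 0
    for q, m, _ in sorted(predictions, key=lambda p: p[2], reverse=True):
        if m not in ground_truth.get(q, set()):
            break
        tp += 1
    return tp / total if total else 0.0


if __name__ == "__main__":
    # Toy sequence: frames 4 and 5 revisit the places seen in frames 0 and 1.
    feats = [[1, 0, 0], [0.9, 0.4, 0], [0, 1, 0], [0, 0, 1], [1, 0.05, 0], [0.9, 0.45, 0]]
    scores = diffuse(normalize(build_graph(feats)), query=4)
    print("best match for frame 4:", best_match(scores, query=4))
```

With `alpha` close to 1 the scores spread further along the graph but become increasingly dominated by node degree; a smaller value, as used here, keeps the ranking closer to the query's immediate neighbourhood.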

References

Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5297–5307 (2016)
Babenko, A., Lempitsky, V.: Aggregating local deep features for image retrieval. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1269–1277 (2015)
Bai, S., Zhou, Z., Wang, J., Bai, X., Latecki, L.J., Tian, Q.: Ensemble diffusion for retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 774–783 (2017)
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. Comput. Vis.–ECCV 2006, 404–417 (2006)
Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., Reid, I., Leonard, J.J.: Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Robot. 32(6), 1309–1332 (2016)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: Binary robust independent elementary features. Comput. Vis.–ECCV 2010, 778–792 (2010)
Chen, Z., Jacobson, A., Sünderhauf, N., Upcroft, B., Liu, L., Shen, C., Reid, I., Milford, M.: Deep learning features at scale for visual place recognition. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3223–3230. IEEE, Singapore (2017). https://eprints.qut.edu.au/109651/. https://doi.org/10.1109/ICRA.2017.7989366
Chen, Z., Lam, O., Jacobson, A., Milford, M.: Convolutional neural network-based place recognition. Comput. Sci. (2014)
Cummins, M., Newman, P.: FAB-MAP: Probabilistic localization and mapping in the space of appearance. Int. J. Robot. Res. 27(6), 647–665 (2008). https://doi.org/10.1177/0278364908090961. http://ijr.sagepub.com/cgi/content/abstract/27/6/647
Cummins, M., Newman, P.: Appearance-only SLAM at large scale with FAB-MAP 2.0. Int. J. Robot. Res. 30(9), 1100–1123 (2011)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 248–255. IEEE (2009)
Donoser, M., Bischof, H.: Diffusion processes for retrieval revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1320–1327 (2013)
Gálvez-López, D., Tardós, J.D.: Bags of binary words for fast place recognition in image sequences. IEEE Trans. Robot. 28(5), 1188–1197 (2012)
Gao, X., Zhang, T.: Unsupervised learning to detect loops using deep neural networks for visual SLAM system. Auton. Robot. 41(1), 1–18 (2017)
Garcia-Fidalgo, E., Ortiz, A.: Hierarchical place recognition for topological mapping. IEEE Trans. Robot. 33(5), 1061–1074 (2017)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
Gordo, A., Almazan, J., Revaud, J., Larlus, D.: End-to-end learning of deep visual representations for image retrieval. Int. J. Comput. Vis. 124(2), 237–254 (2017)
Guclu, O., Can, A.B.: Fast and effective loop closure detection to improve SLAM performance. J. Intell. Robot. Syst., 1–23 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Ho, K.L., Newman, P.: Detecting loop closure with scene sequences. Int. J. Comput. Vis. 74(3), 261–286 (2007)
Hou, Y., Zhang, H., Zhou, S.: Convolutional neural network-based image representation for visual loop closure detection. In: IEEE International Conference on Information and Automation, pp. 2238–2245 (2015)
Hou, Y., Zhang, H., Zhou, S.: Evaluation of object proposals and ConvNet features for landmark-based visual place recognition. J. Intell. Robot. Syst., 1–16 (2017)
Iscen, A., Tolias, G., Avrithis, Y., Furon, T., Chum, O.: Efficient diffusion on region manifolds: Recovering small objects with compact CNN representations. CVPR (2017)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
Lategahn, H., Beck, J., Kitt, B., Stiller, C.: How to learn an illumination robust image feature for place recognition. In: 2013 IEEE Intelligent Vehicles Symposium (IV), pp. 285–291. IEEE (2013)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Lowry, S., Sünderhauf, N., Newman, P., Leonard, J.J., Cox, D., Corke, P., Milford, M.J.: Visual place recognition: A survey. IEEE Trans. Robot. 32(1), 1–19 (2016). https://doi.org/10.1109/TRO.2015.2496823
Naseer, T., Ruhnke, M., Stachniss, C., Spinello, L., Burgard, W.: Robust visual SLAM across seasons. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2529–2535. IEEE (2015)
Noh, H., Araujo, A., Sim, J., Weyand, T., Han, B.: Large-scale image retrieval with attentive deep local features. In: Proceedings of the IEEE International Conference on Computer Vision (2017)
Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. In: European Conference on Computer Vision, pp. 3–20. Springer (2016)
Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: An astounding baseline for recognition. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 512–519 (2014). arXiv:1403.6382v3
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Stumm, E., Mei, C., Lacroix, S., Chli, M.: Location graphs for visual place recognition. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 5475–5480. IEEE (2015)
Stumm, E., Mei, C., Lacroix, S., Nieto, J., Hutter, M., Siegwart, R.: Robust visual place recognition with graph kernels. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4535–4544 (2016)
Sünderhauf, N., Shirazi, S., Dayoub, F., Upcroft, B., Milford, M.: On the performance of ConvNet features for place recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4297–4304. IEEE (2015)
Sünderhauf, N., Shirazi, S., Jacobson, A., Dayoub, F., Pepperell, E., Upcroft, B., Milford, M.: Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free. Springer International Publishing (2015)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition (CVPR) (2015). arXiv:1409.4842
Tolias, G., Sicre, R., Jégou, H.: Particular object retrieval with integral max-pooling of CNN activations. International Conference on Learning Representations (ICLR) (2016)
Vedaldi, A., Lenc, K.: MatConvNet – convolutional neural networks for MATLAB. In: Proceedings of the ACM International Conference on Multimedia (2015)
Vysotska, O., Stachniss, C.: Lazy data association for image sequences matching under substantial appearance changes. IEEE Robot. Autom. Lett. 1(1), 213–220 (2016)
Williams, B., Cummins, M., Neira, J., Newman, P., Reid, I., Tardós, J.: A comparison of loop closing techniques in monocular SLAM. Robot. Auton. Syst. 57(12), 1188–1197 (2009)
Xie, L., Tian, Q., Zhou, W., Zhang, B.: Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb. Comput. Vis. Image Underst. 124, 31–41 (2014)
Yang, F., Matei, B., Davis, L.S.: Re-ranking by multi-feature fusion with diffusion for image retrieval. In: 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 572–579. IEEE (2015)
Yang, X., Koknar-Tezel, S., Latecki, L.J.: Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 357–364. IEEE (2009)
Zhang, X., Su, Y., Zhu, X.: Loop closure detection for visual SLAM systems using convolutional neural network. In: 2017 23rd International Conference on Automation and Computing (ICAC), pp. 1–6. IEEE (2017)
Zhou, B., Garcia, A.L., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene recognition using places database. Adv. Neural Inf. Process. Syst. 1, 487–495 (2015)
Zhou, D., Weston, J., Gretton, A., Bousquet, O., Schölkopf, B.: Ranking on data manifolds. In: Advances in Neural Information Processing Systems, pp. 169–176 (2004)
Chung, F., Lu, L., Vu, V.: Spectra of random graphs with given expected degrees. Proc. Nat. Acad. Sci. 100(11), 6313–6318 (2003)

Author information

Authors and Affiliations

School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, 210094, China
Xiwu Zhang & Yan Su
School of Computing and Information Technology, University of Wollongong, Wollongong, NSW, 2522, Australia
Xiwu Zhang, Lei Wang & Yan Zhao

Corresponding authors

Correspondence to Lei Wang or Yan Su.

Additional information

The work in this paper was conducted during Xiwu Zhang’s visit to University of Wollongong, Australia.

About this article

Cite this article

Zhang, X., Wang, L., Zhao, Y. et al. Graph-Based Place Recognition in Image Sequences with CNN Features. J Intell Robot Syst 95, 389–403 (2019). https://doi.org/10.1007/s10846-018-0917-2

Received: 04 March 2018
Accepted: 08 August 2018
Published: 15 August 2018
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10846-018-0917-2
