
Evaluation of Object Proposals and ConvNet Features for Landmark-based Visual Place Recognition

Published in: Journal of Intelligent & Robotic Systems

Abstract

Although significant progress has been made in visual place recognition for mobile robot navigation, challenges remain, especially in changing environments. Recently, a landmark-based visual place description technique has achieved impressive results under significant environmental and viewpoint changes, attracting considerable interest from the community. This technique combines the strengths of object proposals and convolutional neural networks (ConvNets), the latest achievements in object detection and deep learning research. The idea is to detect landmarks in an image with an object proposal method and then characterize these landmarks with features computed by a ConvNet (known as ConvNet features), which are used for landmark matching. Although a large number of object proposal approaches and ConvNet features have been proposed, it remains unclear how to select or combine them for a landmark-based visual place recognition system. In this paper, we conduct a thorough evaluation of 13 state-of-the-art object proposal methods and 13 kinds of modern ConvNet features on six datasets with various environmental and viewpoint changes, in terms of place recognition accuracy and computational efficiency. Our study identifies the strengths and weaknesses of object proposal methods and ConvNet features with respect to environmental changes. We expect the conclusions drawn from our analysis to be useful for developing landmark-based visual place recognition systems and to benefit other related research fields.
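The matching step described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes each image's landmarks have already been detected by an object proposal method and described by ConvNet feature vectors, and the function names and the averaging scheme are our own illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def place_similarity(landmarks_a, landmarks_b):
    """Score two images by matching each landmark feature in A to its
    most similar landmark feature in B, then averaging the similarities.
    Each argument is a list of ConvNet feature vectors, one per landmark."""
    if not landmarks_a or not landmarks_b:
        return 0.0
    best = [max(cosine(fa, fb) for fb in landmarks_b) for fa in landmarks_a]
    return sum(best) / len(best)
```

Which proposal method and which ConvNet feature to plug into such a pipeline is exactly the question the paper's evaluation addresses.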


References

  1. Lowry, S., Sünderhauf, N., Newman, P., Leonard, J.J., Cox, D., Corke, P., Milford, M.J.: Visual place recognition: a survey. IEEE Trans. Robot. 32(1), 1–19 (2016)

  2. Sünderhauf, N., Dayoub, F., Shirazi, S., Upcroft, B., Milford, M.: On the performance of ConvNet features for place recognition. In: IEEE international conference on intelligent robots and systems (IROS), pp 4297–4304 (2015)

  3. Cummins, M., Newman, P.: FAB-MAP: probabilistic localization and mapping in the space of appearance. Int. J. Robot. Res. 27(6), 647–665 (2008)

  4. Milford, M.J., Wyeth, G.F.: SeqSLAM: visual route-based navigation for sunny summer days and stormy winter nights. In: IEEE international conference on robotics and automation (ICRA), pp 1643–1649 (2012)

  5. Liu, Y., Zhang, H.: Towards improving the efficiency of sequence-based SLAM. In: IEEE international conference on mechatronics and automation (ICMA), pp 1261–1266 (2013)

  6. Milford, M.: Vision-based place recognition: how low can you go? Int. J. Robot. Res. 32(7), 766–789 (2013)

  7. Naseer, T., Spinello, L., Burgard, W., Stachniss, C.: Robust visual robot localization across seasons using network flows. In: AAAI conference on artificial intelligence, pp 2564–2570 (2014)

  8. Glover, A.J., Maddern, W.P., Milford, M.J., Wyeth, G.F.: FAB-MAP + RatSLAM: appearance-based SLAM for multiple times of day. In: IEEE international conference on robotics and automation (ICRA), pp 3507–3512 (2010)

  9. Neubert, P., Sünderhauf, N., Protzel, P.: Superpixel-based appearance change prediction for long-term navigation across seasons. Robot. Auton. Syst. 69, 15–27 (2015)

  10. Hou, Y., Zhang, H., Zhou, S.: Convolutional neural network-based image representation for visual loop closure detection. In: IEEE international conference on information and automation (ICIA), pp 2238–2245 (2015)

  11. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: IEEE international conference on computer vision (ICCV), pp 1470–1477 (2003)

  12. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)

  13. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Computer vision-ECCV, 3951, pp 404–417 (2006)

  14. Singh, G., Kosecka, J.: Visual loop closing using gist descriptors in Manhattan world. In: IEEE international conference on robotics and automation (ICRA) omnidirectional robot vision workshop (2010)

  15. Sünderhauf, N., Protzel, P.: BRIEF-Gist – closing the loop by simple means. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1234–1241 (2011)

  16. Liu, Y., Zhang, H.: Visual loop closure detection with a compact image descriptor. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1051–1056 (2012)

  17. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

  18. McManus, C., Upcroft, B., Newman, P.: Scene signatures: localised and point-less features for localisation. In: Robotics science and systems (RSS) (2014)

  19. Kentaro, Y., Masatoshi, A., Yuuto, C., Kanji, T.: An experimental study of the effects of landmark discovery and retrieval on visual place recognition across seasons. In: Workshop on visual place recognition in changing environments at the IEEE international conference on robotics and automation (2015)

  20. Sünderhauf, N., Shirazi, S., Jacobson, A., Dayoub, F., Pepperell, E., Upcroft, B., Milford, M.: Place recognition with ConvNet landmarks: viewpoint-robust, condition-robust, training-free. In: Robotics: science and systems (2015)

  21. Hosang, J., Benenson, R., Dollár, P., Schiele, B.: What makes for effective detection proposals? IEEE Trans. Pattern Anal. Mach. Intell. 38(4), 814–830 (2016)

  22. Cheng, M.-M., Zhang, Z., Lin, W.-Y., Torr, P.: BING: binarized normed gradients for objectness estimation at 300fps. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3286–3293 (2014)

  23. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Computer vision-ECCV, pp 391–405 (2014)

  24. Krähenbühl, P., Koltun, V.: Geodesic object proposals. In: Computer vision-ECCV, 8693, pp 725–739 (2014)

  25. Arbelaez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 328–335 (2014)

  26. Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2189–2202 (2012)

  27. Rahtu, E., Kannala, J., Blaschko, M.: Learning a category independent object detection cascade. In: IEEE international conference on computer vision (ICCV), pp 1052–1059 (2011)

  28. Manen, S., Guillaumin, M., Van Gool, L.: Prime object proposals with randomized Prim's algorithm. In: IEEE international conference on computer vision (ICCV), pp 2536–2543 (2013)

  29. Rantalankila, P., Kannala, J., Rahtu, E.: Generating object segmentation proposals using global and local search. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2417–2424 (2014)

  30. Humayun, A., Li, F., Rehg, J.M.: RIGOR: reusing inference in graph cuts for generating object regions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 336–343 (2014)

  31. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)

  32. Krähenbühl, P., Koltun, V.: Learning to propose objects. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1574–1582 (2015)

  33. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS), pp 91–99 (2015)

  34. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 7263–7271 (2017)

  35. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems (NIPS), pp 1097–1105 (2012)

  36. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: BMVC (2014)

  37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR) (2015)

  38. Girshick, R.: Fast R-CNN. In: IEEE international conference on computer vision (ICCV), pp 1440–1448 (2015)

  39. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 (2016)

  40. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778 (2016)

  41. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826 (2016)

  42. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1251–1258 (2017)

  43. Chavali, N., Agrawal, H., Mahendru, A., Batra, D.: Object-proposal evaluation protocol is ‘Gameable’. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 835–844 (2016)

  44. Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)

  45. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  46. Dasgupta, S.: Experiments with random projection. In: Conference on uncertainty in artificial intelligence, pp 143–151 (2000)

  47. Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: The Seventh ACM SIGKDD international conference on knowledge discovery and data mining, pp 245–250 (2001)

  48. Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788 (2016)

  49. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E.: SSD: single shot multiBox detector. In: Computer vision - ECCV 2016, pp 21–37 (2016)

  50. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9 (2015)

  51. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Computer vision-ECCV, pp 818–833 (2014)

  52. Liu, Y., Feng, R., Zhang, H.: Keypoint matching by outlier pruning with consensus constraint. In: IEEE international conference on robotics and automation (ICRA), pp 5481–5486 (2015)

  53. Chen, Z., Lam, O., Jacobson, A., Milford, M.: Convolutional neural network-based place recognition. In: Australasian conference on robotics and automation (2014)

  54. Mapillary. https://www.mapillary.com. Accessed 15 May 2016

  55. Su, W., Yuan, Y., Zhu, M.: A relationship between the average precision and the area under the ROC curve. In: International conference on the theory of information retrieval, pp 349–352 (2015)

  56. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: 22nd ACM international conference on multimedia, pp 675–678 (2014)

  57. Chollet, F., et al.: Keras. https://github.com/fchollet/keras (2015)

Acknowledgements

We thank Yubin Kuang from Mapillary [54] for providing the Halenseestraße and Kurfürstendamm datasets. We appreciate the helpful comments from the reviewers. We also gratefully acknowledge support from the Hunan Provincial Innovation Foundation for Postgraduate (CX2014B021), the Hunan Provincial Natural Science Foundation of China (2015JJ3018), and the China Scholarship Council. This research is also supported in part by the Program of Foshan Innovation Team (Grant No. 2015IT100072) and by NSFC (Grant No. 61673125).

Author information

Corresponding author

Correspondence to Yi Hou.

Additional information

This work was partially supported by NSERC. It was done while Y. Hou was visiting the University of Alberta.

About this article

Cite this article

Hou, Y., Zhang, H. & Zhou, S. Evaluation of Object Proposals and ConvNet Features for Landmark-based Visual Place Recognition. J Intell Robot Syst 92, 505–520 (2018). https://doi.org/10.1007/s10846-017-0735-y


Keywords

Navigation