
Evaluation of Object Proposals and ConvNet Features for Landmark-based Visual Place Recognition

Published in: Journal of Intelligent & Robotic Systems

Abstract

Although significant progress has been made in visual place recognition for mobile robot navigation, challenges remain, especially in changing environments. Recently, a landmark-based visual place description technique has achieved impressive results under significant environmental and viewpoint changes, attracting considerable interest from the community. This technique combines the strengths of object proposals and convolutional neural networks (ConvNets), the latest achievements in object detection and deep learning research. The idea is to detect landmarks in an image with an object proposal method and then characterize these landmarks with features computed by a ConvNet (known as ConvNet features), which are used for landmark matching. Although a large number of object proposal approaches and ConvNet features have been proposed, it remains unclear how to select or combine them for a landmark-based visual place recognition system. In this paper, we conduct a thorough evaluation of 13 state-of-the-art object proposal methods and 13 kinds of modern ConvNet features on six datasets with various environmental and viewpoint changes, in terms of place recognition accuracy and computational efficiency. Our study identifies the strengths and weaknesses of object proposal methods and ConvNet features with respect to environmental changes. We expect the conclusions drawn from our analysis to be useful for developing landmark-based visual place recognition systems and to benefit other related research fields.
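The matching step described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes each image's landmarks have already been detected by an object proposal method and described by ConvNet feature vectors, and the function names and the averaging scheme are our own illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def place_similarity(landmarks_a, landmarks_b):
    """Score two images by matching each landmark feature in A to its
    most similar landmark feature in B, then averaging the similarities.
    Each argument is a list of ConvNet feature vectors, one per landmark."""
    if not landmarks_a or not landmarks_b:
        return 0.0
    best = [max(cosine(fa, fb) for fb in landmarks_b) for fa in landmarks_a]
    return sum(best) / len(best)
```

Which proposal method and which ConvNet feature to plug into such a pipeline is exactly the question the paper's evaluation addresses.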


References

  1. Lowry, S., Sünderhauf, N., Newman, P., Leonard, J.J., Cox, D., Corke, P., Milford, M.J.: Visual place recognition: a survey. IEEE Trans. Robot. 32(1), 1–19 (2016)

  2. Sünderhauf, N., Dayoub, F., Shirazi, S., Upcroft, B., Milford, M.: On the performance of ConvNet features for place recognition. In: IEEE international conference on intelligent robots and systems (IROS), pp 4297–4304 (2015)

  3. Cummins, M., Newman, P.: FAB-MAP: probabilistic localization and mapping in the space of appearance. Int. J. Robot. Res. 27(6), 647–665 (2008)

  4. Milford, M.J., Wyeth, G.F.: SeqSLAM: visual route-based navigation for sunny summer days and stormy winter nights. In: IEEE international conference on robotics and automation (ICRA), pp 1643–1649 (2012)

  5. Liu, Y., Zhang, H.: Towards improving the efficiency of sequence-based SLAM. In: IEEE international conference on mechatronics and automation (ICMA), pp 1261–1266 (2013)

  6. Milford, M.: Vision-based place recognition: how low can you go? Int. J. Robot. Res. 32(7), 766–789 (2013)

  7. Naseer, T., Spinello, L., Burgard, W., Stachniss, C.: Robust visual robot localization across seasons using network flows. In: AAAI conference on artificial intelligence, pp 2564–2570 (2014)

  8. Glover, A.J., Maddern, W.P., Milford, M.J., Wyeth, G.F.: FAB-MAP + RatSLAM: appearance-based SLAM for multiple times of day. In: IEEE international conference on robotics and automation (ICRA), pp 3507–3512 (2010)

  9. Neubert, P., Sünderhauf, N., Protzel, P.: Superpixel-based appearance change prediction for long-term navigation across seasons. Robot. Auton. Syst. 69, 15–27 (2015)

  10. Hou, Y., Zhang, H., Zhou, S.: Convolutional neural network-based image representation for visual loop closure detection. In: IEEE international conference on information and automation (ICIA), pp 2238–2245 (2015)

  11. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: IEEE international conference on computer vision (ICCV), pp 1470–1477 (2003)

  12. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)

  13. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Computer vision-ECCV, 3951, pp 404–417 (2006)

  14. Singh, G., Kosecka, J.: Visual loop closing using gist descriptors in Manhattan world. In: IEEE international conference on robotics and automation (ICRA) omnidirectional robot vision workshop (2010)

  15. Sünderhauf, N., Protzel, P.: BRIEF-Gist – closing the loop by simple means. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1234–1241 (2011)

  16. Liu, Y., Zhang, H.: Visual loop closure detection with a compact image descriptor. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1051–1056 (2012)

  17. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

  18. McManus, C., Upcroft, B., Newman, P.: Scene signatures: localised and point-less features for localisation. In: Robotics science and systems (RSS) (2014)

  19. Kentaro, Y., Masatoshi, A., Yuuto, C., Kanji, T.: An experimental study of the effects of landmark discovery and retrieval on visual place recognition across seasons. In: Workshop on visual place recognition in changing environments at the IEEE international conference on robotics and automation (2015)

  20. Sünderhauf, N., Shirazi, S., Jacobson, A., Dayoub, F., Pepperell, E., Upcroft, B., Milford, M.: Place recognition with ConvNet landmarks: viewpoint-robust, condition-robust, training-free. In: Robotics: science and systems (2015)

  21. Hosang, J., Benenson, R., Dollár, P., Schiele, B.: What makes for effective detection proposals? IEEE Trans. Pattern Anal. Mach. Intell. 38(4), 814–830 (2016)

  22. Cheng, M.-M., Zhang, Z., Lin, W.-Y., Torr, P.: BING: binarized normed gradients for objectness estimation at 300fps. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3286–3293 (2014)

  23. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Computer vision-ECCV, pp 391–405 (2014)

  24. Krähenbühl, P., Koltun, V.: Geodesic object proposals. In: Computer vision-ECCV, 8693, pp 725–739 (2014)

  25. Arbelaez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 328–335 (2014)

  26. Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2189–2202 (2012)

  27. Rahtu, E., Kannala, J., Blaschko, M.: Learning a category independent object detection cascade. In: IEEE international conference on computer vision (ICCV), pp 1052–1059 (2011)

  28. Manen, S., Guillaumin, M., Van Gool, L.: Prime object proposals with randomized Prim's algorithm. In: IEEE international conference on computer vision (ICCV), pp 2536–2543 (2013)

  29. Rantalankila, P., Kannala, J., Rahtu, E.: Generating object segmentation proposals using global and local search. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2417–2424 (2014)

  30. Humayun, A., Li, F., Rehg, J.M.: RIGOR: reusing inference in graph cuts for generating object regions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 336–343 (2014)

  31. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)

  32. Krähenbühl, P., Koltun, V.: Learning to propose objects. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1574–1582 (2015)

  33. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS), pp 91–99 (2015)

  34. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 7263–7271 (2017)

  35. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems (NIPS), pp 1097–1105 (2012)

  36. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: BMVC (2014)

  37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR) (2015)

  38. Girshick, R.: Fast R-CNN. In: IEEE international conference on computer vision (ICCV), pp 1440–1448 (2015)

  39. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 (2016)

  40. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778 (2016)

  41. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826 (2016)

  42. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1251–1258 (2017)

  43. Chavali, N., Agrawal, H., Mahendru, A., Batra, D.: Object-proposal evaluation protocol is ‘Gameable’. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 835–844 (2016)

  44. Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)

  45. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  46. Dasgupta, S.: Experiments with random projection. In: Conference on uncertainty in artificial intelligence, pp 143–151 (2000)

  47. Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: The Seventh ACM SIGKDD international conference on knowledge discovery and data mining, pp 245–250 (2001)

  48. Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788 (2016)

  49. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E.: SSD: single shot multiBox detector. In: Computer vision - ECCV 2016, pp 21–37 (2016)

  50. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9 (2015)

  51. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Computer vision-ECCV, pp 818–833 (2014)

  52. Liu, Y., Feng, R., Zhang, H.: Keypoint matching by outlier pruning with consensus constraint. In: IEEE international conference on robotics and automation (ICRA), pp 5481–5486 (2015)

  53. Chen, Z., Lam, O., Jacobson, A., Milford, M.: Convolutional neural network-based place recognition. In: Australasian conference on robotics and automation (2014)

  54. Mapillary. https://www.mapillary.com. Accessed 15 May 2016

  55. Su, W., Yuan, Y., Zhu, M.: A relationship between the average precision and the area under the ROC curve. In: International conference on the theory of information retrieval, pp 349–352 (2015)

  56. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: 22nd ACM international conference on multimedia, pp 675–678 (2014)

  57. Chollet, F., et al.: Keras. https://github.com/fchollet/keras (2015)

Acknowledgements

We thank Yubin Kuang from Mapillary [54] for providing the Halenseestraße and Kurfürstendamm datasets. We appreciate the helpful comments from the reviewers. We also gratefully acknowledge support from the Hunan Provincial Innovation Foundation for Postgraduate (CX2014B021), the Hunan Provincial Natural Science Foundation of China (2015JJ3018), and the China Scholarship Council. This research is also supported in part by the Program of Foshan Innovation Team (Grant No. 2015IT100072) and by NSFC (Grant No. 61673125).

Author information

Corresponding author

Correspondence to Yi Hou.

Additional information

This work was partially supported by NSERC. It was done while Y. Hou was visiting the University of Alberta.

About this article

Cite this article

Hou, Y., Zhang, H. & Zhou, S. Evaluation of Object Proposals and ConvNet Features for Landmark-based Visual Place Recognition. J Intell Robot Syst 92, 505–520 (2018). https://doi.org/10.1007/s10846-017-0735-y


Keywords

Navigation