Abstract
In this paper, we consider visual localization from two complementary viewpoints, active and passive, with a simple distinction: active localization helps a device estimate the location of an object of interest, whereas passive localization helps a device estimate its own location in the environment. Aiming to offer insights into visual localization, we carried out two explorations on active localization and, more importantly, upgraded each of them to passive localization by exploiting additional geometric information. To produce unconstrained and accurate 2D location estimates of an object of interest, we built an active localization system that fuses detection, tracking, and recognition; building on recognition, we proposed a collaborative strategy that lets detection and tracking enhance each other and thereby improves 2D location estimation. To actively estimate the semantic location of a visual region of interest, we adopted recent lightweight CNN models designed for efficiency and trained two of them on a large place dataset for scene recognition. Furthermore, using the depth information available from an RGB-D camera, we upgraded the active system for 2D object location into a passive system that estimates the device's 3D location relative to the object of interest: the 3D location of the object is first estimated in the device's coordinate system, and the relative location of the device in the world coordinate system is then deduced under appropriate assumptions. Evaluations, both qualitative on an RGB-D sequence recorded in a lab environment and practical on a robotic platform in an office environment, indicate that the upgraded system is suitable for an autonomous following robot. Likewise, the active system for coarse semantic location estimation was promoted to a passive system for fine localization of the device itself, given a 3D map of the previously visited environment: from the perspective of place recognition, one of the efficient CNN models trained earlier for semantic location estimation serves as a base to generate CNN features, which are used both to retrieve candidate loops in the map and to check their geometric consistency, and the verified loops are then used to deduce the fine location of the device in the environment. Comparison with state-of-the-art results shows that the promoted system is adequate for long-term robotic autonomy. The favorable performance of these four explorations supports the insights into visual localization presented in this paper.
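As a concrete illustration of the upgrade from 2D to 3D object location described above, the following minimal Python sketch back-projects a tracked object's 2D image position and its RGB-D depth reading into a 3D point in the device's camera frame using the standard pinhole model; the intrinsics and pixel values are illustrative assumptions and are not taken from the paper.

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth (metres) to a 3D point in the camera frame."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z

# Example: centre of a detected/tracked bounding box with a 2.4 m depth reading;
# intrinsics below are typical for a VGA RGB-D sensor (assumed, not from the paper).
if __name__ == "__main__":
    X, Y, Z = backproject(u=412.0, v=238.0, depth=2.4,
                          fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    # The device's location relative to the object is the negated vector (-X, -Y, -Z),
    # optionally rotated into a world frame under the paper's stated assumptions.
    print(X, Y, Z)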
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number 15K16024. We gratefully acknowledge Intel China Lab and Beijing Qfeel Technology Co., Ltd., China for equipment support.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yang, Y., Wu, Y. & Chen, N. Explorations on visual localization from active to passive. Multimed Tools Appl 78, 2269–2309 (2019). https://doi.org/10.1007/s11042-018-6347-0