Memory efficient large-scale image-based localization

Multimedia Tools and Applications

Abstract

Local features are widely used in image-based localization. However, large-scale 2D-to-3D matching still consumes a large amount of memory, mainly because of the high dimensionality of the features (e.g., the 128 dimensions of a SIFT descriptor). This paper introduces a method that reduces the dimensionality of local features in order to lower memory consumption and accelerate descriptor matching. All descriptors are projected into a lower-dimensional space through learned projection matrices, which mitigates the curse of dimensionality in large-scale image-based localization. The low-dimensional descriptors are then mapped into a Hamming space to reduce the memory requirement further. The paper also proposes an image-based localization pipeline built on the learned Hamming descriptors. The learned descriptor and the localization pipeline are evaluated on two challenging datasets. The experimental results show that the proposed method compares favorably, in terms of image registration performance, with the published results of state-of-the-art methods.
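The two-stage idea described in the abstract (linear projection to a lower dimension, then binarization into a Hamming space) can be illustrated with a minimal sketch. This is not the authors' implementation: the projection matrix here is random rather than learned, the per-dimension median thresholds and the 64-bit code length are assumptions chosen for illustration, and the descriptors are synthetic stand-ins for SIFT.

```python
import numpy as np

def project(descriptors, P):
    """Project (n, D) descriptors to (n, d) with a (d, D) matrix P."""
    return descriptors @ P.T

def binarize(projected, thresholds):
    """Threshold each projected coordinate to one bit, then pack 8 bits per byte."""
    bits = (projected > thresholds).astype(np.uint8)
    return np.packbits(bits, axis=1)

def hamming(query_code, codes):
    """Hamming distance from one packed code (1, d/8) to each row of codes (n, d/8)."""
    return np.unpackbits(np.bitwise_xor(query_code, codes), axis=1).sum(axis=1)

rng = np.random.default_rng(0)

# Synthetic 128-dimensional descriptors standing in for SIFT (assumption).
db = rng.integers(0, 256, size=(1000, 128)).astype(np.float32)

# Placeholder projection: random, NOT the learned matrices from the paper.
P = rng.standard_normal((64, 128)).astype(np.float32)

proj = project(db, P)
thresholds = np.median(proj, axis=0)   # assumed binarization thresholds
codes = binarize(proj, thresholds)     # shape (1000, 8), dtype uint8

# A slightly perturbed copy of descriptor 42 serves as the query.
query = db[42] + rng.normal(0.0, 2.0, 128).astype(np.float32)
query_code = binarize(project(query[None, :], P), thresholds)

dists = hamming(query_code, codes)
best = int(np.argmin(dists))

# Memory per descriptor: 128 bytes (one byte per SIFT dimension) vs. 8 bytes.
bytes_sift = 128
bytes_code = codes.shape[1]
```

Under these assumptions each descriptor shrinks from 128 bytes to 8 bytes, and matching reduces to XOR plus bit counting instead of floating-point distance computation. The actual savings and matching accuracy depend on the code length and on how the projection is learned.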

Acknowledgements

This work was financially supported by the European Master in Informatics program, RWTH Aachen University, the University of Trento, and the PhD program of the University of Delaware. The authors are grateful to Torsten Sattler and Leif Kobbelt of RWTH Aachen University for their great help in completing this work.

Author information

Corresponding author

Correspondence to Guoyu Lu.

About this article

Cite this article

Lu, G., Sebe, N., Xu, C. et al. Memory efficient large-scale image-based localization. Multimed Tools Appl 74, 479–503 (2015). https://doi.org/10.1007/s11042-014-1977-3

  • DOI: https://doi.org/10.1007/s11042-014-1977-3
