Vision-Based Fine-Grained Location Estimation

  • Chapter
Multimodal Location Estimation of Videos and Images

Abstract

In this chapter, we explore a variety of vision-based location estimation techniques, in which the goal is to determine the location of an image at a fine-grained level. First, we introduce the concept of image-based location and landmark recognition (Sect. 4.1), which determines the location of a given image by leveraging collections of geo-located images. Early techniques usually treat this as an image-matching problem and transfer the geo-tags of the matched database images to the query image. Some recent works have examined how to estimate finer-grained and more comprehensive geo-context information, such as the viewing direction of photos (Sect. 4.3). Next, we review techniques for city-scale location recognition, informative codebook generation, and geo-visual clustering (Sect. 4.4). We then introduce the structure-from-motion technique, which is closely related to estimating camera geo-location through the generation of 3D models. With 3D scenes reconstructed from image collections, images are localized by 2D–3D alignment (Sect. 4.5), so that the camera location, viewing direction, and scene location are estimated simultaneously; all three are essential to various applications. Finally, we describe another class of vision-based location estimation techniques that uses satellite-imagery databases (Sect. 4.6).
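
As a rough illustration of the geo-tag transfer idea mentioned above, the following minimal sketch matches a query image against a small geo-located database and returns the geo-tag of the best match. It is not the chapter's method: the descriptor vectors, the cosine-similarity measure, the dictionary layout, and the names `cosine_similarity` and `transfer_geotag` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def transfer_geotag(query_descriptor, database, k=1):
    """Estimate (lat, lon) for a query from its k most similar database images.

    `database` is a hypothetical list of dicts with keys
    "descriptor", "lat", and "lon". With k > 1 the geo-tags are
    averaged, which is only sensible when the matches lie close together.
    """
    ranked = sorted(
        database,
        key=lambda entry: cosine_similarity(query_descriptor, entry["descriptor"]),
        reverse=True,
    )
    top = ranked[:k]
    lat = sum(entry["lat"] for entry in top) / len(top)
    lon = sum(entry["lon"] for entry in top) / len(top)
    return lat, lon


# Toy database: descriptors are made-up 3-D vectors, not real image features.
database = [
    {"descriptor": [1.0, 0.0, 0.0], "lat": 48.8584, "lon": 2.2945},
    {"descriptor": [0.0, 1.0, 0.0], "lat": 40.6892, "lon": -74.0445},
    {"descriptor": [0.0, 0.0, 1.0], "lat": 51.5007, "lon": -0.1246},
]

query = [0.9, 0.1, 0.0]
estimate = transfer_geotag(query, database)
print(estimate)
```

In practice the descriptors would come from a global or local image feature, and the exhaustive sort would be replaced by an approximate nearest-neighbor index once the database grows to city scale.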

Corresponding author

Correspondence to Heng Liu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Liu, H., Mei, T., Li, H., Luo, J. (2015). Vision-Based Fine-Grained Location Estimation. In: Choi, J., Friedland, G. (eds) Multimodal Location Estimation of Videos and Images. Springer, Cham. https://doi.org/10.1007/978-3-319-09861-6_4

  • DOI: https://doi.org/10.1007/978-3-319-09861-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09860-9

  • Online ISBN: 978-3-319-09861-6

  • eBook Packages: Engineering, Engineering (R0)
