Abstract
In this chapter, we explore a variety of vision-based location estimation techniques, whose goal is to determine the location of an image at a fine-grained level. First, we introduce the concept of image-based location and landmark recognition (Sect. 4.1), which determines the location of a given image by leveraging collections of geo-located images. Early techniques usually treat this as a similar-image matching problem and transfer the geo-tags of the matched database images to the query image. Some recent works have examined how to estimate more fine-grained and comprehensive geo-context information, such as the viewing direction of photos (Sect. 4.3). Next, we review techniques for city-scale location recognition, informative codebook generation, and geo-visual clustering (Sect. 4.4). We then introduce the structure-from-motion technique, which is closely related to estimating the camera geo-location by generating 3D models. With 3D scenes reconstructed from image collections, images are localized by 2D–3D alignment (Sect. 4.5). The camera location, viewing direction, and scene location are estimated simultaneously, all of which are essential to various applications. Finally, another class of vision-based location estimation techniques, which uses a satellite-imagery database, is described (Sect. 4.6).
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Liu, H., Mei, T., Li, H., Luo, J. (2015). Vision-Based Fine-Grained Location Estimation. In: Choi, J., Friedland, G. (eds) Multimodal Location Estimation of Videos and Images. Springer, Cham. https://doi.org/10.1007/978-3-319-09861-6_4
DOI: https://doi.org/10.1007/978-3-319-09861-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09860-9
Online ISBN: 978-3-319-09861-6
eBook Packages: Engineering, Engineering (R0)