
Robust and accurate mobile visual localization and its applications

Published: 17 October 2013

Abstract

Mobile applications are becoming increasingly popular. More and more people are using their phones to enjoy ubiquitous location-based services (LBS). The increasing popularity of LBS creates a fundamental problem: mobile localization. Besides traditional localization methods that use GPS or wireless signals, using phone-captured images for localization has drawn significant interest from researchers. Photos contain more scene context information than the embedded sensors, leading to a more precise location description. With the goal of accurately sensing real geographic scene contexts, this article presents a novel approach to mobile visual localization from a given image (typically associated with a rough GPS position). The proposed approach is capable of providing a complete set of more accurate parameters about the scene geo-context, including the real locations of both the mobile user and, perhaps more importantly, the captured scene, as well as the viewing direction. To make image localization quick and accurate, we investigate various techniques for large-scale image retrieval and 2D-to-3D matching. Specifically, we first generate scene clusters using joint geo-visual clustering, with each scene being represented by a 3D model reconstructed from a set of images. The 3D models are then indexed using a visual vocabulary tree structure. Taking the geo-tags of the database images as prior knowledge, a novel location-based codebook weighting scheme is proposed to embed this additional information into the codebook. The discriminative power of the codebook is enhanced, thus leading to better image retrieval performance. The query image is aligned with the models obtained from the image retrieval results, and eventually registered to a real-world map.
We evaluate the effectiveness of our approach using several large-scale datasets and achieve estimation accuracy of a user's location within 13 meters, viewing direction within 12 degrees, and viewing distance within 26 meters. Of particular note is our showcase of three novel applications based on localization results: (1) an on-the-spot tour guide, (2) collaborative routing, and (3) a sight-seeing guide. The evaluations through user studies demonstrate that these applications are effective in facilitating the ideal rendezvous for mobile users.
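
To make the three reported quantities concrete (user location, viewing direction, viewing distance), the following is a minimal Python sketch of how they relate once a user position and a scene position are known as latitude/longitude pairs. It is not the article's method: the function names (`haversine_m`, `bearing_deg`, `angular_error_deg`), the spherical-Earth model, and the sample coordinates are assumptions introduced purely for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing (0 = north, 90 = east) from point 1 towards point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def angular_error_deg(estimated, reference):
    """Smallest absolute difference between two compass angles, in degrees."""
    d = abs(estimated - reference) % 360.0
    return min(d, 360.0 - d)


def geo_context(user, scene):
    """Viewing distance and direction implied by a user and a scene position.

    `user` and `scene` are hypothetical (lat, lon) tuples in degrees.
    """
    return {
        "viewing_distance_m": haversine_m(*user, *scene),
        "viewing_direction_deg": bearing_deg(*user, *scene),
    }


if __name__ == "__main__":
    # Made-up coordinates, not taken from the article's datasets.
    ctx = geo_context(user=(39.9990, 116.3260), scene=(39.9995, 116.3270))
    print(ctx)
    print(angular_error_deg(350.0, 10.0))
```

Under these assumptions, a location error is the `haversine_m` distance between the estimated and reference user positions, and a direction error is the `angular_error_deg` between the estimated and reference bearings, which handles wrap-around at 0/360 degrees.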



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 9, Issue 1s
Special Sections on the 20th Anniversary of ACM International Conference on Multimedia, Best Papers of ACM Multimedia 2012
October 2013
218 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2523001
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2013
Accepted: 01 May 2013
Revised: 01 May 2013
Received: 01 February 2013
Published in TOMM Volume 9, Issue 1s

Author Tags

  1. Mobile visual localization
  2. geo-tagging
  3. location-based services
  4. scene reconstruction

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) Weakly Supervised Hashing with Reconstructive Cross-modal Attention. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-19). DOI: 10.1145/3589185. Online publication date: 12-Jul-2023
  • (2021) Lifelog Image Retrieval Based on Semantic Relevance Mapping. ACM Transactions on Multimedia Computing, Communications, and Applications 17:3 (1-18). DOI: 10.1145/3446209. Online publication date: 22-Jul-2021
  • (2018) Common Crucial Feature for Crowdsourcing Based Mobile Visual Location Recognition. 2018 25th IEEE International Conference on Image Processing (ICIP) (908-912). DOI: 10.1109/ICIP.2018.8451477. Online publication date: Oct-2018
  • (2017) Enhancing Transmission Collision Detection for Distributed TDMA in Vehicular Networks. ACM Transactions on Multimedia Computing, Communications, and Applications 13:3s (1-21). DOI: 10.1145/3092833. Online publication date: 14-Jul-2017
  • (2017) A survey on context-aware mobile visual recognition. Multimedia Systems 23:6 (647-665). DOI: 10.1007/s00530-016-0523-8. Online publication date: 1-Nov-2017
  • (2016) Applying Seamful Design in Location-Based Mobile Museum Applications. ACM Transactions on Multimedia Computing, Communications, and Applications 12:4 (1-23). DOI: 10.1145/2962720. Online publication date: 24-Aug-2016
  • (2015) Monitoring adolescent alcohol use via multimodal analysis in social multimedia. Proceedings of the 2015 IEEE International Conference on Big Data (Big Data) (1509-1518). DOI: 10.1109/BigData.2015.7363914. Online publication date: 29-Oct-2015
  • (2014) Mobile Landmark Search with 3D Models. IEEE Transactions on Multimedia 16:3 (623-636). DOI: 10.1109/TMM.2014.2302744. Online publication date: 1-Apr-2014
  • (2014) Vision-Based Fine-Grained Location Estimation. Multimodal Location Estimation of Videos and Images (63-83). DOI: 10.1007/978-3-319-09861-6_4. Online publication date: 5-Oct-2014
