
Active query sensing: Suggesting the best query view for mobile visual search

Published: 16 October 2012

Abstract

While much exciting progress is being made in mobile visual search, one important question has been left unexplored in all current systems: when searching for objects or scenes in the 3D world, which viewing angle is most likely to succeed? In particular, if the first query fails to find the right target, how should the user move the mobile camera to form the second query? In this article, we propose a novel Active Query Sensing system for mobile location search, which actively suggests the best subsequent query view for recognizing the physical location in the mobile environment. The proposed system includes two unique components: (1) an offline process that analyzes the saliency of the different views associated with each geographical location and predicts the location search precision of individual views by modeling their self-retrieval score distributions; and (2) an online process that estimates the view of an unseen query and suggests the best subsequent view change. Specifically, the optimal viewing angle change for the next query is formulated with an online information-theoretic approach. Using a scalable visual search system implemented over an NYC street view dataset (0.3 million images), we show a performance gain: the failure rate of mobile location search is reduced to only 12% after the second query. We have also implemented an end-to-end functional system, including user interfaces on iPhones, client-server communication, and a remote search server. This work may open up an exciting new direction for developing interactive mobile media applications through the innovative exploitation of active sensing and query formulation.
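The online step described in the abstract can be illustrated with a minimal sketch. It is not the article's formulation: it assumes a hypothetical belief vector over candidate locations (for example, normalized scores from the failed first query), a hypothetical offline table `saliency[k, v]` giving the predicted probability that a query of location `k` from view `v` succeeds, and a fixed angular step between views. Under those assumptions, it picks the next view that minimizes the expected remaining uncertainty (entropy) about the location. The function name `suggest_view_change` and all numbers are illustrative.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def suggest_view_change(belief, saliency, current_view, step_deg):
    """Pick the next view index and the camera turn needed to reach it.

    belief:       (K,) weights over candidate locations (assumed input).
    saliency:     (K, V) assumed success probability per location and view.
    current_view: estimated view index of the failed query.
    step_deg:     assumed angular spacing between adjacent views, in degrees.
    """
    belief = np.asarray(belief, dtype=float)
    belief = belief / belief.sum()
    saliency = np.asarray(saliency, dtype=float)

    best_view, best_cost = None, float("inf")
    for v in range(saliency.shape[1]):
        if v == current_view:
            continue  # the current view has already been tried
        # Unnormalized belief in each location if a query from view v also fails.
        miss = belief * (1.0 - saliency[:, v])
        p_miss = miss.sum()
        # A successful query is assumed to leave no uncertainty, so only the
        # failure branch contributes to the expected remaining entropy.
        cost = p_miss * entropy(miss / p_miss) if p_miss > 0 else 0.0
        if cost < best_cost:
            best_view, best_cost = v, cost

    turn_deg = ((best_view - current_view) * step_deg) % 360
    return best_view, turn_deg, best_cost


# Illustrative data: 3 candidate locations, 4 views spaced 90 degrees apart.
demo_belief = [0.5, 0.3, 0.2]
demo_saliency = [
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.8, 0.4, 0.2],
    [0.3, 0.7, 0.2, 0.6],
]
view, turn, cost = suggest_view_change(demo_belief, demo_saliency, 0, 90)
print(view, turn, round(cost, 3))
```

In this toy setup, the suggested view is the one that is salient for all of the likely locations at once, which is the intuition behind asking the user to turn the camera rather than repeat the same shot.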


Cited By

  • (2017) Mobile multi-view object image search. Multimedia Tools and Applications 76:10 (12433-12456). DOI: 10.1007/s11042-016-3659-9. Online publication date: 1-May-2017
  • (2015) Snap n' shop: Visual search-based mobile shopping made a breeze by machine and crowd intelligence. Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015) (173-180). DOI: 10.1109/ICOSC.2015.7050803. Online publication date: Feb-2015

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 8, Issue 3s
Special section of best papers of ACM multimedia 2011, and special section on 3D mobile multimedia
September 2012
173 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2348816

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2012
Accepted: 01 June 2012
Revised: 01 May 2012
Received: 01 March 2012
Published in TOMM Volume 8, Issue 3s

Author Tags

  1. Mobile visual search
  2. active query sensing
  3. content-based image retrieval
  4. mobile location recognition

Qualifiers

  • Research-article
  • Research
  • Refereed
