DOI: 10.1145/2072298.2072310
research-article

JIGSAW: interactive mobile visual search with multimodal queries

Published: 28 November 2011

Abstract

Traditional text-based visual search has not improved sufficiently over the years to meet the emerging demands of mobile users, for whom searching on the phone while on the go is becoming pervasive. This paper presents an innovative application that facilitates visual search for mobile phone users. By taking advantage of smartphone capabilities such as multimodal and multi-touch interaction, users can formulate their search intent more conveniently, and search performance can thus be significantly improved. The system, called JIGSAW (Joint search with ImaGe, Speech, And Words), represents one of the first attempts to create an interactive, multimodal mobile visual search application. The key to JIGSAW is the composition of an exemplary image query, generated from raw speech through multi-touch user interaction, together with visual search based on that exemplary image. With JIGSAW, users formulate their search intent naturally, much like assembling a jigsaw puzzle on the phone screen: 1) the user speaks a natural sentence as the query; 2) the speech is recognized and transcribed into text, which is further decomposed into keywords through entity extraction; 3) the user selects preferred exemplary images that visually represent his or her intent and composes a query image via multi-touch; and 4) the composite image is then used as a visual query to search for similar images. We have deployed JIGSAW on a real-world phone system, evaluated its performance on one million images, and demonstrated that it is an effective complement to existing mobile visual search applications.
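The four-step flow in the abstract can be illustrated with a small Python sketch. It is a hypothetical toy, not JIGSAW's implementation: speech recognition is assumed to have already produced a transcript, entity extraction is replaced by a lookup against a small concept vocabulary, each exemplary image is reduced to a single mean-RGB feature, and the composite query is rasterized to a coarse grid (`GRID`) that is compared cell by cell with database images. All names (`extract_keywords`, `Placement`, `compose_query`, `score`, `search`, `two_band_image`) are assumptions introduced for this illustration.

```python
from dataclasses import dataclass
import math

# Hypothetical canvas resolution: the composite query is rasterized to GRID x GRID cells.
GRID = 4

# Assumed per-cell feature: a mean RGB color with components in [0, 1].
Feature = tuple[float, float, float]


def extract_keywords(transcript: str, vocabulary: set[str]) -> list[str]:
    """Stand-in for entity extraction: keep transcript tokens found in a concept vocabulary."""
    tokens = [t.strip(".,!?").lower() for t in transcript.split()]
    return [t for t in tokens if t in vocabulary]


@dataclass
class Placement:
    """An exemplary image placed on the canvas, with a normalized box [x0, x1) x [y0, y1)."""
    feature: Feature
    x0: float
    y0: float
    x1: float
    y1: float


def compose_query(placements: list[Placement]) -> dict[tuple[int, int], Feature]:
    """Rasterize placements onto the grid; later placements are drawn on top of earlier ones."""
    canvas: dict[tuple[int, int], Feature] = {}
    for p in placements:
        for row in range(GRID):
            for col in range(GRID):
                # A cell is covered if its center falls inside the placement's box.
                cx, cy = (col + 0.5) / GRID, (row + 0.5) / GRID
                if p.x0 <= cx < p.x1 and p.y0 <= cy < p.y1:
                    canvas[(row, col)] = p.feature
    return canvas


def score(query: dict[tuple[int, int], Feature], image: list[list[Feature]]) -> float:
    """Mean Euclidean distance over the cells the user filled; lower means more similar."""
    if not query:
        return math.inf
    total = sum(math.dist(f, image[row][col]) for (row, col), f in query.items())
    return total / len(query)


def search(query: dict[tuple[int, int], Feature],
           database: dict[str, list[list[Feature]]],
           top_k: int = 3) -> list[str]:
    """Rank database images by their grid distance to the composite query."""
    ranked = sorted(database, key=lambda name: score(query, database[name]))
    return ranked[:top_k]


def two_band_image(top: Feature, bottom: Feature) -> list[list[Feature]]:
    """Toy database image: `top` color in the upper half, `bottom` color in the lower half."""
    return [[top if row < GRID // 2 else bottom for _ in range(GRID)] for row in range(GRID)]


if __name__ == "__main__":
    SKY, SAND, DARK = (0.3, 0.5, 0.9), (0.9, 0.8, 0.6), (0.05, 0.05, 0.1)
    print(extract_keywords("Find a beach with a blue sky", {"beach", "sky", "sea"}))
    # ['beach', 'sky']
    # The user drags a sky exemplar to the top half and a sand exemplar to the bottom half.
    query = compose_query([Placement(SKY, 0.0, 0.0, 1.0, 0.5),
                           Placement(SAND, 0.0, 0.5, 1.0, 1.0)])
    db = {"night_city": two_band_image(DARK, DARK),
          "beach_photo": two_band_image(SKY, SAND)}
    print(search(query, db))
    # ['beach_photo', 'night_city']
```

Scoring only the filled cells reflects the puzzle-like interaction: the user says what should appear where and leaves the rest of the canvas unconstrained. A practical system would replace the mean-color feature with richer visual descriptors and an index that scales to millions of images.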

Cited By

  • (2024) Consumer Usage of Mobile Visual Search in China. Journal of Global Information Management 32:1 (1-29). DOI: 10.4018/JGIM.349731. Online publication date: 23-Jul-2024.
  • (2020) Online social image ranking in diversified preferences. EURASIP Journal on Image and Video Processing 2020:1. DOI: 10.1186/s13640-020-00540-4. Online publication date: 23-Nov-2020.
  • (2020) Conversational User Interfaces on Mobile Devices. Proceedings of the 2nd Conference on Conversational User Interfaces (1-11). DOI: 10.1145/3405755.3406130. Online publication date: 22-Jul-2020.

    Information

    Published In

    MM '11: Proceedings of the 19th ACM international conference on Multimedia
    November 2011
    944 pages
    ISBN:9781450306164
    DOI:10.1145/2072298
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. mobile visual search
    2. query formulation
    3. user interface

    Qualifiers

    • Research-article

    Conference

    MM '11: ACM Multimedia Conference
    November 28 - December 1, 2011
    Scottsdale, Arizona, USA

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 28 Feb 2025.

    Citations

    • (2019) Visual Arts Search on Mobile Devices. ACM Transactions on Multimedia Computing, Communications, and Applications 15:2s (1-23). DOI: 10.1145/3326336. Online publication date: 3-Jul-2019.
    • (2017) Effective Fisher vector aggregation for 3D object retrieval. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1747-1751). DOI: 10.1109/ICASSP.2017.7952456. Online publication date: Mar-2017.
    • (2017) Mobile multi-view object image search. Multimedia Tools and Applications 76:10 (12433-12456). DOI: 10.1007/s11042-016-3659-9. Online publication date: 1-May-2017.
    • (2016) Image Retargeting for Preserving Robust Local Feature: Application to Mobile Visual Search. IEEE Transactions on Multimedia 18:1 (128-137). DOI: 10.1109/TMM.2015.2500727. Online publication date: Jan-2016.
    • (2015) On-Device Mobile Landmark Recognition Using Binarized Descriptor with Multifeature Fusion. ACM Transactions on Intelligent Systems and Technology 7:1 (1-29). DOI: 10.1145/2795234. Online publication date: 7-Oct-2015.
    • (2015) MoVieUp: Automatic Mobile Video Mashup. IEEE Transactions on Circuits and Systems for Video Technology 25:12 (1941-1954). DOI: 10.1109/TCSVT.2015.2416554. Online publication date: Dec-2015.
    • (2014) Multimedia search reranking. ACM Computing Surveys 46:3 (1-38). DOI: 10.1145/2536798. Online publication date: 1-Jan-2014.