research-article

JIGSAW: interactive mobile visual search with multimodal queries

Authors:

Shipeng Li

MM '11: Proceedings of the 19th ACM international conference on Multimedia

Pages 73 - 82

https://doi.org/10.1145/2072298.2072310

Published: 28 November 2011

Abstract

Traditional text-based visual search has not improved enough over the years to meet the emerging demands of mobile users, even as searching on one's phone while on the go becomes pervasive. This paper presents an application that helps mobile phone users carry out visual search. By taking advantage of smartphone capabilities such as multimodal and multi-touch interaction, users can formulate their search intent more conveniently, which in turn can significantly improve search performance. The system, called JIGSAW (Joint search with ImaGe, Speech, And Words), is one of the first attempts to create an interactive, multimodal mobile visual search application. The key to JIGSAW is twofold: composing an exemplary image query, generated from raw speech through multi-touch user interaction, and performing visual search based on that exemplary image. With JIGSAW, users can formulate their search intent naturally, much like playing a jigsaw puzzle on the phone screen: 1) the user speaks a natural sentence as the query; 2) the speech is recognized and transcribed to text, which is then decomposed into keywords through entity extraction; 3) the user selects preferred exemplary images that visually represent his or her intent and composes a query image via multi-touch; and 4) the composite image is used as a visual query to search for similar images. We have deployed JIGSAW on a real-world phone system, evaluated its performance on one million images, and demonstrated that it is an effective complement to existing mobile visual search applications.
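The four steps in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: it assumes the speech has already been transcribed, replaces entity extraction with a naive stopword filter, stands in for multi-touch composition with a dictionary of canvas positions, and uses a toy position-based similarity in place of real visual features. All names (`extract_keywords`, `Patch`, `compose_query`, `score`, `search`) and the sample index are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stopword list; the paper uses entity extraction, not this filter.
STOPWORDS = {"a", "an", "the", "with", "and", "in", "on", "of",
             "find", "me", "front", "behind", "to", "next"}


def extract_keywords(transcript):
    """Naive stand-in for step 2: keep unique non-stopword tokens in order."""
    seen, keywords = set(), []
    for token in transcript.split():
        token = token.strip(".,!?").lower()
        if token and token not in STOPWORDS and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords


@dataclass(frozen=True)
class Patch:
    """One exemplar on the canvas; x and y are fractions of the canvas size."""
    keyword: str
    x: float
    y: float


def compose_query(keywords, placements):
    """Stand-in for step 3: place each keyword's exemplar where the user put it."""
    return [Patch(k, *placements[k]) for k in keywords if k in placements]


def score(query, candidate):
    """Toy similarity: shared keywords count more when their positions agree."""
    by_keyword = {p.keyword: p for p in candidate}
    total = 0.0
    for q in query:
        c = by_keyword.get(q.keyword)
        if c is not None:
            dist = ((q.x - c.x) ** 2 + (q.y - c.y) ** 2) ** 0.5
            total += max(0.0, 1.0 - dist)
    return total


def search(query, index, top_k=3):
    """Stand-in for step 4: rank indexed images by similarity to the composite."""
    ranked = sorted(index.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    keywords = extract_keywords("Find me a tree in front of a house")
    query = compose_query(keywords, {"tree": (0.3, 0.6), "house": (0.7, 0.5)})
    index = {
        "img_a": [Patch("tree", 0.3, 0.6), Patch("house", 0.7, 0.5)],
        "img_b": [Patch("tree", 0.8, 0.2)],
        "img_c": [Patch("car", 0.5, 0.5)],
    }
    print(keywords)
    print(search(query, index, top_k=2))
```

The sketch keeps the spatial layout in the query because the abstract describes the composite image, not just the keyword set, as what drives the search. A real system would compare visual descriptors rather than keyword labels.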

Index Terms

JIGSAW: interactive mobile visual search with multimodal queries
1. Human-centered computing
  1. Interaction design
    1. Interaction design process and methods
      1. User centered design

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia

November 2011

944 pages

ISBN:9781450306164

DOI:10.1145/2072298

General Chairs:
K. Selçuk Candan (Arizona State University, USA)
Sethuraman Panchanathan (Arizona State University, USA)
Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Program Chairs:
Hari Sundaram (Arizona State University, USA)
Wu-Chi Feng (Portland State University, USA)
Nicu Sebe (University of Trento, Italy)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '11

Sponsor:

SIGMM

MM '11: ACM Multimedia Conference

November 28 - December 1, 2011

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

Meng M, Liu C, Huang Z, Wang X (2024). Consumer Usage of Mobile Visual Search in China. Journal of Global Information Management, 32:1 (1-29). Online publication date: 23-Jul-2024.
https://doi.org/10.4018/JGIM.349731
Zhao X, Pei L, Li T, Zhang Z (2020). Online social image ranking in diversified preferences. EURASIP Journal on Image and Video Processing, 2020:1. Online publication date: 23-Nov-2020.
https://doi.org/10.1186/s13640-020-00540-4
Jaber R, McMillan D, Torres M, Schlögl S, Clark L, Porcheron M (2020). Conversational User Interfaces on Mobile Devices. Proceedings of the 2nd Conference on Conversational User Interfaces (1-11). Online publication date: 22-Jul-2020.
https://dl.acm.org/doi/10.1145/3405755.3406130
(2019). Visual Arts Search on Mobile Devices. ACM Transactions on Multimedia Computing, Communications, and Applications, 15:2s (1-23). Online publication date: 3-Jul-2019.
https://dl.acm.org/doi/10.1145/3326336
Boin J, Araujo A, Ballan L, Girod B (2017). Effective Fisher vector aggregation for 3D object retrieval. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1747-1751). Online publication date: Mar-2017.
https://doi.org/10.1109/ICASSP.2017.7952456
Çalışır F, Baştan M, Ulusoy Ö, Güdükbay U (2017). Mobile multi-view object image search. Multimedia Tools and Applications, 76:10 (12433-12456). Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.1007/s11042-016-3659-9
Tan W, Yan B, Li K, Tian Q (2016). Image Retargeting for Preserving Robust Local Feature: Application to Mobile Visual Search. IEEE Transactions on Multimedia, 18:1 (128-137). Online publication date: Jan-2016.
https://doi.org/10.1109/TMM.2015.2500727
Guan T, Wang Y, Duan L, Ji R (2015). On-Device Mobile Landmark Recognition Using Binarized Descriptor with Multifeature Fusion. ACM Transactions on Intelligent Systems and Technology, 7:1 (1-29). Online publication date: 7-Oct-2015.
https://dl.acm.org/doi/10.1145/2795234
Wu Y, Mei T, Xu Y, Yu N, Li S (2015). MoVieUp: Automatic Mobile Video Mashup. IEEE Transactions on Circuits and Systems for Video Technology, 25:12 (1941-1954). Online publication date: Dec-2015.
https://doi.org/10.1109/TCSVT.2015.2416554
Mei T, Rui Y, Li S, Tian Q (2014). Multimedia search reranking. ACM Computing Surveys, 46:3 (1-38). Online publication date: 1-Jan-2014.
https://dl.acm.org/doi/10.1145/2536798
