Using large-scale web data to facilitate textual query based retrieval of consumer photos

Authors:

Jiebo Luo

MM '09: Proceedings of the 17th ACM international conference on Multimedia

Pages 55 - 64

https://doi.org/10.1145/1631272.1631283

Published: 19 October 2009

Abstract

The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of consumer photo collections. In this paper, we present a (quasi) real-time textual query based personal photo retrieval system that leverages millions of web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., "pool"), our system uses the inverted file method to automatically find positive web images that are related to the query, as well as negative web images that are irrelevant to it. Based on these automatically retrieved relevant and irrelevant web images, we employ two simple but effective classification methods, k Nearest Neighbor (kNN) and decision stumps, to rank personal consumer photos. To further improve retrieval performance, we propose three new relevance feedback methods via cross-domain learning, which effectively utilize both the web images and the consumer images. In particular, by leveraging the pre-learned decision stumps, our cross-domain learning methods can learn robust classifiers at interactive response time from only a very limited number of consumer photos labeled by the user. Extensive experiments on both consumer and professional stock photo datasets demonstrate the effectiveness and efficiency of our system, which is also inherently not limited by any predefined lexicon.
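As a rough illustration of the pipeline outlined in the abstract, the following Python sketch builds an inverted file over web image captions, splits the web images into positive and negative sets for a query, and ranks consumer photos with a simple kNN vote. It is a toy under stated assumptions, not the system described in the paper: the captions, the two-dimensional feature vectors, the scoring rule, and all function names (build_inverted_file, split_by_query, knn_score, rank_photos) are hypothetical, and the abstract does not specify how features are extracted or how negative images are sampled.

```python
import math
from collections import defaultdict


def build_inverted_file(captions):
    """Map each caption word to the set of web image ids that contain it."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for word in caption.lower().split():
            index[word].add(image_id)
    return index


def split_by_query(index, all_ids, query):
    """Positive = images whose text mentions the query; negative = all others.

    Assumption: every non-matching image is used as a negative example.
    """
    positive = set(index.get(query.lower(), set()))
    negative = set(all_ids) - positive
    return positive, negative


def knn_score(photo, features, positive, negative, k=3):
    """Average label (+1 positive, -1 negative) of the k nearest web images."""
    labelled = [(math.dist(photo, features[i]), 1) for i in positive]
    labelled += [(math.dist(photo, features[i]), -1) for i in negative]
    labelled.sort(key=lambda pair: pair[0])
    nearest = labelled[:k]
    if not nearest:
        return 0.0  # no web images available, so no evidence either way
    return sum(label for _, label in nearest) / len(nearest)


def rank_photos(photos, features, positive, negative, k=3):
    """Return consumer photo names sorted from most to least relevant."""
    scores = {
        name: knn_score(vec, features, positive, negative, k)
        for name, vec in photos.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical web images: captions plus made-up 2-D visual features.
web_captions = {
    "w1": "kids at the pool",
    "w2": "swimming pool party",
    "w3": "mountain hike",
    "w4": "city street at night",
}
web_features = {
    "w1": (0.9, 0.1),
    "w2": (0.8, 0.2),
    "w3": (0.1, 0.9),
    "w4": (0.2, 0.8),
}
# Hypothetical unlabeled consumer photos in the same feature space.
consumer_photos = {
    "IMG_001": (0.85, 0.15),
    "IMG_002": (0.15, 0.85),
}

index = build_inverted_file(web_captions)
positive, negative = split_by_query(index, web_captions, "pool")
ranking = rank_photos(consumer_photos, web_features, positive, negative, k=2)
print(ranking)
```

In this toy setting the consumer photos need no annotation of their own: the text attached to the web images supplies the labels, which is the sense in which the approach is not tied to a predefined lexicon.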

Index Terms

Using large-scale web data to facilitate textual query based retrieval of consumer photos
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

MM '09: Proceedings of the 17th ACM international conference on Multimedia

October 2009

1202 pages

ISBN: 9781605586083

DOI: 10.1145/1631272

General Chairs:
Wen Gao (Peking University, China)
Yong Rui (Microsoft, China)
Alan Hanjalic (Delft University of Technology, The Netherlands)

Program Chairs:
Changsheng Xu (Institute of Automation, Chinese Academy of Sciences, China)
Eckehard Steinbach (Technical University of Munich, Germany)
Abdulmotaleb El Saddik (University of Ottawa, Canada)
Michelle Zhou (IBM T. J. Watson Research Center, USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM09

Sponsor:

SIGMM

MM09: ACM Multimedia Conference

October 19 - 24, 2009

Beijing, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
