DOI: 10.1145/1631272.1631283
research-article

Using large-scale web data to facilitate textual query based retrieval of consumer photos

Published: 19 October 2009

Abstract

The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of consumer photo collections. In this paper, we present a (quasi) real-time textual query based personal photo retrieval system that leverages millions of web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., "pool"), our system uses the inverted file method to automatically find positive web images that are related to the query as well as negative web images that are irrelevant to it. Based on these automatically retrieved relevant and irrelevant web images, we employ two simple but effective classification methods, k Nearest Neighbor (kNN) and decision stumps, to rank personal consumer photos. To further improve retrieval performance, we propose three new relevance feedback methods via cross-domain learning. These methods effectively utilize both the web images and the consumer images. In particular, by leveraging the pre-learned decision stumps, our proposed cross-domain learning methods can learn robust classifiers at interactive response time from only a very limited number of consumer photos labeled by the user. Extensive experiments on both consumer and professional stock photo datasets demonstrate the effectiveness and efficiency of our system, which is also inherently not limited by any predefined lexicon.
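
The first two stages described in the abstract (inverted-file lookup of positive and negative web images, then kNN-based ranking of consumer photos) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy two-dimensional features, and the scoring rule (the fraction of a photo's k nearest web images that are positive) are all assumptions made for illustration; the paper's actual features and kNN scoring may differ.

```python
from collections import defaultdict
import math


def build_inverted_index(web_images):
    """Map each word to the set of web image ids whose text contains it."""
    index = defaultdict(set)
    for image_id, (words, _features) in web_images.items():
        for word in words:
            index[word].add(image_id)
    return index


def split_by_query(index, web_images, query):
    """Positives mention the query word; every other web image is a negative."""
    positives = set(index.get(query, set()))
    negatives = set(web_images) - positives
    return positives, negatives


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(photo_features, web_images, positives, k):
    """Assumed scoring rule: share of the k nearest web images that are positive."""
    by_distance = sorted(
        web_images,
        key=lambda image_id: euclidean(photo_features, web_images[image_id][1]),
    )
    nearest = by_distance[:k]
    return sum(1 for image_id in nearest if image_id in positives) / len(nearest)


def rank_photos(photos, web_images, query, k=3):
    """Return consumer photo ids ordered from most to least relevant."""
    index = build_inverted_index(web_images)
    positives, _negatives = split_by_query(index, web_images, query)
    scores = {
        photo_id: knn_score(features, web_images, positives, k)
        for photo_id, features in photos.items()
    }
    return sorted(scores, key=lambda photo_id: scores[photo_id], reverse=True)


# Hypothetical toy data: id -> (words from the caption, 2-D visual feature).
web_images = {
    "w1": ({"pool", "water"}, (0.1, 0.9)),
    "w2": ({"pool", "swim"}, (0.2, 0.8)),
    "w3": ({"forest"}, (0.9, 0.1)),
    "w4": ({"street", "car"}, (0.8, 0.2)),
}
# Unlabeled consumer photos: id -> visual feature only (no text).
photos = {"p_pool": (0.15, 0.85), "p_city": (0.85, 0.15)}

ranking = rank_photos(photos, web_images, "pool", k=2)
print(ranking)
```

Because the consumer photos carry no text, the query reaches them only through the visual similarity between each photo and the web images selected by the inverted index, which is why such a scheme is not tied to a predefined lexicon.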

    Published In

    MM '09: Proceedings of the 17th ACM international conference on Multimedia
    October 2009
    1202 pages
    ISBN:9781605586083
    DOI:10.1145/1631272
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross domain learning
    2. large-scale web data
    3. textual query based consumer photo retrieval

    Qualifiers

    • Research-article

    Conference

    MM09: ACM Multimedia Conference
    October 19 - 24, 2009
    Beijing, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2018) Visual understanding by mining social media. Frontiers of Computer Science: Selected Publications from Chinese Universities 12:3 (406-422). DOI: 10.1007/s11704-017-6377-1. Online publication date: 1-Jun-2018
    • (2018) Training Visual-Semantic Embedding Network for Boosting Automatic Image Annotation. Neural Processing Letters 48:3 (1503-1519). DOI: 10.1007/s11063-017-9753-9. Online publication date: 1-Dec-2018
    • (2016) Spectral Multimodal Hashing and Its Application to Multimedia Retrieval. IEEE Transactions on Cybernetics 46:1 (27-38). DOI: 10.1109/TCYB.2015.2392052. Online publication date: Jan-2016
    • (2015) Query-oriented unsupervised multi-document summarization via deep learning model. Expert Systems with Applications: An International Journal 42:21 (8146-8155). DOI: 10.1016/j.eswa.2015.05.034. Online publication date: 30-Nov-2015
    • (2015) On-the-fly learning for visual search of large-scale image and video datasets. International Journal of Multimedia Information Retrieval 4:2 (75-93). DOI: 10.1007/s13735-015-0077-0. Online publication date: 22-Mar-2015
    • (2015) Efficient On-the-fly Category Retrieval Using ConvNets and GPUs. Computer Vision -- ACCV 2014 (129-145). DOI: 10.1007/978-3-319-16865-4_9. Online publication date: 16-Apr-2015
    • (2014) Image auto-annotation by exploiting web information. 2014 IEEE International Conference on Image Processing (ICIP) (3052-3056). DOI: 10.1109/ICIP.2014.7025617. Online publication date: Oct-2014
    • (2014) A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics. International Journal of Computer Vision 106:2 (210-233). DOI: 10.1007/s11263-013-0658-4. Online publication date: 1-Jan-2014
    • (2012) Front Matter. Multimedia Image and Video Processing, Second Edition (i-lvii). DOI: 10.1201/b11716-1. Online publication date: 5-Mar-2012
    • (2012) Long-Term Incremental Web-Supervised Learning of Visual Concepts via Random Savannas. IEEE Transactions on Multimedia 14:4 (1008-1020). DOI: 10.1109/TMM.2012.2186956. Online publication date: 1-Aug-2012
