Article

Learning the semantics of multimedia queries and concepts from a small number of examples

Authors:
Apostol (Paul) Natsev, IBM Watson Research Center, Hawthorne, NY
Milind R. Naphade, IBM Watson Research Center, Hawthorne, NY
Jelena Tešić, IBM Watson Research Center, Hawthorne, NY

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia, November 2005, Pages 598–607, https://doi.org/10.1145/1101149.1101288

Published: 06 November 2005

ABSTRACT

In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task involves answering queries with a few examples. The other involves learning models for semantic concepts, also with a few examples. In our view these two tasks are identical, the only difference being the number of examples available for training. Having adopted this unified view, we apply identical techniques to both problems and evaluate performance using the NIST TRECVID benchmark evaluation data [15]. We propose a combination hypothesis of two complementary classes of techniques: a nearest neighbor model using only positive examples, and a discriminative support vector machine model using both positive and negative examples. In the case of queries, where negative examples are rarely provided to seed the search, we create pseudo-negative samples. We then combine the ranked lists generated by evaluating the test database with both methods to create a final ranked list of retrieved multimedia items. We evaluate this approach for rare concept and query topic modeling using the NIST TRECVID video corpus. In both tasks we find that applying the combination hypothesis across both modeling techniques and a variety of features results in enhanced performance over any of the baseline models, as well as in improved robustness with respect to training examples and visual features. In particular, we observe an improvement of 6% for rare concept detection and 17% for the search task.
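The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: the pseudo-negatives are drawn uniformly at random from the database, the support vector machine is a plain linear model trained by hinge-loss subgradient descent, and the two ranked lists are fused by averaging rank-normalized scores. All function names are hypothetical.

```python
import random


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nn_scores(positives, items):
    """Nearest-neighbor model: an item scores higher the closer it lies
    to its nearest positive example (positives only, no negatives)."""
    return [-min(euclidean(x, p) for p in positives) for x in items]


def sample_pseudo_negatives(database, n, seed=0):
    """Assumption: relevant items are rare, so random picks from the
    database are mostly negative and can stand in for true negatives."""
    return random.Random(seed).sample(database, n)


def train_linear_svm(pos, neg, epochs=100, lr=0.1, lam=0.01):
    """Linear SVM via hinge-loss subgradient descent (a simplification
    chosen for this sketch; the abstract does not specify the kernel)."""
    data = [(x, 1.0) for x in pos] + [(x, -1.0) for x in neg]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 shrinkage
            if margin < 1.0:  # example violates the margin: step toward it
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def svm_scores(model, items):
    w, b = model
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in items]


def rank_normalize(scores):
    """Map scores to [0, 1] by rank: the best item gets 1.0, the worst 0.0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(scores)
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = rank / (n - 1) if n > 1 else 1.0
    return out


def fuse(score_lists):
    """Late fusion: average the rank-normalized scores of each model."""
    normed = [rank_normalize(s) for s in score_lists]
    return [sum(col) / len(col) for col in zip(*normed)]


def retrieve(positives, database, n_neg, seed=0):
    """Return database indices ordered from most to least relevant."""
    negatives = sample_pseudo_negatives(database, n_neg, seed)
    model = train_linear_svm(positives, negatives)
    fused = fuse([nn_scores(positives, database), svm_scores(model, database)])
    return sorted(range(len(database)), key=lambda i: -fused[i])
```

Rank normalization is used here because nearest-neighbor distances and SVM margins live on different scales; averaging ranks sidesteps calibrating one against the other. Random pseudo-negatives can occasionally include a relevant item, which is one reason to pair the discriminative model with the positive-only nearest-neighbor model.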

References

[1] TREC Video Retrieval. National Institute of Standards and Technology, http://www-nlpir.nist.gov/projects/trecvid/.
[2] K. Chakrabarti, K. Porkaew, and S. Mehrotra. Efficient query refinement in multimedia databases. In Proc. 16th Intl. Conf. on Data Engineering (ICDE'00), San Diego, CA, Feb. 28–Mar. 3, 2000.
[3] T. S. Chua, S.-Y. Neo, K.-Y. Li, G. Wang, R. Shi, M. Zhao, and H. Xu. TRECVID 2004 search and feature extraction task by NUSPRIS. In TRECVID 2004 Workshop, Gaithersburg, MD, Nov. 2004.
[4] A. Gupta, T. E. Weymouth, and R. Jain. Semantic queries with pictures: the VIMSYS model. In Intl. Conf. on Very Large Databases (VLDB), pages 69–70, Sep. 1991.
[5] A. Hauptmann and M. Christel. Successful approaches in the TREC video retrieval evaluations. In ACM Multimedia, New York, NY, Nov. 2004.
[6] Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: Querying databases through multiple examples. In Proc. of the 24th Intl. Conference on Very Large Databases (VLDB'98), pages 218–227, 1998.
[7] L. Kennedy, A. Natsev, and S.-F. Chang. Automatic discovery of query-class-dependent models for multimodal search. In ACM Multimedia 2005, Singapore, Nov. 2005.
[8] C. Lin, B. Tseng, and J. Smith. Video collaborative annotation forum: Establishing ground-truth labels on large multimedia datasets. In Proc. Text Retrieval Conference (TREC), Gaithersburg, MD, Nov. 2003.
[9] M. Naphade, J. Smith, and F. Souvannavong. On the detection of semantic concepts at TRECVID. In ACM Multimedia, New York, NY, Nov. 2004.
[10] M. R. Naphade, S. Basu, J. Smith, C. Y. Lin, and B. Tseng. Modeling semantic concepts to support query by keywords in video. In Proc. IEEE Intl. Conference on Image Processing (ICIP'02), Rochester, NY, Sep. 2002.
[11] S. Nepal and M. V. Ramakrishna. Single feature query by multi examples in image databases. In Proc. SPIE Photonics East Intl. Symposium on Voice, Data and Communications, volume 4210, pages 424–435, 2000.
[12] K. Porkaew, S. Mehrotra, M. Ortega, and K. Chakrabarti. Similarity search using multiple examples in MARS. In Intl. Conf. on Visual Information Systems (VISUAL'99), pages 68–75, 1999.
[13] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra. Relevance feedback: A power tool for interactive content-based image retrieval. IEEE Trans. on Circuits and Systems for Video Technology, 8:644–656, Sep. 1998.
[14] R. Singh and R. Kothari. Relevance feedback algorithm based on learning from labeled and unlabeled data. In IEEE ICME 2003, Baltimore, MD, July 2003.
[15] A. Smeaton, P. Over, and W. Kraaij. TRECVID: evaluating the effectiveness of information retrieval tasks on digital video. In ACM Multimedia, New York, NY, Nov. 2004.
[16] D. M. J. Tax. One-Class Classification: Concept-Learning in the Absence of Counter-Examples. PhD thesis, Delft University of Technology, June 2001.
[17] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proc. ACM Intl. Conf. on Multimedia, Oct. 2001.
[18] V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.
[19] J. Wang and J. Li. Learning-based linguistic indexing of pictures with 2-D MHMMs. In ACM Intl. Conf. Multimedia (ACMMM), Juan-les-Pins, France, Dec. 2002.
[20] T. Westerveld and A. P. de Vries. Multimedia retrieval using multiple examples. In CIVR, pages 344–352, 2004.
[21] R. Yan and A. Hauptmann. Negative pseudo-relevance feedback in content based video retrieval. In ACM Multimedia, Berkeley, CA, Nov. 2003.
[22] R. Yan, J. Yang, and A. Hauptmann. Learning query class-dependent weights in automatic video retrieval. In ACM Multimedia 2004, New York, NY, Oct. 2004.

Index Terms

1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published in
MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
November 2005
1110 pages
ISBN:1595930442
DOI:10.1145/1101149
General Chairs:
Hongjiang Zhang, Microsoft Research Asia, China
Tat-Seng Chua, National University of Singapore, Singapore
Program Chairs:
Ralf Steinmetz, Technische Universität Darmstadt, Germany
Mohan Kankanhalli, National University of Singapore, Singapore
Lynn Wilcox, FXPAL
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
MECBR
TRECVID
semantics
support vector machines
Qualifiers
- Article
Conference

Acceptance Rates
MULTIMEDIA '05 Paper Acceptance Rate: 49 of 312 submissions, 16%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
