DOI: 10.1145/1743384.1743397

Relevance filtering meets active learning: improving web-based concept detectors

Published: 29 March 2010

Abstract

We address the challenge of training visual concept detectors on web video as available from portals such as YouTube. In contrast to high-quality but small manually acquired training sets, this setup permits us to scale up concept detection to very large training sets and concept vocabularies. On the downside, web tags are only weak indicators of concept presence, and web video training data contains a large amount of non-relevant content.
So far, there are two general strategies for overcoming this label noise problem, both targeted at discarding non-relevant training content: (1) manual refinement supported by active learning sample selection, and (2) automatic refinement using relevance filtering. In this paper, we present a highly efficient approach combining these two strategies in an interleaved setup: manually refined samples are directly used to improve relevance filtering, which in turn provides a better basis for the next active learning sample selection.
Our results demonstrate that the proposed combination -- called active relevance filtering -- outperforms both purely automatic filtering and manual refinement based on active learning alone. For example, using 50 manual labels per concept yields an improvement of 5% over automatic filtering and 6% over active learning. By annotating only 25% of the weak positive samples in the training set, performance comparable to training on ground truth labels is reached.
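The interleaved loop the abstract describes can be sketched as follows. This is a minimal illustrative sketch only: the 1-D relevance model, all function names, and the toy oracle are assumptions for exposition, not the authors' actual implementation.

```python
# Sketch of "active relevance filtering": manual labels sharpen the
# relevance filter, whose scores in turn drive the next active-learning
# query. All modeling choices here are illustrative assumptions.

def relevance_scores(features, labels):
    """Score weakly labeled samples by closeness to the mean of manually
    confirmed positives vs. negatives (a stand-in relevance filter).
    Assumes at least one confirmed positive and one confirmed negative."""
    pos = [features[i] for i, y in labels.items() if y]
    neg = [features[i] for i, y in labels.items() if not y]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    scores = []
    for i, x in enumerate(features):
        if i in labels:                       # manual labels are trusted as-is
            scores.append(1.0 if labels[i] else 0.0)
        else:                                 # closer to positive mean -> higher
            scores.append(abs(x - mn) / (abs(x - mp) + abs(x - mn) + 1e-9))
    return scores

def query_next(scores, labels, k=1):
    """Active learning step: pick the unlabeled samples whose relevance
    score is closest to the 0.5 decision boundary."""
    unlabeled = [i for i in range(len(scores)) if i not in labels]
    return sorted(unlabeled, key=lambda i: abs(scores[i] - 0.5))[:k]

def active_relevance_filtering(features, oracle, seed_labels, budget):
    """Interleave manual annotation and relevance filtering for a fixed
    labeling budget; returns final scores and the collected labels."""
    labels = dict(seed_labels)
    for _ in range(budget):
        scores = relevance_scores(features, labels)
        for i in query_next(scores, labels):
            labels[i] = oracle(i)             # manual annotation of one sample
    return relevance_scores(features, labels), labels
```

On a toy set such as `features = [0.1, 0.2, 0.15, 0.9, 0.85, 0.5]` with seed labels `{0: False, 3: True}`, each round queries the sample the current filter is least sure about, so the two stages reinforce each other exactly as the interleaved setup intends.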



      Published In

      MIR '10: Proceedings of the international conference on Multimedia information retrieval
      March 2010
      600 pages
ISBN: 9781605588155
DOI: 10.1145/1743384

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. concept detection
      2. content-based video retrieval

      Qualifiers

      • Research-article

      Conference

      MIR '10
MIR '10: International Conference on Multimedia Information Retrieval
March 29 - 31, 2010
Philadelphia, Pennsylvania, USA


      Cited By

• (2017) Image Captioning in the Wild. Proceedings of the Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes, pages 21-29. DOI: 10.1145/3132515.3132522
• (2014) Localizing relevant frames in web videos using topic model and relevance filtering. Machine Vision and Applications, 25(7):1661-1670. DOI: 10.1007/s00138-013-0537-6
• (2013) Combining Topic Model and Relevance Filtering to Localize Relevant Frames in Web Videos. Advances in Multimedia Modeling, pages 206-216. DOI: 10.1007/978-3-642-35728-2_20
• (2012) Fusing concept detection and geo context for visual search. Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, pages 1-8. DOI: 10.1145/2324796.2324801
• (2012) Sampling and Ontologically Pooling Web Images for Visual Concept Learning. IEEE Transactions on Multimedia, 14(4):1068-1078. DOI: 10.1109/TMM.2012.2190387
• (2011) Automatic concept-to-query mapping for web-based concept detector training. Proceedings of the 19th ACM International Conference on Multimedia, pages 1453-1456. DOI: 10.1145/2072298.2072038
• (2011) Discriminative tag learning on YouTube videos with latent sub-tags. Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, pages 3217-3224. DOI: 10.1109/CVPR.2011.5995402
