DOI: 10.1145/1743384.1743397

Relevance filtering meets active learning: improving web-based concept detectors

Published: 29 March 2010

Abstract

We address the challenge of training visual concept detectors on web video as available from portals such as YouTube. In contrast to high-quality but small manually acquired training sets, this setup permits us to scale up concept detection to very large training sets and concept vocabularies. On the downside, web tags are only weak indicators of concept presence, and web video training data contains a large amount of non-relevant content.
So far, there are two general strategies for overcoming this label noise problem, both targeted at discarding non-relevant training content: (1) manual refinement supported by active learning sample selection, and (2) automatic refinement using relevance filtering. In this paper, we present a highly efficient approach combining these two strategies in an interleaved setup: manually refined samples are directly used to improve relevance filtering, which in turn provides a better basis for the next active learning sample selection.
Our results demonstrate that the proposed combination -- called active relevance filtering -- outperforms both purely automatic filtering and manual refinement based on active learning alone. For example, using 50 manual labels per concept yields an improvement of 5% over automatic filtering and 6% over active learning. By annotating only 25% of the weak positive samples in the training set, performance comparable to training on ground truth labels is reached.
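The interleaved loop the abstract describes can be sketched as follows. This is a minimal illustrative sketch only: the 1-D relevance model, all function names, and the toy oracle are assumptions for exposition, not the authors' actual implementation.

```python
# Sketch of "active relevance filtering": manual labels sharpen the
# relevance filter, whose scores in turn drive the next active-learning
# query. All modeling choices here are illustrative assumptions.

def relevance_scores(features, labels):
    """Score weakly labeled samples by closeness to the mean of manually
    confirmed positives vs. negatives (a stand-in relevance filter).
    Assumes at least one confirmed positive and one confirmed negative."""
    pos = [features[i] for i, y in labels.items() if y]
    neg = [features[i] for i, y in labels.items() if not y]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    scores = []
    for i, x in enumerate(features):
        if i in labels:                       # manual labels are trusted as-is
            scores.append(1.0 if labels[i] else 0.0)
        else:                                 # closer to positive mean -> higher
            scores.append(abs(x - mn) / (abs(x - mp) + abs(x - mn) + 1e-9))
    return scores

def query_next(scores, labels, k=1):
    """Active learning step: pick the unlabeled samples whose relevance
    score is closest to the 0.5 decision boundary."""
    unlabeled = [i for i in range(len(scores)) if i not in labels]
    return sorted(unlabeled, key=lambda i: abs(scores[i] - 0.5))[:k]

def active_relevance_filtering(features, oracle, seed_labels, budget):
    """Interleave manual annotation and relevance filtering for a fixed
    labeling budget; returns final scores and the collected labels."""
    labels = dict(seed_labels)
    for _ in range(budget):
        scores = relevance_scores(features, labels)
        for i in query_next(scores, labels):
            labels[i] = oracle(i)             # manual annotation of one sample
    return relevance_scores(features, labels), labels
```

On a toy set such as `features = [0.1, 0.2, 0.15, 0.9, 0.85, 0.5]` with seed labels `{0: False, 3: True}`, each round queries the sample the current filter is least sure about, so the two stages reinforce each other exactly as the interleaved setup intends.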



      Published In

      MIR '10: Proceedings of the international conference on Multimedia information retrieval
      March 2010
      600 pages
ISBN: 9781605588155
DOI: 10.1145/1743384

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. concept detection
      2. content-based video retrieval

      Qualifiers

      • Research-article

      Conference

      MIR '10
MIR '10: International Conference on Multimedia Information Retrieval
March 29 - 31, 2010
Philadelphia, Pennsylvania, USA


      Cited By

• (2017) Image Captioning in the Wild. Proceedings of the Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes, pages 21-29. DOI: 10.1145/3132515.3132522
• (2014) Localizing relevant frames in web videos using topic model and relevance filtering. Machine Vision and Applications, 25(7):1661-1670. DOI: 10.1007/s00138-013-0537-6
• (2013) Combining Topic Model and Relevance Filtering to Localize Relevant Frames in Web Videos. Advances in Multimedia Modeling, pages 206-216. DOI: 10.1007/978-3-642-35728-2_20
• (2012) Fusing concept detection and geo context for visual search. Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, pages 1-8. DOI: 10.1145/2324796.2324801
• (2012) Sampling and Ontologically Pooling Web Images for Visual Concept Learning. IEEE Transactions on Multimedia, 14(4):1068-1078. DOI: 10.1109/TMM.2012.2190387
• (2011) Automatic concept-to-query mapping for web-based concept detector training. Proceedings of the 19th ACM International Conference on Multimedia, pages 1453-1456. DOI: 10.1145/2072298.2072038
• (2011) Discriminative tag learning on YouTube videos with latent sub-tags. Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, pages 3217-3224. DOI: 10.1109/CVPR.2011.5995402
