Visual Concept Learning from Weakly Labeled Web Videos

  • Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 287)

Abstract

Concept detection is a core component of video database search, concerned with the automatic recognition of visually diverse categories of objects (“airplane”), locations (“desert”), or activities (“interview”). The task is challenging because the amount of accurately labeled data available for supervised training is limited and coverage of concept classes is poor. To overcome these problems, we describe the use of videos found on the web as training data for concept detectors, using tagging and folksonomies as annotation sources. This permits us to scale up training to very large data sets and concept vocabularies.

To take advantage of user-supplied tags on the web, we need to overcome problems of label weakness: web tags are context-dependent, unreliable, and coarse. Our approach to addressing this problem is to automatically identify and filter non-relevant material. We demonstrate on a large database of videos retrieved from the web that this approach, called relevance filtering, leads to significant improvements over supervised learning techniques for categorization. In addition, we show how the approach can be combined with active learning to achieve additional performance improvements at moderate annotation cost.
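
The abstract does not spell out how relevance filtering is computed, so the following is only an illustrative sketch of the general idea, not the chapter's method. It assumes two-dimensional toy features, isotropic unit-variance Gaussian class models with equal priors, and an EM-style loop that re-estimates a relevance weight for each weakly labeled sample. The names `relevance_filter` and `most_uncertain` are hypothetical. The second function shows one simple way such weights could feed an active learning step (uncertainty sampling), again as an assumption.

```python
import numpy as np

def relevance_filter(X_weak, X_neg, n_iter=10):
    """Estimate a relevance weight in [0, 1] for each weakly labeled sample.

    Assumption (not from the chapter): both classes are modeled as isotropic
    unit-variance Gaussians with equal priors.
    """
    # Start by trusting every weak label completely.
    w = np.ones(len(X_weak))
    for _ in range(n_iter):
        # Concept model: mean of weak samples, weighted by current relevance.
        mu_pos = np.average(X_weak, axis=0, weights=w)
        # Background model: known negatives plus the non-relevant share
        # (1 - w) of each weak sample.
        X_bg = np.vstack([X_neg, X_weak])
        w_bg = np.concatenate([np.ones(len(X_neg)), 1.0 - w])
        mu_bg = np.average(X_bg, axis=0, weights=w_bg)
        # Posterior probability of relevance under the two Gaussian models.
        d_pos = ((X_weak - mu_pos) ** 2).sum(axis=1)
        d_bg = ((X_weak - mu_bg) ** 2).sum(axis=1)
        z = np.clip(0.5 * (d_pos - d_bg), -50.0, 50.0)  # avoid overflow in exp
        w = np.clip(1.0 / (1.0 + np.exp(z)), 1e-6, 1.0)
    return w

def most_uncertain(w, k):
    """Indices of the k samples whose relevance weight is closest to 0.5."""
    return np.argsort(np.abs(w - 0.5))[:k]

# Toy data: 60 weakly tagged samples that really show the concept,
# 40 weakly tagged samples that do not, and 100 known negatives.
rng = np.random.default_rng(0)
relevant = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(60, 2))
mislabeled = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(40, 2))
X_weak = np.vstack([relevant, mislabeled])
X_neg = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))

weights = relevance_filter(X_weak, X_neg)
queries = most_uncertain(weights, k=5)  # candidates for manual annotation
```

In this toy setting the weights of the mislabeled samples shrink toward zero, so they contribute little to the concept model, while the samples returned by `most_uncertain` are the ones a human annotator would be asked about first.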

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ulges, A., Borth, D., Breuel, T.M. (2010). Visual Concept Learning from Weakly Labeled Web Videos. In: Schonfeld, D., Shan, C., Tao, D., Wang, L. (eds) Video Search and Mining. Studies in Computational Intelligence, vol 287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12900-1_8

  • DOI: https://doi.org/10.1007/978-3-642-12900-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12899-8

  • Online ISBN: 978-3-642-12900-1

  • eBook Packages: Engineering, Engineering (R0)