Visual Concept Learning from Weakly Labeled Web Videos

Ulges, Adrian; Borth, Damian; Breuel, Thomas M.

doi:10.1007/978-3-642-12900-1_8

Part of the book series: Studies in Computational Intelligence (SCI, volume 287)

Abstract

Concept detection is a core component of video database search, concerned with the automatic recognition of visually diverse categories of objects (“airplane”), locations (“desert”), or activities (“interview”). The task poses a difficult challenge as the amount of accurately labeled data available for supervised training is limited and coverage of concept classes is poor. In order to overcome these problems, we describe the use of videos found on the web as training data for concept detectors, using tagging and folksonomies as annotation sources. This permits us to scale up training to very large data sets and concept vocabularies.

To take advantage of user-supplied tags on the web, we need to overcome the problem of label weakness: web tags are context-dependent, unreliable, and coarse. Our approach is to automatically identify and filter out non-relevant material. We demonstrate on a large database of videos retrieved from the web that this approach, called relevance filtering, leads to significant improvements over supervised learning techniques for categorization. In addition, we show how the approach can be combined with active learning to achieve additional performance improvements at moderate annotation cost.
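As a rough illustration of the idea of filtering non-relevant material from weakly labeled data, the Python sketch below assigns each frame from a tagged video a soft relevance score and alternates between refitting a concept model and updating those scores. It is not the chapter's implementation: the diagonal Gaussian models, the function name `relevance_filter`, the `prior` parameter, and the synthetic data are all assumptions chosen to keep the example small.

```python
import numpy as np


def relevance_filter(tagged, background, prior=0.5, iters=20):
    """Estimate a relevance score in [0, 1] for each weakly labeled sample.

    tagged:     (n, d) features of frames from videos that carry the concept tag
    background: (m, d) features of frames assumed not to show the concept
    prior:      assumed fraction of tagged frames that are truly relevant
    """
    def log_gauss(x, mu, var):
        # Log-density of a diagonal Gaussian, evaluated row by row.
        return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)

    # Non-relevant content is modeled by one Gaussian fit to the background.
    mu_bg = background.mean(axis=0)
    var_bg = background.var(axis=0) + 1e-6
    log_p_bg = log_gauss(tagged, mu_bg, var_bg)

    weights = np.full(len(tagged), prior)
    for _ in range(iters):
        # Refit the concept model using the current relevance weights.
        w = weights / (weights.sum() + 1e-12)
        mu_c = w @ tagged
        var_c = w @ (tagged - mu_c) ** 2 + 1e-6
        # Update each weight to the posterior probability of being relevant.
        log_rel = np.log(prior) + log_gauss(tagged, mu_c, var_c)
        log_irr = np.log(1.0 - prior) + log_p_bg
        weights = 1.0 / (1.0 + np.exp(np.clip(log_irr - log_rel, -50, 50)))
    return weights


# Synthetic demo: half of the "tagged" frames actually look like background.
rng = np.random.default_rng(0)
background = rng.normal(0.0, 1.0, size=(200, 2))
relevant = rng.normal(6.0, 1.0, size=(50, 2))
off_topic = rng.normal(0.0, 1.0, size=(50, 2))
scores = relevance_filter(np.vstack([relevant, off_topic]), background)
print("mean score, relevant frames: ", scores[:50].mean())
print("mean score, off-topic frames:", scores[50:].mean())
```

Frames with low scores could then be down-weighted or dropped before training a concept detector. A real system would use richer visual features and a more flexible density model than a single Gaussian per class.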

Author information

Authors and Affiliations

German Research Center for Artificial Intelligence (DFKI), D-67663, Kaiserslautern, Germany
Adrian Ulges
University of Kaiserslautern, D-67663, Kaiserslautern, Germany
Damian Borth & Thomas M. Breuel

Editor information

Editors and Affiliations

Multimedia Communications Laboratory, Department of Electrical & Computer Engineering, University of Illinois at Chicago, Room 1020 SEO (M/C 154), 851 South Morgan Street, 60607-7053, Chicago, IL, USA
Dan Schonfeld
Philips Research, High-Tech Campus 36, 5656 AE, Eindhoven, The Netherlands
Caifeng Shan
Department of Computing, Hong Kong Polytechnic University, 7/F, Building P, Hung Hom, PQ704, Kowloon, Hong Kong, China
Dacheng Tao
Department of Computer Science, University of Bath, BA2 7AY, United Kingdom
Liang Wang

Cite this chapter

Ulges, A., Borth, D., Breuel, T.M. (2010). Visual Concept Learning from Weakly Labeled Web Videos. In: Schonfeld, D., Shan, C., Tao, D., Wang, L. (eds) Video Search and Mining. Studies in Computational Intelligence, vol 287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12900-1_8

DOI: https://doi.org/10.1007/978-3-642-12900-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12899-8
Online ISBN: 978-3-642-12900-1
eBook Packages: Engineering, Engineering (R0)
