research-article

Learning heterogeneous data for hierarchical web video classification

Authors:

Qi Tian

MM '11: Proceedings of the 19th ACM international conference on Multimedia

Pages 433 - 442

https://doi.org/10.1145/2072298.2072355

Published: 28 November 2011

Abstract

For web videos such as those on YouTube, it is hard to obtain sufficient precisely labeled training data, and their complex ontology makes them hard to analyze. To deal with these problems, we present a hierarchical web video classification framework that learns from heterogeneous web data, and we construct a bottom-up semantic forest of video concepts by learning from metadata. The main contributions are twofold. First, we analyze the distribution of middle-level concepts using data collected from web communities, and we make a concept redistribution assumption in order to build an effective transfer learning algorithm. On this basis, we propose an AdaBoost-like transfer learning algorithm that transfers knowledge learned from Flickr images to the YouTube video domain and thereby facilitates video classification. Second, we mine a group of hierarchical taxonomies, named the Semantic Forest, from YouTube and Flickr tags; these taxonomies better reflect user intention at the semantic level. With the help of the semantic forest, we also construct a bottom-up semantic integration in order to analyze video content hierarchically from a novel perspective. A group of experiments is performed on a dataset collected from Flickr and YouTube. Compared with state-of-the-art methods, the proposed framework is more robust and more tolerant of web noise.
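The abstract does not spell out the AdaBoost-like transfer algorithm, so the following is only a minimal sketch of one well-known scheme of that family, TrAdaBoost-style instance reweighting, and not the authors' method. It assumes a labeled source set (standing in for Flickr image features) and a small labeled target set (standing in for YouTube video features), reduced here to one-dimensional toy features. All function names (`stump_predict`, `train_stump`, `tradaboost`, `predict`) and the toy data are hypothetical.

```python
import math

def stump_predict(thr, pol, x):
    """1-D decision stump: label 1 on one side of the threshold, 0 on the other."""
    return 1 if pol * (x - thr) >= 0 else 0

def train_stump(X, y, w):
    """Pick the (threshold, polarity) pair with the lowest weighted error."""
    best_err, best_thr, best_pol = float("inf"), 0.0, 1
    for thr in sorted(set(X)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(thr, pol, xi) != yi)
            if err < best_err:
                best_err, best_thr, best_pol = err, thr, pol
    return best_thr, best_pol

def tradaboost(Xs, ys, Xt, yt, rounds=10):
    """TrAdaBoost-style boosting (assumed scheme, labels in {0, 1}).

    Misclassified source samples are down-weighted, misclassified target
    samples are up-weighted, so source data that disagrees with the target
    domain gradually loses influence.
    """
    n, m = len(Xs), len(Xt)
    X, y = list(Xs) + list(Xt), list(ys) + list(yt)
    w = [1.0] * (n + m)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / rounds))
    learners = []
    for _ in range(rounds):
        total = sum(w)
        p = [wi / total for wi in w]
        thr, pol = train_stump(X, y, p)
        miss = [abs(stump_predict(thr, pol, xi) - yi) for xi, yi in zip(X, y)]
        # Error is measured on the target samples only.
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # clamp to keep beta_t in (0, 1)
        beta_t = eps / (1.0 - eps)
        for i in range(n):
            w[i] *= beta_src ** miss[i]       # shrink bad source samples
        for i in range(n, n + m):
            w[i] *= beta_t ** (-miss[i])      # grow bad target samples
        learners.append((thr, pol, beta_t))
    # Only the later rounds (from ceil(rounds / 2) on) vote in the final classifier.
    return learners[math.ceil(rounds / 2) - 1:]

def predict(learners, x):
    """Weighted vote in log space over the retained weak learners."""
    score = sum(math.log(1.0 / b) * stump_predict(t, s, x) for t, s, b in learners)
    half = 0.5 * sum(math.log(1.0 / b) for _, _, b in learners)
    return 1 if score >= half else 0

# Toy data: the source boundary (x >= 5) is slightly off from the target one (x >= 4).
Xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1 if x >= 5 else 0 for x in Xs]
Xt = [0, 1, 2, 3, 4, 5, 6, 7]
yt = [0, 0, 0, 0, 1, 1, 1, 1]
model = tradaboost(Xs, ys, Xt, yt, rounds=10)
```

Calling `predict(model, x)` on a new feature value then returns a 0/1 label that follows the target-domain boundary. A real system would replace the stump with a classifier over visual features; that choice is outside what the abstract states.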

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia

November 2011

944 pages

ISBN:9781450306164

DOI:10.1145/2072298

General Chairs:
K. Selçuk Candan (Arizona State University, USA), Sethuraman Panchanathan (Arizona State University, USA), Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Program Chairs:
Hari Sundaram (Arizona State University, USA), Wu-Chi Feng (Portland State University, USA), Nicu Sebe (University of Trento, Italy)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '11: ACM Multimedia Conference

November 28 - December 1, 2011

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 2,038 of 8,033 submissions, 25%
