DOI: 10.1145/2072298.2072355
research-article

Learning heterogeneous data for hierarchical web video classification

Published: 28 November 2011

Abstract

Web videos, such as those on YouTube, are hard to classify: sufficient precisely labeled training data are difficult to obtain, and the underlying ontology is complex. To address these problems, we present a hierarchical web video classification framework that learns from heterogeneous web data and constructs a bottom-up semantic forest of video concepts from meta-data. The main contributions are twofold. First, we analyze the distribution of middle-level concepts in data collected from web communities and make a concept redistribution assumption on which an effective transfer learning algorithm can be built. Furthermore, we propose an AdaBoost-like transfer learning algorithm that transfers knowledge learned from Flickr images to the YouTube video domain and thereby facilitates video classification. Second, a group of hierarchical taxonomies, named Semantic Forest, is mined from YouTube and Flickr tags; these taxonomies better reflect user intention at the semantic level. A bottom-up semantic integration is also constructed with the help of the semantic forest in order to analyze video content hierarchically from a novel perspective. Experiments are performed on a dataset collected from Flickr and YouTube. Compared with state-of-the-art methods, the proposed framework is more robust and tolerant to web noise.
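
The abstract does not spell out the AdaBoost-like update, so the following is only a minimal sketch of the general idea, using the well-known TrAdaBoost-style reweighting rule from the transfer learning literature rather than the authors' algorithm. Source samples (for example Flickr images) that the current weak learner misclassifies are down-weighted, while misclassified target samples (for example YouTube videos) are up-weighted. The function name `reweight_round`, its parameters, and the clipping constants are hypothetical choices made for illustration.

```python
import math


def reweight_round(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One TrAdaBoost-style weight update (illustrative, not the paper's rule).

    w_src, w_tgt       -- current weights of source and target samples
    miss_src, miss_tgt -- 1 if the weak learner misclassified the sample, else 0
    n_rounds           -- total number of boosting rounds planned
    """
    # The weighted error is measured on the target domain only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    # Assumption: clip the error so that beta_t stays in (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)

    # Fixed decay factor for source samples, as in TrAdaBoost.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))

    # Misclassified source samples shrink; misclassified target samples grow.
    new_src = [w * beta ** m for w, m in zip(w_src, miss_src)]
    new_tgt = [w * beta_t ** (-m) for w, m in zip(w_tgt, miss_tgt)]

    # Renormalize so that all weights sum to one.
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src], [w / total for w in new_tgt]


if __name__ == "__main__":
    src, tgt = reweight_round([0.2, 0.2], [0.2, 0.2, 0.2], [1, 0], [1, 0, 0], 10)
    print("source weights:", src)
    print("target weights:", tgt)
```

In this toy call, the first source sample and the first target sample are the misclassified ones, so the former loses weight relative to its neighbor and the latter gains weight. Over many rounds, source samples that disagree with the target domain fade out, which is one way such a scheme can tolerate noisy web data.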

    Published In

    MM '11: Proceedings of the 19th ACM international conference on Multimedia
    November 2011
    944 pages
    ISBN:9781450306164
    DOI:10.1145/2072298

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. heterogeneous data
    2. hierarchical taxonomy
    3. semantic forest
    4. transfer learning
    5. video annotation
    6. video search
    7. web video classification

    Qualifiers

    • Research-article

    Conference

    MM '11: ACM Multimedia Conference
    November 28 - December 1, 2011
    Scottsdale, Arizona, USA

    Acceptance Rates

    Overall Acceptance Rate 2,038 of 8,033 submissions, 25%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 27 Feb 2025

    Cited By

    • (2023) Online feature selection for hierarchical classification learning based on improved ReliefF. Concurrency and Computation: Practice and Experience, 35:27. DOI: 10.1002/cpe.7844. Online publication date: 20-Jul-2023
    • (2016) Simple to Complex Transfer Learning for Action Recognition. IEEE Transactions on Image Processing, 25:2 (949-960). DOI: 10.1109/TIP.2015.2512107. Online publication date: Feb-2016
    • (2015) Enhancing Video Event Recognition Using Automatically Constructed Semantic-Visual Knowledge Base. IEEE Transactions on Multimedia, 17:9 (1562-1575). DOI: 10.1109/TMM.2015.2449660. Online publication date: Sep-2015
    • (2015) Localizing web videos using social images. Information Sciences: an International Journal, 302:C (122-131). DOI: 10.1016/j.ins.2014.08.017. Online publication date: 1-May-2015
    • (2014) Exploiting Web Images for Semantic Video Indexing Via Robust Sample-Specific Loss. IEEE Transactions on Multimedia, 16:6 (1677-1689). DOI: 10.1109/TMM.2014.2323014. Online publication date: Oct-2014
    • (2013) Effective transfer tagging from image to video. ACM Transactions on Multimedia Computing, Communications, and Applications, 9:2 (1-20). DOI: 10.1145/2457450.2457456. Online publication date: 10-May-2013
