Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming

Authors:

Jianping Fan

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 4, Issue 1

Article No.: 1, Pages 1 - 25

https://doi.org/10.1145/1324287.1324288

Published: 11 February 2008

Abstract

For online medical education, we have developed a novel scheme that uses the results of semantic video classification to select the most representative video shots for concept-oriented summarization and skimming of surgery education videos. First, salient objects are used as the video patterns for feature extraction, to achieve a good representation of the intermediate video semantics. Salient objects are defined as the salient video compounds that characterize the most significant perceptual properties of the corresponding real-world physical objects in a video; the appearance of such salient objects can therefore be used to predict the appearance of the relevant semantic video concepts in a specific video domain. Second, a novel multi-modal boosting algorithm is developed to achieve more reliable video classifier training. By incorporating feature hierarchy and boosting, it reduces both the training cost and the number of training samples required, which significantly speeds up SVM (support vector machine) classifier training. In addition, unlabeled samples are integrated to reduce the human effort of labeling large numbers of training samples. Finally, the results of semantic video classification are incorporated to enable concept-oriented video summarization and skimming. Experimental results in a specific domain of surgery education videos are provided.
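The final step, turning classification results into a concept-oriented skim, can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state the selection criterion, so this example simply assumes that each shot carries a predicted concept and a classifier confidence, and keeps the most confident shots per concept. The `Shot` type, the `select_skim` function, and the concept names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One classified video shot (all fields are hypothetical)."""
    start: float       # start time in seconds
    end: float         # end time in seconds
    concept: str       # semantic concept predicted by the classifier
    confidence: float  # classifier confidence in [0, 1]


def select_skim(shots, per_concept=1):
    """Keep the `per_concept` most confident shots for each concept,
    then return them in temporal order so the skim plays chronologically.
    Assumption: confidence is a usable proxy for representativeness."""
    by_concept = defaultdict(list)
    for shot in shots:
        by_concept[shot.concept].append(shot)

    selected = []
    for candidates in by_concept.values():
        ranked = sorted(candidates, key=lambda s: s.confidence, reverse=True)
        selected.extend(ranked[:per_concept])
    return sorted(selected, key=lambda s: s.start)


# Made-up example data; the concept labels are illustrative only.
shots = [
    Shot(0.0, 12.0, "lecture", 0.71),
    Shot(12.0, 30.0, "surgery", 0.64),
    Shot(30.0, 41.0, "surgery", 0.93),
    Shot(41.0, 55.0, "dialog", 0.88),
    Shot(55.0, 70.0, "lecture", 0.95),
]
skim = select_skim(shots, per_concept=1)
print([(s.concept, s.start) for s in skim])
# prints [('surgery', 30.0), ('dialog', 41.0), ('lecture', 55.0)]
```

Raising `per_concept` lengthens the skim while keeping every concept represented, which is one simple way to trade summary length against coverage.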

Supplementary Material

JPG File (a1-luo.jpg), 17.75 KB

MOV File (a1-luo.mov), 5.92 MB

Cited By

Luo H, Fan J, Zhou Y (2018). Multimedia news exploration and retrieval by integrating keywords, relations and visual features. Multimedia Tools and Applications 51:2 (625-648). Online publication date: 31-Dec-2018.
https://dl.acm.org/doi/10.1007/s11042-010-0639-3
Rivera E, Nishihara A (2012). Advanced Mobile Lecture Viewing. International Journal of Handheld Computing Research 3:2 (58-72). Online publication date: Apr-2012.
https://doi.org/10.4018/jhcr.2012040104
Gong W, Zhou Y, Luo H, Fan J, Zhou A (2010). Automatic filtering algorithm for imbalanced classification. 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (1853-1857). Online publication date: Aug-2010.
https://doi.org/10.1109/FSKD.2010.5569437

Index Terms

1. Computing methodologies
  1. Machine learning

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 4, Issue 1

January 2008

197 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/1324287

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 February 2008

Accepted: 01 October 2007

Revised: 01 December 2006

Received: 01 June 2006

Published in TOMM Volume 4, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Division of Information and Intelligent Systems

Bibliometrics

Total Citations: 3
Total Downloads: 748
Downloads (last 12 months): 5
Downloads (last 6 weeks): 0

Reflects downloads up to 24 Jan 2025
