research-article

Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming

Published: 11 February 2008

Abstract

For online medical education, we have developed a novel scheme that uses the results of semantic video classification to select the most representative video shots for concept-oriented summarization and skimming of surgery education videos. First, salient objects are used as the video patterns for feature extraction, giving a good representation of the intermediate video semantics. Salient objects are defined as the salient video compounds that characterize the most significant perceptual properties of the corresponding real-world physical objects in a video; their appearance can therefore be used to predict the appearance of the relevant semantic video concepts in a specific video domain. Second, a novel multi-modal boosting algorithm is developed to achieve more reliable video classifier training. By incorporating feature hierarchy and boosting, it dramatically reduces both the training cost and the number of training samples required, and thus significantly speeds up SVM (support vector machine) classifier training. In addition, unlabeled samples are integrated to reduce the human effort of labeling large numbers of training samples. Finally, the results of semantic video classification are incorporated to enable concept-oriented video summarization and skimming. Experimental results in the specific domain of surgery education videos are provided.
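
The final step of the abstract, turning classification results into a concept-oriented skim, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each shot already carries a concept label and a classifier confidence score, and it uses a simple greedy rule (highest confidence first, within a per-concept time budget). The `Shot` type, the `build_skim` function, the budget parameter, and the concept names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One video shot with a (hypothetical) classifier result attached."""
    shot_id: int
    start: float       # start time in seconds
    end: float         # end time in seconds
    concept: str       # predicted semantic concept (assumed label)
    confidence: float  # classifier score in [0, 1] (assumed)

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_skim(shots, budget_per_concept):
    """Greedily pick the most confident shots for each concept.

    Shots are considered in order of decreasing confidence and kept
    only if they still fit in the remaining time budget (in seconds)
    for their concept. Selected shots are returned in temporal order.
    """
    by_concept = defaultdict(list)
    for shot in shots:
        by_concept[shot.concept].append(shot)

    skim = {}
    for concept, group in by_concept.items():
        remaining = budget_per_concept
        chosen = []
        for shot in sorted(group, key=lambda s: s.confidence, reverse=True):
            if shot.duration <= remaining:
                chosen.append(shot)
                remaining -= shot.duration
        skim[concept] = sorted(chosen, key=lambda s: s.start)
    return skim


# Illustrative data only; concept names and scores are made up.
example_shots = [
    Shot(1, 0.0, 10.0, "incision", 0.90),
    Shot(2, 10.0, 30.0, "incision", 0.95),
    Shot(3, 30.0, 35.0, "suturing", 0.60),
    Shot(4, 35.0, 50.0, "incision", 0.40),
    Shot(5, 50.0, 58.0, "suturing", 0.80),
]
example_skim = build_skim(example_shots, budget_per_concept=25.0)
```

Grouping by concept before selecting is what makes the skim concept-oriented: a viewer can request only the shots for one concept rather than a single summary of the whole video. A real system would likely also weigh shot diversity and temporal coverage, which this sketch ignores.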

Supplementary Material

JPG File (a1-luo.jpg)
MOV File (a1-luo.mov)

Cited By

  • (2018) Multimedia news exploration and retrieval by integrating keywords, relations and visual features. Multimedia Tools and Applications 51:2 (625-648). DOI: 10.1007/s11042-010-0639-3. Online publication date: 31-Dec-2018
  • (2012) Advanced Mobile Lecture Viewing. International Journal of Handheld Computing Research 3:2 (58-72). DOI: 10.4018/jhcr.2012040104. Online publication date: Apr-2012
  • (2010) Automatic filtering algorithm for imbalanced classification. 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (1853-1857). DOI: 10.1109/FSKD.2010.5569437. Online publication date: Aug-2010

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 4, Issue 1
    January 2008
    197 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/1324287

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 February 2008
    Accepted: 01 October 2007
    Revised: 01 December 2006
    Received: 01 June 2006
    Published in TOMM Volume 4, Issue 1

    Author Tags

    1. Semantic video classification
    2. concept-oriented video skimming
    3. feature hierarchy
    4. multi-modal boosting
    5. salient objects
    6. unlabeled samples

    Qualifiers

    • Research-article
    • Research
    • Refereed
