Movie story intensity representation through audiovisual tempo analysis

Yeh, Chia-Hung; Kuo, Chih-Hung; Liou, Rung-Wen

doi:10.1007/s11042-009-0278-8

Published: 12 May 2009

Multimedia Tools and Applications, volume 44, pages 205–228 (2009)

Chia-Hung Yeh¹,
Chih-Hung Kuo² &
Rung-Wen Liou²

Abstract

A comprehensive method for movie abstraction is developed in this research for applications in fast movie content exploration, indexing, browsing, and skimming. Most current approaches rely heavily on specific domain knowledge or models to identify and extract the determining scenes of a given movie; however, the extracted segments are often isolated, presenting a fragmented outline of the original. Our proposed method fuses simple audiovisual features and measures the “tempos” of a movie directly, especially long-term ones. These tempos form a curve that captures the high-level semantics of a movie and indicates the events of interest, termed “story intensity.” Through tempo, the proposed algorithm provides a natural way to segment a movie into manageable parts. As our experimental results demonstrate, the condensed skimming clips efficiently capture the semantic content of the most interesting and informative parts of the original movie.
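
As a rough illustration of the idea described in the abstract, the sketch below fuses two per-sample features into a single tempo curve, smooths it so that long-term tempo dominates, and cuts the sequence at valleys of the curve. It is not the authors' algorithm: the choice of features (a visual activity measure and an audio energy measure), the equal weights, the moving-average window, and the valley-based segmentation rule are all assumptions made for illustration, and every function name is hypothetical.

```python
def normalize(values):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        start = max(0, i - half)
        end = min(len(values), i + half + 1)
        out.append(sum(values[start:end]) / (end - start))
    return out


def tempo_curve(visual, audio, w_visual=0.5, w_audio=0.5, window=5):
    """Fuse two feature sequences into one smoothed curve (weights are assumed)."""
    if len(visual) != len(audio):
        raise ValueError("feature sequences must have equal length")
    fused = [w_visual * v + w_audio * a
             for v, a in zip(normalize(visual), normalize(audio))]
    return moving_average(fused, window)


def segment_at_valleys(curve):
    """Split indices into (start, end) segments, cutting at strict local minima."""
    cuts = [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]
    bounds = [0] + cuts + [len(curve)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


# Toy input (made-up numbers): two bursts of activity separated by a lull.
visual = [1, 2, 5, 2, 1, 2, 6, 3, 1]
audio = [0, 1, 4, 1, 0, 1, 5, 2, 0]
curve = tempo_curve(visual, audio, window=1)  # window=1 disables smoothing
print(segment_at_valleys(curve))  # → [(0, 4), (4, 9)]
```

With real footage, a larger window would suppress short fluctuations, so cuts would fall only at sustained lulls; how the paper actually defines and segments its tempo curve is not stated in the abstract.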

Acknowledgements

The authors would like to thank the National Science Council of the Republic of China for financially supporting this research under Contracts No. NSC95-2218-E-259-047 and NSC96-2628-E-110-020-MY2.

Author information

Authors and Affiliations

Department of Electrical Engineering, National Sun Yat-Sen University, No. 70, Lien-hai Road, Kushan District, Kaohsiung, 80424, Taiwan (R.O.C.)
Chia-Hung Yeh
Department of Electrical Engineering, National Cheng Kung University, No.1, University Road, Tainan, 701, Taiwan (R.O.C.)
Chih-Hung Kuo & Rung-Wen Liou

Corresponding author

Correspondence to Chia-Hung Yeh.

About this article

Cite this article

Yeh, CH., Kuo, CH. & Liou, RW. Movie story intensity representation through audiovisual tempo analysis. Multimed Tools Appl 44, 205–228 (2009). https://doi.org/10.1007/s11042-009-0278-8

Issue Date: September 2009
