Abstract
Giving users a rapid, on-the-fly view of a movie's content requires segmenting the movie and describing the segments in a form that users can readily understand. The difficulty lies in extracting relevant semantic information from the audiovisual signal, both for the segmentation and for the description. In this paper we introduce audio scenes and audio chapters in movies and present an algorithm that automatically segments a video based on the audio stream only. Audio scenes and chapters are defined as the equivalent of shots and scenes in the visual domain. A tree-like, audio-based structure of a video is proposed. Each chapter is then classified into one of several chapter categories. The automatic approach to audio scene and chapter segmentation and classification is evaluated on manually segmented and classified videos.
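As a rough illustration of the tree-like structure described above (video, then chapters, then audio scenes), the following minimal sketch groups audio scenes into chapters and leaves room for a category label on each chapter. It is not the paper's algorithm: the class names, the `build_structure` helper, and the assumption that scene boundaries and chapter breaks are already available as timestamps in seconds are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class AudioScene:
    """Leaf node: a contiguous audio segment (audio analogue of a shot)."""
    start: float  # seconds
    end: float    # seconds


@dataclass
class Chapter:
    """Inner node: a run of consecutive audio scenes (audio analogue of a scene)."""
    scenes: List[AudioScene] = field(default_factory=list)
    category: Optional[str] = None  # hypothetical label, filled in by a classifier

    @property
    def start(self) -> float:
        return self.scenes[0].start

    @property
    def end(self) -> float:
        return self.scenes[-1].end


@dataclass
class VideoStructure:
    """Root node: the whole video as an ordered list of chapters."""
    chapters: List[Chapter] = field(default_factory=list)


def build_structure(scene_bounds: List[float],
                    chapter_breaks: Iterable[float]) -> VideoStructure:
    """Build the tree from precomputed boundaries (assumed inputs).

    scene_bounds: sorted boundary times, including 0 and the video duration.
    chapter_breaks: boundary times at which a new chapter begins.
    """
    breaks = set(chapter_breaks)
    video = VideoStructure()
    # Each pair of consecutive boundaries delimits one audio scene.
    for start, end in zip(scene_bounds, scene_bounds[1:]):
        # Open a new chapter at the first scene or at a chapter break.
        if not video.chapters or start in breaks:
            video.chapters.append(Chapter())
        video.chapters[-1].scenes.append(AudioScene(start, end))
    return video


structure = build_structure([0, 10, 25, 40, 60], chapter_breaks=[25])
```

Here the four audio scenes are grouped into two chapters split at 25 s; a classifier would then set `category` on each `Chapter`.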
Cite this article
Harb, H., Chen, L. Audio-based description and structuring of videos. Int J Digit Libr 6, 70–81 (2006). https://doi.org/10.1007/s00799-005-0120-5
DOI: https://doi.org/10.1007/s00799-005-0120-5