Audio-based description and structuring of videos

  • Regular Paper
International Journal on Digital Libraries

Abstract

Enabling a rapid, on-the-fly view of a movie's content requires segmenting the movie and describing the segments in a user-compatible manner. The difficulty lies in extracting relevant semantic information from the audiovisual signal, for both the segmentation and the description. In this paper we introduce audio scenes and audio chapters in movies and present an algorithm that automatically segments a video based on the audio stream only. Audio scenes and chapters are defined as the audio equivalents of shots and scenes in the visual domain. A tree-like, audio-based structure of a video is proposed. Each chapter is then classified into one of several chapter categories. The automatic solution to audio scene and chapter segmentation and classification is evaluated on manually segmented and classified videos.
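The tree-like structure mentioned in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation. The class names (`AudioScene`, `AudioChapter`, `VideoTree`), the `build_tree` helper, and the idea of passing chapter boundaries as scene indices are all hypothetical. The abstract does not list the chapter categories, so the `category` field is left as a free-form label.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class AudioScene:
    """Leaf node: a contiguous audio segment (the audio analogue of a shot)."""
    start: float  # seconds
    end: float    # seconds


@dataclass
class AudioChapter:
    """Inner node: consecutive audio scenes (the audio analogue of a visual scene)."""
    scenes: List[AudioScene] = field(default_factory=list)
    # Hypothetical free-form label; the abstract does not enumerate the categories.
    category: Optional[str] = None

    @property
    def start(self) -> float:
        return self.scenes[0].start

    @property
    def end(self) -> float:
        return self.scenes[-1].end


@dataclass
class VideoTree:
    """Root node: the whole video as an ordered list of chapters."""
    chapters: List[AudioChapter] = field(default_factory=list)

    def outline(self) -> List[str]:
        """Return a flat, indented text outline of the tree."""
        lines = []
        for i, ch in enumerate(self.chapters, 1):
            label = ch.category or "unclassified"
            lines.append(f"Chapter {i} [{ch.start:.1f}-{ch.end:.1f}s] {label}")
            for j, sc in enumerate(ch.scenes, 1):
                lines.append(f"  Scene {i}.{j} [{sc.start:.1f}-{sc.end:.1f}s]")
        return lines


def build_tree(scene_bounds: Iterable[Tuple[float, float]],
               chapter_starts: Set[int]) -> VideoTree:
    """Group time-ordered scenes into chapters.

    scene_bounds: (start, end) pairs in seconds, sorted by time.
    chapter_starts: indices of scenes that open a new chapter (assumed to be
    produced by some upstream segmentation step, which is not shown here).
    """
    tree = VideoTree()
    for idx, (start, end) in enumerate(scene_bounds):
        if idx in chapter_starts or not tree.chapters:
            tree.chapters.append(AudioChapter())
        tree.chapters[-1].scenes.append(AudioScene(start, end))
    return tree
```

For example, `build_tree([(0, 10), (10, 25), (25, 40)], {0, 2})` groups the first two scenes into one chapter and the third into another, and `outline()` then gives a browsable table of contents.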

Author information

Corresponding author

Correspondence to Hadi Harb.

About this article

Cite this article

Harb, H., Chen, L. Audio-based description and structuring of videos. Int J Digit Libr 6, 70–81 (2006). https://doi.org/10.1007/s00799-005-0120-5

  • DOI: https://doi.org/10.1007/s00799-005-0120-5
