Audio-based description and structuring of videos

Harb, Hadi; Chen, Liming

doi:10.1007/s00799-005-0120-5

Regular Paper
Published: 23 February 2006

International Journal on Digital Libraries, Volume 6, pages 70–81 (2006)

Hadi Harb and Liming Chen

Abstract

Enabling a rapid, on-the-fly view of a movie's content requires segmenting the movie and describing the segments in terms a user can understand. The difficulty lies in extracting relevant semantic information from the audiovisual signal, both for segmentation and for description. In this paper we introduce audio scenes and chapters in movies and present an algorithm that automatically segments a video using the audio stream only. Audio scenes and chapters are defined as the audio equivalents of shots and scenes in the visual domain. A tree-like, audio-based structure of a video is proposed, and each chapter is then classified into one of several chapter categories. The automatic segmentation and classification of audio scenes and chapters is evaluated on manually segmented and classified videos.
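
The tree-like structure mentioned in the abstract can be pictured as a video that contains chapters, each of which contains audio scenes. The Python sketch below is only an illustration of that hierarchy, not the authors' implementation: the class names, the `category` field, and the `build_structure` helper are all hypothetical, and the scene boundaries and chapter starts are assumed to come from some upstream segmenter that the abstract does not describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class AudioScene:
    """Leaf node: a contiguous audio segment (audio analogue of a visual shot)."""
    start: float  # seconds
    end: float    # seconds


@dataclass
class Chapter:
    """Inner node: consecutive audio scenes (audio analogue of a visual scene)."""
    scenes: List[AudioScene] = field(default_factory=list)
    # Hypothetical label slot; the abstract does not name the chapter categories.
    category: Optional[str] = None

    @property
    def start(self) -> float:
        return self.scenes[0].start

    @property
    def end(self) -> float:
        return self.scenes[-1].end


@dataclass
class VideoStructure:
    """Root node: the whole video as an ordered list of chapters."""
    chapters: List[Chapter] = field(default_factory=list)


def build_structure(
    scene_bounds: List[Tuple[float, float]],
    chapter_starts: Set[int],
) -> VideoStructure:
    """Group audio scenes into chapters.

    scene_bounds: time-ordered (start, end) pairs in seconds.
    chapter_starts: indices of scenes at which a new chapter begins.
    Both inputs are assumed outputs of a segmenter not shown here.
    """
    video = VideoStructure()
    for i, (start, end) in enumerate(scene_bounds):
        # The first scene always opens a chapter.
        if i == 0 or i in chapter_starts:
            video.chapters.append(Chapter())
        video.chapters[-1].scenes.append(AudioScene(start, end))
    return video
```

For example, `build_structure([(0.0, 12.5), (12.5, 40.0), (40.0, 95.0)], {2})` groups the first two scenes into one chapter and places the third scene in a second chapter. A classifier could then fill in each chapter's `category`.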

Author information

Authors and Affiliations

Département Maths-Info, LIRIS Lab., CNRS FRE 2672, Ecole Centrale de Lyon, France
Hadi Harb & Liming Chen

Corresponding author

Correspondence to Hadi Harb.

About this article

Cite this article

Harb, H., Chen, L. Audio-based description and structuring of videos. Int J Digit Libr 6, 70–81 (2006). https://doi.org/10.1007/s00799-005-0120-5

Published: 23 February 2006
Issue Date: February 2006
DOI: https://doi.org/10.1007/s00799-005-0120-5
