
Speaker Change Detection Using Variable Segments for Video Indexing

  • Conference paper
Advances in Multimedia Modeling (MMM 2011)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6523))


Abstract

Video indexing based on shots derived from visual features is useful for content-based video browsing but has had limited success in supporting semantic search of videos. Recent developments in speech recognition make it possible to sidestep many of the difficulties of detecting semantic meaning from visual features by operating directly on the verbal content. Language-based indexing in turn motivates a video segmentation technique based on speaker change detection. This paper improves existing speaker change detectors by introducing an extra preprocessing step that aligns the audio features with syllables. We investigate the benefits of this synchronization and propose a variable presegmentation scheme that uses both magnitude and frequency information to achieve the alignment. Experimental results show that the quality of the extracted audio features improves, resulting in a better recall rate.
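The abstract does not spell out the presegmentation scheme, so the following is only an illustrative sketch of what a variable presegmentation driven by magnitude and frequency cues might look like. It assumes short-time energy as the magnitude cue and zero-crossing rate as a crude frequency cue. The function names, the non-overlapping framing, and the thresholds `energy_ratio` and `zcr_jump` are all hypothetical choices, not the authors' method.

```python
def frame_features(samples, frame_len):
    """Return (energies, zcrs): one value per non-overlapping frame."""
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Magnitude cue (assumed): mean squared amplitude of the frame.
        energies.append(sum(x * x for x in frame) / frame_len)
        # Frequency cue (assumed): fraction of adjacent pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcrs.append(crossings / (frame_len - 1))
    return energies, zcrs


def variable_segments(samples, frame_len=80, energy_ratio=0.1, zcr_jump=0.2):
    """Split samples into variable-length (start, end) sample ranges.

    Hypothetical rule: place a boundary where the signal enters or leaves
    a low-energy dip, or where the zero-crossing rate shifts abruptly.
    """
    energies, zcrs = frame_features(samples, frame_len)
    if not energies:
        return []
    # Frames below this fraction of the mean energy count as a dip.
    floor = energy_ratio * (sum(energies) / len(energies))
    boundaries = [0]
    for i in range(1, len(energies)):
        dip_edge = (energies[i] < floor) != (energies[i - 1] < floor)
        spectral_shift = abs(zcrs[i] - zcrs[i - 1]) > zcr_jump
        if dip_edge or spectral_shift:
            boundaries.append(i * frame_len)
    boundaries.append(len(energies) * frame_len)
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Under these assumptions, a signal made of two voiced stretches separated by a pause yields three segments of unequal length, and features for a speaker change detector could then be computed per segment rather than per fixed window.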




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tam, K.Y., Lay, J., Levy, D. (2011). Speaker Change Detection Using Variable Segments for Video Indexing. In: Lee, KT., Tsai, WH., Liao, HY.M., Chen, T., Hsieh, JW., Tseng, CC. (eds) Advances in Multimedia Modeling. MMM 2011. Lecture Notes in Computer Science, vol 6523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17832-0_28


  • DOI: https://doi.org/10.1007/978-3-642-17832-0_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17831-3

  • Online ISBN: 978-3-642-17832-0

  • eBook Packages: Computer Science, Computer Science (R0)
