Speaker Change Detection Using Variable Segments for Video Indexing

Tam, King Yiu; Lay, Jose; Levy, David

doi:10.1007/978-3-642-17832-0_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6523)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Video indexing based on shots obtained from visual features is useful for content-based video browsing but has had more limited success in facilitating semantic search of videos. Meanwhile, recent developments in speech recognition make it possible to sidestep many of the difficulties of detecting semantic meaning from visual features by operating directly on the verbal content. Language-based indexing motivates a new video segmentation technique based on speaker change detection. This paper aims to improve existing speaker change detectors by introducing an extra preprocessing step that aligns the audio features with syllables. We investigate the benefits of such synchronization and propose a variable presegmentation scheme that uses both magnitude and frequency information to achieve this alignment. Experimental results show that the quality of the extracted audio features improves, resulting in a higher recall rate.
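The abstract does not spell out the presegmentation algorithm itself. As a rough illustration only, the Python sketch below shows one way a variable presegmentation could combine a magnitude cue (valleys in short-time energy) with a frequency cue (abrupt jumps in zero-crossing rate) to place cuts at syllable-like boundaries instead of at fixed intervals. The function names `frame_features` and `presegment`, the use of zero-crossing rate as the frequency measure, and every threshold are hypothetical assumptions, not the authors' method.

```python
import math


def frame_features(samples, frame_len):
    """Split samples into non-overlapping frames; return (energies, zcrs).

    Energy (mean squared amplitude) carries the magnitude information.
    Zero-crossing rate is an ASSUMED, crude stand-in for frequency information.
    """
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcrs.append(crossings / (frame_len - 1))
    return energies, zcrs


def presegment(samples, sample_rate, frame_ms=10, energy_ratio=0.1,
               zcr_jump=0.3, min_frames=5):
    """Hypothetical variable presegmentation (not the authors' algorithm).

    Returns (start, end) sample ranges whose cuts fall at syllable-like
    boundaries: energy valleys (magnitude cue) or abrupt ZCR changes
    inside high-energy stretches (frequency cue). Thresholds are illustrative.
    """
    frame_len = max(2, int(sample_rate * frame_ms / 1000))
    energies, zcrs = frame_features(samples, frame_len)
    if not energies or max(energies) == 0.0:
        return [(0, len(samples))]
    threshold = energy_ratio * max(energies)
    low = [e < threshold for e in energies]

    cut_frames = set()
    # Magnitude cue: cut at the middle of each run of low-energy frames.
    k = 0
    while k < len(low):
        if low[k]:
            run_start = k
            while k < len(low) and low[k]:
                k += 1
            cut_frames.add((run_start + k) // 2)
        else:
            k += 1
    # Frequency cue: cut where the ZCR jumps between two adjacent high-energy frames.
    for k in range(1, len(zcrs)):
        if not low[k] and not low[k - 1] and abs(zcrs[k] - zcrs[k - 1]) > zcr_jump:
            cut_frames.add(k)

    # Convert frame indices to sample offsets, skipping cuts that would
    # leave a segment shorter than min_frames frames.
    min_gap = min_frames * frame_len
    cuts = [0]
    for f in sorted(cut_frames):
        pos = f * frame_len
        if pos - cuts[-1] >= min_gap and len(samples) - pos >= min_gap:
            cuts.append(pos)
    cuts.append(len(samples))
    return list(zip(cuts[:-1], cuts[1:]))
```

In a full system, each returned segment would presumably be described by spectral features (for example MFCCs), and adjacent groups of segments compared with a distance measure or a statistical test such as BIC to decide where the speaker changes; that detection stage is not shown here.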

Author information

Authors and Affiliations

School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia
King Yiu Tam, Jose Lay & David Levy

Editor information

Editors and Affiliations

National Taiwan Ocean University, Keelung, Taiwan
Kuo-Tien Lee & Jun-Wei Hsieh
National Chiao Tung University, Hsinchu, Taiwan
Wen-Hsiang Tsai
Academia Sinica, Taipei, Taiwan
Hong-Yuan Mark Liao
Cornell University, Ithaca, NY, USA
Tsuhan Chen
National Kaohsiung First University of Science and Technology, Kaohsiung, Taiwan
Chien-Cheng Tseng

About this paper

Cite this paper

Tam, K.Y., Lay, J., Levy, D. (2011). Speaker Change Detection Using Variable Segments for Video Indexing. In: Lee, KT., Tsai, WH., Liao, HY.M., Chen, T., Hsieh, JW., Tseng, CC. (eds) Advances in Multimedia Modeling. MMM 2011. Lecture Notes in Computer Science, vol 6523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17832-0_28

DOI: https://doi.org/10.1007/978-3-642-17832-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17831-3
Online ISBN: 978-3-642-17832-0
eBook Packages: Computer Science, Computer Science (R0)
