Abstract
Video indexing based on shots obtained from visual features is useful for content-based video browsing but has had more limited success in supporting semantic search of videos. Meanwhile, recent developments in speech recognition make it possible to sidestep many of the difficulties of detecting semantic meaning from visual features by operating directly on the verbal content. Language-based indexing in turn motivates a new video segmentation technique based on speaker change detection. This paper improves existing speaker change detectors by introducing an extra preprocessing step that aligns the audio features with syllables. We investigate the benefits of such synchronization and propose a variable presegmentation scheme that uses both magnitude and frequency information to achieve this alignment. Experimental results show that the quality of the extracted audio features improves, resulting in a better recall rate.
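The abstract does not say how magnitude and frequency information are combined, so the following Python sketch shows only one plausible reading of a variable presegmentation step. It assumes that short-time energy stands in for magnitude, that zero-crossing rate (ZCR) stands in for frequency, and that segment boundaries belong at energy valleys such as the gaps between syllables. The function names `frame_features` and `variable_presegment`, the combined score, and every parameter value are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def frame_features(signal: Sequence[float], frame_len: int, hop: int) -> List[Tuple[float, float]]:
    """Per-frame short-time energy (magnitude cue) and zero-crossing rate (frequency cue)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def variable_presegment(signal: Sequence[float], frame_len: int = 200, hop: int = 100,
                        zcr_weight: float = 0.5, valley_threshold: float = 0.3,
                        min_frames: int = 3) -> List[Tuple[int, int]]:
    """Split a signal into variable-length segments as (start_frame, end_frame) pairs.

    Hypothetical heuristic: a frame is a boundary when its combined score
    (peak-normalised energy minus weighted ZCR) is a local minimum that lies
    below valley_threshold and is at least min_frames after the last boundary.
    """
    feats = frame_features(signal, frame_len, hop)
    if not feats:
        return []
    peak = max(e for e, _ in feats) or 1.0  # avoid dividing by zero on pure silence
    score = [e / peak - zcr_weight * z for e, z in feats]
    boundaries = [0]
    for i in range(1, len(score) - 1):
        is_valley = score[i] < score[i - 1] and score[i] <= score[i + 1]
        if is_valley and score[i] < valley_threshold and i - boundaries[-1] >= min_frames:
            boundaries.append(i)
    boundaries.append(len(score))
    return list(zip(boundaries[:-1], boundaries[1:]))


# Usage: two synthetic 230 Hz "syllables" separated by 0.1 s of silence at 8 kHz.
sr = 8000
burst = [math.sin(2 * math.pi * 230 * n / sr + 0.3) for n in range(1600)]
segments = variable_presegment(burst + [0.0] * 800 + burst)
print(segments)  # [(0, 16), (16, 39)]
```

Because the boundaries follow the energy contour rather than a fixed window length, the resulting segments stretch or shrink with the speech rhythm. The valley threshold keeps small energy ripples inside voiced regions from being treated as boundaries.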
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tam, K.Y., Lay, J., Levy, D. (2011). Speaker Change Detection Using Variable Segments for Video Indexing. In: Lee, KT., Tsai, WH., Liao, HY.M., Chen, T., Hsieh, JW., Tseng, CC. (eds) Advances in Multimedia Modeling. MMM 2011. Lecture Notes in Computer Science, vol 6523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17832-0_28
DOI: https://doi.org/10.1007/978-3-642-17832-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17831-3
Online ISBN: 978-3-642-17832-0
eBook Packages: Computer Science, Computer Science (R0)