Abstract
This paper proposes a speaker segmentation method based on the log-likelihood ratio score (LLRS) over a universal background model (UBM), and a speaker clustering method based on the difference of log-likelihood scores between two speaker models. During segmentation, the LLRS between two adjacent speech segments over the UBM is used as a distance measure, while during clustering, the difference of log-likelihood scores between two speaker models is used as the speaker classification criterion. A complete system for the NIST 2002 2-speaker task is presented using these methods. Experimental results on the NIST 2002 Switchboard Cellular speaker segmentation corpus, the 1-speaker evaluation corpus, and the 2-speaker evaluation corpus show the potential of the proposed algorithms.
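The two scores named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact form of the LLRS, so the sketch assumes the common GMM-UBM form (average frame log-likelihood under a speaker model minus that under the UBM). The 1-D features, the hand-set mixture parameters, and the names `gmm_loglik`, `llr_over_ubm`, and `classify_segment` are all hypothetical, chosen only to keep the example self-contained.

```python
import math

def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood of 1-D frames under a GMM.

    gmm is a list of (weight, mean, variance) tuples (toy, hand-set values).
    """
    total = 0.0
    for x in frames:
        # Per-component log densities, combined with log-sum-exp for stability.
        logs = [
            math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for w, mu, var in gmm
        ]
        m = max(logs)
        total += m + math.log(sum(math.exp(l - m) for l in logs))
    return total / len(frames)

def llr_over_ubm(frames, speaker_gmm, ubm):
    """Assumed LLR score: speaker-model log-likelihood minus UBM log-likelihood."""
    return gmm_loglik(frames, speaker_gmm) - gmm_loglik(frames, ubm)

def classify_segment(frames, gmm_a, gmm_b, threshold=0.0):
    """Assign a segment to speaker A or B by the difference of log-likelihood scores."""
    diff = gmm_loglik(frames, gmm_a) - gmm_loglik(frames, gmm_b)
    return ("A" if diff > threshold else "B"), diff

# Toy models: a two-component UBM and two single-Gaussian speaker models.
ubm = [(0.5, -1.0, 4.0), (0.5, 1.0, 4.0)]
spk_a = [(1.0, -2.0, 1.0)]
spk_b = [(1.0, 2.0, 1.0)]
segment = [-2.1, -1.8, -2.4, -1.9]  # frames lying close to speaker A's mean

label, diff = classify_segment(segment, spk_a, spk_b)
score = llr_over_ubm(segment, spk_a, ubm)
print(label, round(diff, 3), round(score, 3))
```

In a real system the frames would be multi-dimensional acoustic feature vectors and the speaker models would typically be adapted from the UBM; the zero threshold here is likewise only an illustrative default.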
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deng, J., Zheng, T.F., Wu, W. (2006). UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_16
DOI: https://doi.org/10.1007/11939993_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)