Abstract
In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription 2007 evaluation for conference room data. We do so in the context of the history of this system and of other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. The Diarization Error Rate (DER) of our best system is 17.0%, accounting for overlapping speech. However, we find that slightly altering the Speech Activity Detection models has a large impact on the speaker DER.
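To make the BIC-based initialization more concrete, the following is a minimal sketch of a ΔBIC test between two adjacent segments, using the widely used formulation with one full-covariance Gaussian per segment. It is not taken from the AMIDA system: the function name `delta_bic`, the `penalty_weight` parameter (often written λ), and the synthetic features are illustrative assumptions, and the system described in the paper may differ in model type and in how (or whether) the penalty is weighted.

```python
import numpy as np


def delta_bic(x1, x2, penalty_weight=1.0):
    """Delta-BIC between two feature segments (rows = frames, cols = dims).

    Hypothetical helper for illustration. A positive value favours modelling
    the segments with two separate Gaussians (e.g. a speaker change); a
    negative value favours a single shared Gaussian (e.g. merge the clusters).
    """
    n1, d = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2

    def logdet_cov(x):
        # Maximum-likelihood (biased) covariance estimate of the segment.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain of two Gaussians over one pooled Gaussian.
    gain = 0.5 * (
        n * logdet_cov(np.vstack([x1, x2]))
        - n1 * logdet_cov(x1)
        - n2 * logdet_cov(x2)
    )
    # Penalty for the extra parameters: d means plus d(d+1)/2 covariances.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


# Synthetic stand-ins for acoustic features (assumed data, not from the paper).
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(500, 4))
speaker_a_again = rng.normal(0.0, 1.0, size=(500, 4))
speaker_b = rng.normal(5.0, 1.0, size=(500, 4))

different = delta_bic(speaker_a, speaker_b)        # expected to be positive
same = delta_bic(speaker_a, speaker_a_again)       # expected to be negative
```

In this sketch `penalty_weight` is exactly the kind of tunable parameter the abstract says the system tries to minimise; setting it to 1.0 corresponds to the unweighted criterion.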
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
van Leeuwen, D.A., Konečný, M. (2008). Progress in the AMIDA Speaker Diarization System for Meeting Data. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_44
DOI: https://doi.org/10.1007/978-3-540-68585-2_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science (R0)