Progress in the AMIDA Speaker Diarization System for Meeting Data

van Leeuwen, David A.; Konečný, Matej

doi:10.1007/978-3-540-68585-2_44

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4625)

Abstract

In this paper we describe the AMIDA speaker diarization system as it was submitted to the NIST Rich Transcription evaluation 2007 for conference room data. This is done in the context of the history of this system and other speaker diarization systems. One of the goals of our system is to have as few tunable parameters as possible, while maintaining performance. The system consists of a BIC segmentation/clustering initialization, followed by a combined re-segmentation and cluster-merging algorithm. The Diarization Error Rate (DER) of our best system is 17.0%, accounting for overlapping speech. However, we find that slightly altering the Speech Activity Detection models has a large impact on the speaker DER.
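
As an illustration of the BIC criterion mentioned in the abstract, the following is a minimal sketch of the standard ΔBIC test between two feature segments, each modeled by a single full-covariance Gaussian. It is not the authors' implementation: the function name `delta_bic`, the parameter `penalty_weight`, and the choice of one Gaussian per segment are assumptions made for this example. A positive value favors modeling the segments separately (a speaker change, or no merge); a negative value favors a single shared model.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between two segments of feature vectors.

    x, y: arrays of shape (frames, dims). Each segment, and their union,
    is modeled by one full-covariance Gaussian (an assumption of this sketch).
    penalty_weight is the usual tunable scaling of the BIC penalty term.
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(a):
        # Maximum-likelihood covariance estimate (bias=True divides by N).
        cov = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n_z * logdet_cov(z) - n_x * logdet_cov(x) - n_y * logdet_cov(y))

    # Extra free parameters of the second Gaussian: mean plus covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
    return gain - penalty
```

In a segmentation pass, such a score would typically be evaluated at candidate boundaries within a sliding window; in a clustering pass, the pair of clusters with the lowest score would be the first candidate for merging. Both uses are described here only in general terms, since the abstract does not give the system's details.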

Author information

Authors and Affiliations

TNO Human Factors, Postbus 23, 3769 ZG, Soesterberg, The Netherlands
David A. van Leeuwen & Matej Konečný

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

About this paper

Cite this paper

van Leeuwen, D.A., Konečný, M. (2008). Progress in the AMIDA Speaker Diarization System for Meeting Data. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_44

DOI: https://doi.org/10.1007/978-3-540-68585-2_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)