Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization

Anguera, Xavier; Wooters, Chuck; Hernando, Javier

doi:10.1007/11965152_22


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction


Abstract

The goal of speaker diarization is to determine where each participant speaks in a recording. One of the most commonly used techniques is agglomerative clustering, in which a number of initial models are merged until their count matches the number of speakers present. The choice of complexity, topology, and number of initial models is vital to the final outcome of the clustering algorithm. In prior systems, these parameters were assigned directly based on development data and were the same for all recordings. In this paper we present three techniques to select the parameters individually for each recording, obtaining a system that is more robust to changes in the data. Although the choice of these values depends on tunable parameters, those parameters are less sensitive to changes in the acoustic data and to how the algorithm distributes data among the different clusters. We show that by using the three techniques, we achieve an improvement of up to 8% relative on the development set and 19% relative on the test set over prior systems.
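As a rough illustration of the agglomerative clustering idea mentioned in the abstract, the following Python sketch starts from a fixed number of initial clusters and merges the closest pair until a stopping rule fires. It is a toy under stated assumptions, not the system described in the paper: the function names, the uniform initialization, the centroid-distance merge criterion, and the `n_initial` and `stop_dist` parameters are all hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(frames, n_initial, stop_dist):
    """Toy agglomerative clustering (illustrative assumption, not the paper's method).

    frames    -- feature vectors in time order
    n_initial -- number of initial clusters (hypothetical tunable)
    stop_dist -- stop when the closest pair of centroids is farther apart
                 than this squared distance (hypothetical stopping rule)
    """
    n = len(frames)
    # Uniform initialization: cut the recording into n_initial contiguous chunks.
    clusters = [frames[k * n // n_initial:(k + 1) * n // n_initial]
                for k in range(n_initial)]
    clusters = [c for c in clusters if c]  # drop empty chunks
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                   centroid(clusters[ij[1]])),
        )
        if sq_dist(centroid(clusters[i]), centroid(clusters[j])) > stop_dist:
            break  # remaining clusters are treated as distinct speakers
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return clusters


# Made-up 2-D "features" for two alternating speakers, A near (0, 0), B near (10, 10).
frames = (
    [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]        # speaker A
    + [[10.0, 10.0], [10.2, 9.9], [9.8, 10.1]]  # speaker B
    + [[0.1, 0.1], [0.3, 0.0], [0.0, 0.2]]      # speaker A
    + [[10.1, 10.0], [9.9, 10.2], [10.0, 9.8]]  # speaker B
)
clusters = agglomerate(frames, n_initial=4, stop_dist=25.0)
print(sorted(len(c) for c in clusters))  # [6, 6]
```

In this toy, both `n_initial` and `stop_dist` are fixed by hand, which mirrors the situation the abstract attributes to prior systems: the same values would be applied to every recording regardless of its length or number of speakers.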



Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA, 94704, USA
Xavier Anguera & Chuck Wooters
Technical University of Catalonia, Barcelona, Spain
Xavier Anguera & Javier Hernando


Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus


About this paper

Cite this paper

Anguera, X., Wooters, C., Hernando, J. (2006). Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_22


DOI: https://doi.org/10.1007/11965152_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)
