Many modern systems for speaker diarization, such as the top-performing JHU system in the DIHARD 2018 challenge, rely on clustering of DNN speaker embeddings followed by HMM resegmentation. Two problems with this approach are that parameters need significant retuning for different applications, and that the DNN contributes only to the clustering task and not the resegmentation. This paper presents two contributions: an improved HMM segment assignment algorithm using leave-one-out Gaussian PLDA scoring, and an approach to training the DNN such that embeddings directly optimize performance of this scoring method with generatively updated PLDA parameters. Initial experiments with this new system are very promising, achieving state-of-the-art performance for two separate tasks (Callhome and DIHARD18) without any task-dependent parameter tuning.
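The abstract does not give the scoring equations. The sketch below shows what leave-one-out Gaussian PLDA segment assignment can look like, under a simplified two-covariance Gaussian PLDA model. It assumes the embeddings have already been transformed so that the within-speaker covariance is the identity and the between-speaker covariance is diagonal with entries `psi`. Each segment is scored against each cluster by the log-likelihood ratio "same speaker as this cluster" versus "new speaker". In the leave-one-out step, the segment's own embedding is removed from its current cluster's statistics before scoring. The function names (`plda_llr`, `loo_reassign`) are hypothetical. The sketch also leaves out the HMM transition structure and the DNN training described in the paper, so it should not be read as the authors' implementation.

```python
import math
from typing import List, Sequence


def log_normal_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance `var`."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
        for xi, mi, v in zip(x, mean, var)
    )


def plda_llr(x: Sequence[float], cluster: List[Sequence[float]], psi: Sequence[float]) -> float:
    """LLR that x shares a speaker with `cluster` versus coming from a new speaker.

    Assumed model (simplified two-covariance PLDA, already diagonalized):
      speaker mean y ~ N(0, diag(psi)),  embedding = y + noise,  noise ~ N(0, I).
    """
    n = len(cluster)
    if n == 0:
        return 0.0  # with no members, the predictive equals the new-speaker prior
    dim = len(x)
    mean = [sum(e[d] for e in cluster) / n for d in range(dim)]
    # Posterior over the speaker mean given n members, then predictive for x.
    pred_mean = [n * p * m / (1 + n * p) for p, m in zip(psi, mean)]
    pred_var = [1 + p / (1 + n * p) for p in psi]
    prior_var = [1 + p for p in psi]  # predictive under a brand-new speaker
    return log_normal_diag(x, pred_mean, pred_var) - log_normal_diag(x, [0.0] * dim, prior_var)


def loo_reassign(
    embeddings: List[Sequence[float]],
    labels: List[int],
    num_clusters: int,
    psi: Sequence[float],
) -> List[int]:
    """Reassign each segment to its best-scoring cluster (labels in 0..num_clusters-1)."""
    new_labels = []
    for i, x in enumerate(embeddings):
        scores = []
        for k in range(num_clusters):
            # Leave-one-out: segment i is excluded from its own cluster's statistics.
            members = [e for j, e in enumerate(embeddings) if labels[j] == k and j != i]
            scores.append(plda_llr(x, members, psi))
        new_labels.append(max(range(num_clusters), key=lambda k: scores[k]))
    return new_labels


if __name__ == "__main__":
    emb = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.1], [-3.0, -3.0], [-3.1, -2.8], [-2.9, -3.2]]
    # Segment 2 starts in the wrong cluster; one reassignment pass moves it.
    print(loo_reassign(emb, [0, 0, 1, 1, 1, 1], 2, [10.0, 10.0]))
```

Leaving the segment out matters most for small clusters. If the segment's own embedding were included, it would pull the cluster mean toward itself and inflate the score for its current label.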
Cite as: McCree, A., Sell, G., Garcia-Romero, D. (2019) Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings. Proc. Interspeech 2019, 381-385, doi: 10.21437/Interspeech.2019-2912
@inproceedings{mccree19_interspeech,
  author={Alan McCree and Gregory Sell and Daniel Garcia-Romero},
  title={{Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={381--385},
  doi={10.21437/Interspeech.2019-2912}
}