Abstract
This paper describes our experiments with the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) model for building an online diarization system. For this task, we trained several UIS-RNN models, each based on a different speaker embedding extractor. The systems were evaluated in terms of Diarization Error Rate (DER) on public and private test datasets, and were also tested on real dialogue data recorded in a bank office. The proposed online models outperform the standard offline Agglomerative Hierarchical Clustering (AHC) approach and are comparable to the state-of-the-art offline Bayesian HMM (VBx) method.
References
Ardila, R., et al.: Common voice: a massively-multilingual speech corpus. In: LREC (2020)
Bredin, H., et al.: Pyannote.audio: neural building blocks for speaker diarization. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7124–7128 (2020)
Carletta, J., et al.: The ami meeting corpus: a pre-announcement. In: MLMI (2005)
Dawalatabad, N., Ravanelli, M., Grondin, F., Thienpondt, J., Desplanques, B., Na, H.: Ecapa-tdnn embeddings for speaker diarization. In: Interspeech (2021)
Díez, M., Burget, L., Landini, F., Wang, S., Černocký, J.H.: Optimizing bayesian hmm based x-vector clustering for the second dihard speech diarization challenge. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6519–6523 (2020)
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., Sun, J.: Repvgg: making vgg-style convnets great again. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13728–13737 (2021)
Fini, E., Brutti, A.: Supervised online diarization with sample mean loss for multi-domain data. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7134–7138 (2020)
Fujita, Y., Kanda, N., Horiguchi, S., Nagamatsu, K., Watanabe, S.: End-to-end neural speaker diarization with permutation-free objectives. In: INTERSPEECH (2019)
Horiguchi, S., Fujita, Y., Watanabe, S., Xue, Y., Nagamatsu, K.: End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors. ArXiv abs/2005.09921 (2020)
Horiguchi, S., Watanabe, S., García, P., Takashima, Y., Kawaguchi, Y.: Online neural diarization of unlimited numbers of speakers. ArXiv abs/2206.02432 (2022)
Isik, Y.Z., Roux, J.L., Chen, Z., Watanabe, S., Hershey, J.R.: Single-channel multi-speaker separation using deep clustering. In: INTERSPEECH (2016)
Kanda, N., et al.: Guided source separation meets a strong ASR backend: hitachi/paderborn university joint investigation for dinner party ASR. In: Proceedings of Interspeech 2019, pp. 1248–1252 (2019). https://doi.org/10.21437/Interspeech.2019-1167
Landini, F., Profant, J., Diez, M., Burget, L.: Bayesian hmm clustering of x-vector sequences (vbx) in speaker diarization: theory, implementation and analysis on standard tasks. Comput. Speech Lang. 71, 101254 (2022)
Landini, F., Profant, J., Díez, M., Burget, L.: Bayesian hmm clustering of x-vector sequences (vbx) in speaker diarization: theory, implementation and analysis on standard tasks. ArXiv abs/2012.14952 (2020)
Lavrentyeva, G., et al.: Blind speech signal quality estimation for speaker verification systems. In: INTERSPEECH (2020)
Martin, A.F., Greenberg, C.S.: Nist 2008 speaker recognition evaluation: performance across telephone and room microphone channels. In: INTERSPEECH (2009)
Medennikov, I., et al.: Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario. ArXiv abs/2005.07272 (2020)
Morrone, G., Cornell, S., Raj, D., Zovato, E., Brutti, A., Squartini, S.: Leveraging speech separation for conversational telephone speaker diarization (2022)
Novoselov, S., Lavrentyeva, G., Avdeeva, A., Volokhov, V., Gusev, A.: Robust speaker recognition with transformers using wav2vec 2.0. ArXiv abs/2203.15095 (2022)
Park, T.J., Han, K.J., Kumar, M., Narayanan, S.S.: Auto-tuning spectral clustering for speaker diarization using normalized maximum eigengap. IEEE Signal Process. Lett. 27, 381–385 (2020)
Park, T.J., Kanda, N., Dimitriadis, D., Han, K.J., Watanabe, S., Narayanan, S.S.: A review of speaker diarization: recent advances with deep learning. Comput. Speech Lang. 72, 101317 (2022)
Qin, X., et al.: The ffsvc 2020 evaluation plan. ArXiv abs/2002.00387 (2020)
Ryant, N., et al.: The second dihard diarization challenge: dataset, task, and baselines. In: INTERSPEECH (2019)
Wang, W., Lin, Q., Li, M.: Online target speaker voice activity detection for speaker diarization. ArXiv abs/2207.05920 (2022)
Xue, Y., et al.: Online end-to-end neural diarization handling overlapping speech and flexible numbers of speakers. ArXiv abs/2101.08473 (2021)
Yu, D., Kolbæk, M., Tan, Z., Jensen, J.H.: Permutation invariant training of deep models for speaker-independent multi-talker speech separation. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 241–245 (2017)
Zhang, A., Wang, Q., Zhu, Z., Paisley, J.W., Wang, C.: Fully supervised speaker diarization. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6301–6305 (2019)
Zhang, Y., et al.: Online speaker diarization with graph-based label generation. In: Odyssey (2022)
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Avdeeva, A., Novoselov, S. (2022). Deep Speaker Embeddings Based Online Diarization. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_3
DOI: https://doi.org/10.1007/978-3-031-20980-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science (R0)