Deep Speaker Embeddings Based Online Diarization

  • Conference paper
Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Abstract

This paper describes our experiments with the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) model for the development of an online diarization system. For this task, several UIS-RNN models based on different speaker embedding extractors were trained. These systems were evaluated in terms of the Diarization Error Rate (DER) metric on public and private test datasets, and were additionally tested on real dialogue data recorded in a bank office. The proposed online models outperform the standard offline Agglomerative Hierarchical Clustering (AHC) approach and are comparable with the state-of-the-art offline Bayesian HMM (VBx) method.
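
The systems above are scored with Diarization Error Rate. As a rough illustration of how that metric is computed, the following is a minimal frame-level sketch in Python; the helper frame_level_der, the toy label arrays, and the use of SciPy's Hungarian solver are illustrative assumptions rather than the authors' scoring code, and standard tooling (e.g. the NIST md-eval script or pyannote.metrics) additionally applies a forgiveness collar and handles overlapping speech.

```python
# Minimal frame-level DER sketch (illustrative only, not the scoring tool used
# in the paper). Reference and hypothesis are integer speaker labels per frame;
# -1 marks non-speech. A real evaluation would also apply a collar and treat
# overlapped speech explicitly.
import numpy as np
from scipy.optimize import linear_sum_assignment


def frame_level_der(ref: np.ndarray, hyp: np.ndarray) -> float:
    """DER = (missed speech + false alarm + speaker confusion) / total reference speech."""
    assert ref.shape == hyp.shape
    ref_speech = ref >= 0
    hyp_speech = hyp >= 0

    missed = np.sum(ref_speech & ~hyp_speech)       # speech frames the system left silent
    false_alarm = np.sum(~ref_speech & hyp_speech)  # non-speech frames labeled as speech

    # Optimal one-to-one mapping between reference and hypothesis speakers over
    # frames where both streams contain speech (Hungarian algorithm).
    both = ref_speech & hyp_speech
    ref_ids = np.unique(ref[both])
    hyp_ids = np.unique(hyp[both])
    overlap = np.zeros((len(ref_ids), len(hyp_ids)), dtype=int)
    for i, r in enumerate(ref_ids):
        for j, h in enumerate(hyp_ids):
            overlap[i, j] = np.sum(both & (ref == r) & (hyp == h))
    rows, cols = linear_sum_assignment(-overlap)    # maximize matched frames
    confusion = np.sum(both) - overlap[rows, cols].sum()

    total_speech = np.sum(ref_speech)
    return float(missed + false_alarm + confusion) / max(int(total_speech), 1)


if __name__ == "__main__":
    ref = np.array([0, 0, 0, 1, 1, 1, -1, -1, 1, 1])  # ground-truth speaker per frame
    hyp = np.array([5, 5, 5, 7, 7, 5, 7, -1, 7, 7])   # system output per frame
    print(f"DER = {frame_level_der(ref, hyp):.2%}")   # 25.00% for this toy pair
```

Because DER accumulates missed speech, false alarms, and speaker confusion over the whole recording, an online system pays for every early labeling decision it cannot later revise, which makes a comparison against offline AHC and VBx baselines on equal footing informative.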

Author information

Corresponding author

Correspondence to Anastasia Avdeeva.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Avdeeva, A., Novoselov, S. (2022). Deep Speaker Embeddings Based Online Diarization. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science (LNAI), vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_3

  • DOI: https://doi.org/10.1007/978-3-031-20980-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science; Computer Science (R0)
