Speaker Change Detection Using Binary Key Modelling with Contextual Information

Patino, Jose; Delgado, Héctor; Evans, Nicholas

doi:10.1007/978-3-319-68456-7_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

Speaker change detection can benefit a number of speech processing tasks, such as speaker diarization, recognition and detection. Current solutions rely either on highly localized data or on training with large quantities of background data. The former are efficient but tend to over-segment. The latter are more stable but less efficient, and they need adaptation to mismatched data. Building on previous work in speaker recognition and diarization, this paper reports a new binary key (BK) modelling approach to speaker change detection which aims to strike a balance between efficiency and segmentation accuracy. The BK approach is trained using a controllable degree of contextual data, rather than relying on external background data, and is efficient in terms of computation and speaker discrimination. Experiments on a subset of the standard ETAPE database show that the new approach outperforms the current state-of-the-art methods for speaker change detection and gives an average relative improvement in segment coverage and purity of 18.71% and 4.51% respectively.
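
The abstract reports results as segment coverage and purity without defining them. As a minimal sketch, assuming the commonly used definitions (coverage: for each reference segment, the duration of its largest overlap with a single hypothesis segment, summed and divided by the total reference duration; purity: the same with the roles of reference and hypothesis swapped), the metrics and a relative-improvement calculation might look like the following. The function names and the toy segments are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

# A segment is a (start, end) pair in seconds. This representation is an
# assumption made for illustration, not the paper's data format.
Segment = Tuple[float, float]


def overlap(a: Segment, b: Segment) -> float:
    """Duration of the intersection of two segments (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def coverage(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Fraction of reference time covered by the best-matching hypothesis segment."""
    total = sum(end - start for start, end in reference)
    if total <= 0:
        return 0.0
    covered = sum(
        max((overlap(r, h) for h in hypothesis), default=0.0) for r in reference
    )
    return covered / total


def purity(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Dual of coverage: how much of each hypothesis segment lies in one reference segment."""
    return coverage(hypothesis, reference)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy example: two reference speaker turns; the hypothesis inserts a
# spurious change point at 5 s, i.e. it over-segments the first turn.
reference = [(0.0, 10.0), (10.0, 20.0)]
hypothesis = [(0.0, 5.0), (5.0, 10.0), (10.0, 20.0)]

cov = coverage(reference, hypothesis)
pur = purity(reference, hypothesis)
print(f"coverage={cov:.2f} purity={pur:.2f}")
```

In this toy case the spurious boundary leaves purity at 1.0 but lowers coverage to 0.75, which illustrates why over-segmenting systems are penalized on coverage.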

Acknowledgements

This work was supported through funding from the Agence Nationale de la Recherche (French research funding agency) in the context of the ODESSA project (ANR-15-CE39-0010). The authors acknowledge Hervé Bredin’s help in the evaluation of speaker change detection.

Author information

Authors and Affiliations

Department of Digital Security, EURECOM, Sophia Antipolis, France
Jose Patino, Héctor Delgado & Nicholas Evans

Corresponding author

Correspondence to Jose Patino.

Editor information

Editors and Affiliations

University of Le Mans, Le Mans, France
Nathalie Camelin
University of Le Mans, Le Mans, France
Yannick Estève
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Patino, J., Delgado, H., Evans, N. (2017). Speaker Change Detection Using Binary Key Modelling with Contextual Information. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_21

DOI: https://doi.org/10.1007/978-3-319-68456-7_21
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)