Abstract
Speaker change detection can benefit a number of speech processing tasks, such as speaker diarization, recognition and detection. Current solutions rely either on highly localized data or on training with large quantities of background data. The former are efficient but tend to over-segment; the latter are more stable but less efficient, and need adaptation to mismatched data. Building on previous work in speaker recognition and diarization, this paper reports a new binary key (BK) modelling approach to speaker change detection which aims to strike a balance between efficiency and segmentation accuracy. The BK approach is trained on a controllable amount of contextual data rather than on external background data, and is efficient in terms of both computation and speaker discrimination. Experiments on a subset of the standard ETAPE database show that the new approach outperforms current state-of-the-art methods for speaker change detection, giving average relative improvements in segment coverage and purity of 18.71% and 4.51% respectively.
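The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used segment-level definitions: coverage is the fraction of reference duration covered by the best-overlapping hypothesis segment, and purity is the same quantity with the roles swapped. The abstract does not spell out the evaluation protocol, so these definitions, the function names and the toy boundaries are all assumptions made for illustration.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def boundaries_to_segments(boundaries: List[float], start: float, end: float) -> List[Segment]:
    """Turn change points into contiguous segments spanning [start, end]."""
    points = [start] + sorted(b for b in boundaries if start < b < end) + [end]
    return list(zip(points[:-1], points[1:]))


def overlap(a: Segment, b: Segment) -> float:
    """Duration shared by two segments (0.0 if they do not intersect)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def coverage(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Share of reference duration covered by each reference segment's
    best-overlapping hypothesis segment (assumed definition)."""
    total = sum(end - start for start, end in reference)
    best = sum(max(overlap(r, h) for h in hypothesis) for r in reference)
    return best / total


def purity(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Dual of coverage: swap the roles of reference and hypothesis."""
    return coverage(hypothesis, reference)


# Toy 20-second recording: one true speaker change at 10 s,
# and a hypothetical detector that adds a spurious change at 5 s.
ref = boundaries_to_segments([10.0], 0.0, 20.0)
hyp = boundaries_to_segments([5.0, 10.0], 0.0, 20.0)

print(coverage(ref, hyp))  # 0.75
print(purity(ref, hyp))    # 1.0
```

In this toy case the spurious boundary leaves purity at 1.0 but lowers coverage to 0.75, which is the over-segmentation behaviour the abstract attributes to approaches based on highly localized data.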
Acknowledgements
This work was supported through funding from the Agence Nationale de la Recherche (French research funding agency) in the context of the ODESSA project (ANR-15-CE39-0010). The authors acknowledge Hervé Bredin’s help in the evaluation of speaker change detection.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Patino, J., Delgado, H., Evans, N. (2017). Speaker Change Detection Using Binary Key Modelling with Contextual Information. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_21
DOI: https://doi.org/10.1007/978-3-319-68456-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)