Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news

Karim, Dabbabi; Salah, Hajji; Adnen, Cherif

doi:10.1007/s10772-019-09633-6

Published: 11 September 2019

International Journal of Speech Technology, volume 22, pages 893–909 (2019)

Dabbabi Karim¹,
Hajji Salah² &
Cherif Adnen¹

Abstract

In this paper, we address the problem of optimal non-hierarchical clustering in the speaker clustering phase of speaker diarization for broadcast news. A new hybridization that combines the differential evolution (DE) algorithm with the K-means algorithm is proposed and tested on the TV news database (TVND). To optimize the classification of speakers, two criteria, the trace within criterion (TRW) and the variance ratio criterion (VRC), were used as clustering validity indices to assess each candidate grouping of speaker segments. In the DE algorithm, each candidate clustering is encoded by its cluster centers. The centers can therefore end up arranged inconsistently across the individuals of a population, which prevents the evolutionary operators from searching efficiently. An efficient heuristic is proposed to rearrange the centers and address this problem. Non-hybrid DE variants were applied with and without the rearrangement of cluster centers and compared with the corresponding hybrid K-means variants. The experimental results showed that the hybrid K-means variants with rearrangement of cluster centers were more efficient than both the variants without rearrangement and the non-hybrid DE variants. The results obtained with hybrid and non-hybrid DE variants with rearrangement were quite similar under both the TRW and VRC criteria. The best efficiency was obtained with the hybrid DE variants under these two criteria, with the hybrid b6e6rl variant reaching a diarization error rate (DER) of 13.05%.
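The following Python sketch illustrates the general idea described in the abstract; it is not the authors' implementation. It assumes that TRW is the total within-cluster sum of squared distances (to be minimized) and that VRC is the Calinski-Harabasz variance ratio (to be maximized). The hybrid loop uses a plain DE/rand/1/bin scheme followed by one K-means step per trial vector, which is an illustrative choice rather than one of the competitive variants such as b6e6rl. All function names and parameter values are hypothetical, and the center rearrangement heuristic is omitted because the abstract does not describe it.

```python
import random

def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def centroid(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def trw(points, labels, k):
    """Assumed TRW: sum of squared distances to each cluster's own centroid."""
    total = 0.0
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            c = centroid(members)
            total += sum(sq_dist(p, c) for p in members)
    return total

def vrc(points, labels, k):
    """Assumed VRC: (B / (k - 1)) / (W / (n - k)), with B = T - W."""
    n = len(points)
    g = centroid(points)
    t = sum(sq_dist(p, g) for p in points)
    w = trw(points, labels, k)
    return ((t - w) / (k - 1)) / (w / (n - k))

def kmeans_step(points, centers):
    """One K-means iteration: reassign points, then recompute the centers."""
    labels = assign(points, centers)
    new_centers = []
    for j, c in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == j]
        new_centers.append(centroid(members) if members else list(c))
    return new_centers

def hybrid_de_kmeans(points, k, pop_size=6, f=0.5, cr=0.9,
                     generations=20, seed=0):
    """DE/rand/1/bin over cluster centers, refined by a K-means step."""
    rng = random.Random(seed)
    dim = len(points[0])

    def fitness(centers):
        return trw(points, assign(points, centers), k)

    # Each individual encodes a clustering as k centers drawn from the data.
    pop = [[list(p) for p in rng.sample(points, k)] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.sample(others, 3)
            forced = rng.randrange(k * dim)  # at least one mutated coordinate
            trial = []
            for c in range(k):
                row = []
                for d in range(dim):
                    if rng.random() < cr or c * dim + d == forced:
                        row.append(pop[r1][c][d]
                                   + f * (pop[r2][c][d] - pop[r3][c][d]))
                    else:
                        row.append(pop[i][c][d])
                trial.append(row)
            trial = kmeans_step(points, trial)  # local refinement
            t_fit = fitness(trial)
            if t_fit <= fit[i]:  # greedy selection on TRW
                pop[i], fit[i] = trial, t_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

In this sketch the selection step compares TRW values; switching to VRC would mean keeping the trial when its VRC is larger instead of smaller. In a diarization setting the points would be feature vectors representing speaker segments, which is also an assumption here.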

Notes

http://www.technolangue.net/index.php

Author information

Authors and Affiliations

Research Unit of Processing and Analysis of Electrical and Energetic Systems, Faculty of Sciences of Tunis, University Tunis El-Manar, 2092, Tunis, Tunisia
Dabbabi Karim & Cherif Adnen
National School of Engineers of Tunis, 3000, Tunis, El-Manar, Tunisia
Hajji Salah

Corresponding author

Correspondence to Dabbabi Karim.

About this article

Cite this article

Karim, D., Salah, H. & Adnen, C. Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news. Int J Speech Technol 22, 893–909 (2019). https://doi.org/10.1007/s10772-019-09633-6

Received: 16 December 2018
Accepted: 23 August 2019
Published: 11 September 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10772-019-09633-6
