
Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news

International Journal of Speech Technology

Abstract

In this paper, we address the problem of optimal non-hierarchical clustering in the speaker clustering phase of speaker diarization for broadcast news. A new hybrid of the differential evolution (DE) algorithm and the K-means algorithm is proposed and evaluated on a TV news database (TVND). To optimize the grouping of speakers, two clustering validity indices, the trace within criterion (TRW) and the variance ratio criterion (VRC), are used to evaluate each candidate grouping of speaker segments. In the DE algorithm, a candidate clustering is encoded by its cluster centers. Because the same partition can be encoded with its centers in any order, individuals in the population may store corresponding centers in different positions, and evolutionary operators applied to such misaligned individuals cannot ensure an efficient search. To address this, an efficient heuristic for rearranging the cluster centers is also proposed. Non-hybrid DE variants were applied with and without center rearrangement and compared with the corresponding hybrid DE/K-means variants. The experimental results show that the hybrid variants with center rearrangement are more efficient than the variants without rearrangement and than the non-hybrid DE variants. The hybrid and non-hybrid DE variants with center rearrangement gave quite similar results under both the TRW and VRC criteria. Overall, the hybrid DE variants achieved the best performance under both criteria, with the hybrid b6e6rl variant reaching a diarization error rate (DER) of 13.05%.
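The abstract names the building blocks without defining them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes TRW is the trace of the within-cluster scatter matrix (sum of squared distances from each segment to its cluster mean) and VRC is the Calinski-Harabasz ratio. It also assumes each DE individual encodes k cluster centers, and that the hybrid step is one K-means iteration applied to every trial vector. Two swaps are made for simplicity. First, the DE strategy is plain DE/rand/1/bin with fixed `F` and `CR`, not the competitive b6e6rl variant. Second, `rearrange` is a hypothetical greedy nearest-center matching that stands in for the paper's own heuristic. The data are toy 2-D points, not real speaker features.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(data, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j])) for p in data]

def partition_means(data, labels, k):
    """Mean vector and size of each cluster (mean is None for an empty cluster)."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(data, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    means = [[s / c for s in row] if c else None for row, c in zip(sums, counts)]
    return means, counts

def trw(data, labels, k):
    """Trace of the within-cluster scatter matrix (lower is better)."""
    means, _ = partition_means(data, labels, k)
    return sum(sqdist(p, means[lab]) for p, lab in zip(data, labels))

def vrc(data, labels, k):
    """Variance ratio (Calinski-Harabasz) criterion (higher is better)."""
    n = len(data)
    means, counts = partition_means(data, labels, k)
    grand = [sum(col) / n for col in zip(*data)]
    w = sum(sqdist(p, means[lab]) for p, lab in zip(data, labels))
    b = sum(c * sqdist(m, grand) for m, c in zip(means, counts) if c)
    k_eff = sum(1 for c in counts if c)  # only non-empty clusters count
    if k_eff < 2 or n <= k_eff:
        return 0.0
    if w == 0:
        return float("inf")
    return (b / (k_eff - 1)) / (w / (n - k_eff))

def rearrange(centers, reference):
    """Hypothetical greedy heuristic: reorder `centers` so that slot j holds
    the remaining center closest to reference[j]."""
    remaining = list(range(len(centers)))
    ordered = []
    for r in reference:
        j = min(remaining, key=lambda i: sqdist(centers[i], r))
        remaining.remove(j)
        ordered.append(centers[j])
    return ordered

def kmeans_step(data, centers):
    """One K-means iteration: reassign points, then move centers to cluster means."""
    means, _ = partition_means(data, assign(data, centers), len(centers))
    return [m if m is not None else list(c) for m, c in zip(means, centers)]

def hybrid_de(data, k, criterion="trw", pop_size=20, gens=50, F=0.5, CR=0.9,
              hybrid=True, seed=0):
    """DE/rand/1/bin over k-center encodings, optionally refining each trial by K-means."""
    rng = random.Random(seed)
    dim = len(data[0])

    def fitness(centers):  # minimized: TRW as is, VRC negated
        labels = assign(data, centers)
        return trw(data, labels, k) if criterion == "trw" else -vrc(data, labels, k)

    pop = [[list(p) for p in rng.sample(data, k)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Align the donors' center order with the target's before mutating.
            xa, xb, xc = (rearrange(pop[j], pop[i]) for j in (a, b, c))
            jrand = rng.randrange(k * dim)  # guarantees at least one mutated gene
            trial = [[xa[m][d] + F * (xb[m][d] - xc[m][d])
                      if rng.random() < CR or m * dim + d == jrand else pop[i][m][d]
                      for d in range(dim)] for m in range(k)]
            if hybrid:
                trial = kmeans_step(data, trial)
            s = fitness(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D points standing in for segment features of three speakers.
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
    centers, score = hybrid_de(data, k=3)
    print([[round(v, 2) for v in c] for c in centers], round(score, 3))
```

Setting `hybrid=False` gives the corresponding non-hybrid DE variant in this sketch, and `criterion="vrc"` switches the objective. Only the trial vector is refined by K-means, so the usual DE selection step stays unchanged.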

Fig. 1
Fig. 2

Notes

  1. http://www.technolangue.net/index.php

Author information

Corresponding author

Correspondence to Dabbabi Karim.

About this article

Cite this article

Karim, D., Salah, H. & Adnen, C. Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news. Int J Speech Technol 22, 893–909 (2019). https://doi.org/10.1007/s10772-019-09633-6

  • DOI: https://doi.org/10.1007/s10772-019-09633-6
