Optimal trained artificial neural network for Telugu speaker diarization

  • Research Paper
Evolutionary Intelligence

Abstract

Speaker indexing, or diarization, is the process of automatically partitioning a conversation involving multiple speakers into homogeneous segments and grouping together all segments that correspond to the same speaker. Although several approaches have been proposed, accurate partitioning remains difficult under certain conditions. With this in mind, this paper introduces a new speaker diarization model for the Telugu language. The model first extracts features based on Mel-frequency cepstral coefficients (MFCC). An optimized artificial neural network (ANN) then performs the clustering. The novelty lies in how the ANN is trained: its weights are updated by an optimization scheme that hybridizes the Artificial Bee Colony (ABC) algorithm and the Lion Algorithm (LA). The proposed model is therefore named ANN-ABC-LA. Finally, the performance of the ANN-ABC-LA model is compared with state-of-the-art models on several performance measures.
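
The abstract states that the ANN weights are found by a population-based optimizer but does not give the update rules. As a rough illustration only, the sketch below uses a simplified Artificial Bee Colony-style search (employed-bee and scout phases only, with no onlooker phase and no Lion Algorithm component) to fit a tiny one-hidden-layer network. The synthetic 2-D points stand in for MFCC feature vectors of two speakers. The network size, weight bounds, toy data, and all function names are assumptions made for illustration and are not the paper's ANN-ABC-LA method.

```python
import math
import random


def n_weights(n_in, n_hidden):
    # Hidden weights and biases, then output weights and one output bias.
    return n_hidden * (n_in + 1) + n_hidden + 1


def forward(w, x, n_in, n_hidden):
    """One hidden layer (tanh) and a sigmoid output, weights stored as a flat list."""
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        s = w[idx + n_in]  # bias of this hidden unit
        for i in range(n_in):
            s += w[idx + i] * x[i]
        hidden.append(math.tanh(s))
        idx += n_in + 1
    out = w[idx + n_hidden]  # output bias
    for j in range(n_hidden):
        out += w[idx + j] * hidden[j]
    return 1.0 / (1.0 + math.exp(-out))


def mse(w, data, n_in, n_hidden):
    return sum((forward(w, x, n_in, n_hidden) - y) ** 2 for x, y in data) / len(data)


def abc_train(data, n_in, n_hidden, n_sources=20, n_iters=200,
              limit=20, bound=5.0, seed=0):
    """Simplified ABC-style weight search (assumed parameters, not the paper's)."""
    rng = random.Random(seed)
    dim = n_weights(n_in, n_hidden)

    def new_source():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    sources = [new_source() for _ in range(n_sources)]
    costs = [mse(s, data, n_in, n_hidden) for s in sources]
    trials = [0] * n_sources
    best_i = min(range(n_sources), key=lambda i: costs[i])
    best_w, best_cost = sources[best_i][:], costs[best_i]
    history = []

    for _ in range(n_iters):
        for i in range(n_sources):
            # Employed-bee step: perturb one weight relative to a random peer.
            k = rng.randrange(n_sources - 1)
            if k >= i:
                k += 1
            d = rng.randrange(dim)
            cand = sources[i][:]
            step = rng.uniform(-1.0, 1.0) * (cand[d] - sources[k][d])
            cand[d] = max(-bound, min(bound, cand[d] + step))
            c = mse(cand, data, n_in, n_hidden)
            if c < costs[i]:  # greedy selection
                sources[i], costs[i], trials[i] = cand, c, 0
            else:
                trials[i] += 1
            # Scout step: abandon a source that has stopped improving.
            if trials[i] > limit:
                sources[i] = new_source()
                costs[i] = mse(sources[i], data, n_in, n_hidden)
                trials[i] = 0
            if costs[i] < best_cost:
                best_w, best_cost = sources[i][:], costs[i]
        history.append(best_cost)
    return best_w, best_cost, history


def make_toy_data(n_per_speaker=20, seed=1):
    """Two synthetic 2-D clusters, a stand-in for per-speaker feature vectors."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0.0, -1.0), (1.0, 1.0)):
        for _ in range(n_per_speaker):
            data.append(([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)], label))
    return data


if __name__ == "__main__":
    toy = make_toy_data()
    weights, cost, _ = abc_train(toy, n_in=2, n_hidden=3)
    print("best mean squared error:", cost)
```

Because the search only evaluates the cost function, it needs no gradients. That property is what lets a metaheuristic of this kind replace backpropagation when training a network.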

Author information

Corresponding author

Correspondence to V. Sethuram.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sethuram, V., Prasad, A. & Rao, R.R. Optimal trained artificial neural network for Telugu speaker diarization. Evol. Intel. 13, 631–648 (2020). https://doi.org/10.1007/s12065-020-00378-9

  • DOI: https://doi.org/10.1007/s12065-020-00378-9
