
The identification and localization of speaker using fusion techniques and machine learning techniques

  • Special Issue
  • Evolutionary Intelligence

Abstract

Speaker identification and localization systems are increasingly used in diverse applications, such as smart environments, audio conferencing, security, and social robotics, that demand high accuracy. The objective of this work is to localize a speaker in an enclosed space and, in parallel, to identify that speaker from the speaker's speech signals. The work proposes a simulation of simultaneous speaker localization and identification based on feature fusion: a single feature vector is constructed that contains both the identification features and the localization features. Fusion is applied at each stage of the proposed system, in the form of data fusion, feature fusion, and decision fusion. Four models were proposed for classifying the speaker: a Random Forest; a decision-fusion model combining a Random Forest and a Support Vector Machine; a Restricted Boltzmann Machine implemented with Google's TensorFlow library; and a long short-term memory (LSTM) network implemented with the Keras library. These models achieved accuracies of 66.39%, 82.035%, 99.84%, and 99.15%, respectively.
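The abstract names feature-level and decision-level fusion but does not specify the exact features or fusion rules. The Python sketch below illustrates both ideas under stated assumptions: feature fusion as concatenation of an identification block and a localization block into one vector, followed by per-feature standardization fitted on training data; and decision fusion as a weighted mean of two classifiers' class-probability outputs (for example, a Random Forest and an SVM). The function names, the standardization step, the averaging rule, and all numbers are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

from statistics import mean, pstdev


def fuse_features(id_feats: list[float], loc_feats: list[float]) -> list[float]:
    """Feature-level (early) fusion: one vector holding both feature blocks.

    The identification block comes first, then the localization block; what
    each block contains is a hypothetical placeholder in this sketch.
    """
    return list(id_feats) + list(loc_feats)


def fit_standardizer(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-column mean and population standard deviation over training rows.

    Columns with zero spread get a std of 1.0 so standardizing never divides by 0.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(row: list[float], means: list[float], stds: list[float]) -> list[float]:
    """Scale each fused feature with the training statistics (an assumed step)."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def fuse_decisions(
    prob_a: list[float], prob_b: list[float], weight_a: float = 0.5
) -> tuple[int, list[float]]:
    """Decision-level (late) fusion by a weighted mean of class probabilities.

    prob_a and prob_b are per-speaker probabilities from two classifiers,
    listed in the same class order. Returns the index of the winning speaker
    and the fused probability vector.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both classifiers must score the same speakers")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]
    winner = max(range(len(fused)), key=fused.__getitem__)
    return winner, fused


if __name__ == "__main__":
    # Two toy utterances: 3 identification values + 2 localization values
    # (say azimuth and elevation in degrees). All numbers are made up.
    train = [
        fuse_features([1.0, 2.0, 3.0], [30.0, 5.0]),
        fuse_features([3.0, 2.0, 1.0], [90.0, 5.0]),
    ]
    means, stds = fit_standardizer(train)
    print(standardize(train[0], means, stds))
    # Hypothetical Random Forest and SVM outputs over three speakers.
    print(fuse_decisions([0.6, 0.3, 0.1], [0.2, 0.7, 0.1]))
```

In scikit-learn, `predict_proba` on a fitted `RandomForestClassifier`, or on an `SVC` constructed with `probability=True`, returns probability vectors of this kind; their column order follows each estimator's `classes_` attribute, so the orders must match before averaging. A weighted mean is only one decision-fusion rule; majority voting or a trained meta-classifier are common alternatives, and the abstract does not state which rule the authors used.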



Author information

Correspondence to Rasha H. Ali.


About this article


Cite this article

Ali, R.H., Abdullah, M.N. & Abed, B.F. The identification and localization of speaker using fusion techniques and machine learning techniques. Evol. Intel. 17, 133–149 (2024). https://doi.org/10.1007/s12065-020-00560-z


  • DOI: https://doi.org/10.1007/s12065-020-00560-z
