The identification and localization of speaker using fusion techniques and machine learning techniques

Ali, Rasha H.; Abdullah, Mohammed Najm; Abed, Buthainah F.

doi:10.1007/s12065-020-00560-z

Special Issue
Published: 10 January 2021

Volume 17, pages 133–149 (2024)

Evolutionary Intelligence

Rasha H. Ali ORCID: orcid.org/0000-0003-3644-5979¹,
Mohammed Najm Abdullah² &
Buthainah F. Abed³

Abstract

Speaker identification and localization systems are increasingly used in diverse applications, such as smart environments, audio conferencing, security, and social robotics, all of which demand high accuracy. The objective of this work is to localize a speaker in enclosed spaces and, in parallel, to identify that speaker from the speech signal. This work proposes a simulation of simultaneous speaker localization and identification based on feature fusion: a single feature vector is constructed that contains both the identification features and the localization features. Fusion is applied at each stage of the proposed system, in the form of data, feature, and decision fusion. Four models are proposed for classifying the speaker: a Random Forest; a decision-fusion model combining a Random Forest and a Support Vector Machine; a Restricted Boltzmann Machine implemented with Google's TensorFlow library; and a long short-term memory (LSTM) network implemented with the Keras library. The resulting accuracies were 66.39%, 82.035%, 99.84%, and 99.15%, respectively.
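The abstract describes feature fusion (one vector holding both identification and localization features) and decision fusion (combining Random Forest and SVM outputs) only in outline. The sketch below shows one common way to realize each; it is not the authors' implementation. Assumptions, all hypothetical: each feature block is z-score normalized before concatenation, decision fusion is a weighted average of the two classifiers' class-probability outputs, and the function names and the 13-dimensional identification / 2-dimensional localization blocks are chosen purely for illustration.

```python
import numpy as np


def zscore(block: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each column of an (n_samples, n_features) block.

    Statistics are computed on the given batch for illustration only;
    in practice they would be fitted on training data and reused.
    """
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    return (block - mean) / (std + eps)  # eps avoids division by zero


def feature_fusion(id_feats: np.ndarray, loc_feats: np.ndarray) -> np.ndarray:
    """Feature-level (early) fusion: normalize each block, then concatenate.

    id_feats:  (n_samples, d_id)  hypothetical identification features
    loc_feats: (n_samples, d_loc) hypothetical localization features
    """
    if id_feats.shape[0] != loc_feats.shape[0]:
        raise ValueError("feature blocks must have the same number of samples")
    return np.hstack([zscore(id_feats), zscore(loc_feats)])


def decision_fusion(proba_a: np.ndarray, proba_b: np.ndarray,
                    weight_a: float = 0.5) -> np.ndarray:
    """Decision-level (late) fusion: weighted average of class probabilities.

    proba_a, proba_b: (n_samples, n_classes) probability outputs of two
    classifiers (for example a Random Forest and an SVM).
    Returns the index of the winning class for each sample.
    """
    if proba_a.shape != proba_b.shape:
        raise ValueError("probability matrices must have the same shape")
    fused = weight_a * proba_a + (1.0 - weight_a) * proba_b
    return fused.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    id_feats = rng.normal(size=(4, 13))   # illustrative identification block
    loc_feats = rng.normal(size=(4, 2))   # illustrative localization block
    print(feature_fusion(id_feats, loc_feats).shape)  # (4, 15)

    proba_rf = np.array([[0.6, 0.4], [0.9, 0.1]])
    proba_svm = np.array([[0.3, 0.7], [0.4, 0.6]])
    print(decision_fusion(proba_rf, proba_svm))  # [1 0]
```

In the first sample the two classifiers disagree and the averaged probabilities favor class 1; in the second the Random Forest's stronger vote carries the fused decision to class 0. Normalizing each block before concatenation keeps features with large numeric ranges from dominating the fused vector.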

Author information

Authors and Affiliations

Computer Department, College of Education for Women, University of Baghdad, Baghdad, Iraq
Rasha H. Ali
Department of Computer Engineering, College of Engineering, University of Technology, Baghdad, Iraq
Mohammed Najm Abdullah
University of Information Technology and Communications, Baghdad, Iraq
Buthainah F. Abed

Corresponding author

Correspondence to Rasha H. Ali.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ali, R.H., Abdullah, M.N. & Abed, B.F. The identification and localization of speaker using fusion techniques and machine learning techniques. Evol. Intel. 17, 133–149 (2024). https://doi.org/10.1007/s12065-020-00560-z

Received: 14 November 2019
Revised: 11 December 2020
Accepted: 26 December 2020
Published: 10 January 2021
Issue Date: February 2024
DOI: https://doi.org/10.1007/s12065-020-00560-z
