Abstract
Speaker identification and localization systems are increasingly used in applications such as smart environments, audio conferencing, security, and social robotics, all of which demand high accuracy. The objective of this work is to localize a speaker in an enclosed space and, in parallel, identify that speaker from the speech signal. The work presents a simulation that performs speaker localization and identification simultaneously using feature fusion: a single feature vector is constructed that contains both the identification features and the localization features. Fusion is applied at each stage of the proposed system, namely at the data, feature, and decision levels. Four models are proposed for classifying the speaker: a Random Forest; a decision-fusion model that combines Random Forest and Support Vector Machine; a Restricted Boltzmann Machine implemented with Google's TensorFlow library; and a long short-term memory (LSTM) network implemented with the Keras library. The models achieved accuracies of 66.39%, 82.035%, 99.84%, and 99.15%, respectively.
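The two fusion levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example feature values, and the choice of a weighted average of class scores as the decision-fusion rule are all assumptions made here for illustration, since the abstract does not state how the Random Forest and Support Vector Machine outputs are combined.

```python
from typing import Dict, List, Sequence


def fuse_features(identification: Sequence[float],
                  localization: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate the two feature groups into one vector."""
    return list(identification) + list(localization)


def fuse_decisions(scores_a: Dict[str, float],
                   scores_b: Dict[str, float],
                   weight_a: float = 0.5) -> str:
    """Decision-level fusion (assumed rule): weighted average of two classifiers' class scores."""
    labels = sorted(set(scores_a) | set(scores_b))
    fused = {
        label: weight_a * scores_a.get(label, 0.0)
        + (1.0 - weight_a) * scores_b.get(label, 0.0)
        for label in labels
    }
    # Pick the label with the highest fused score (ties go to the alphabetically first label).
    return max(labels, key=lambda label: fused[label])


# Hypothetical values, for illustration only.
id_features = [12.1, -3.4, 0.8]   # stand-in for identification features
loc_features = [35.0]             # stand-in for a localization feature, e.g. an angle in degrees
fused_vector = fuse_features(id_features, loc_features)

rf_scores = {"speaker_1": 0.40, "speaker_2": 0.60}    # stand-in for Random Forest class scores
svm_scores = {"speaker_1": 0.75, "speaker_2": 0.25}   # stand-in for SVM class scores
predicted = fuse_decisions(rf_scores, svm_scores)
```

In this sketch the fused vector would be the input to any of the four classifiers, while `fuse_decisions` stands in for the step that merges two classifiers' outputs into one label.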
Cite this article
Ali, R.H., Abdullah, M.N. & Abed, B.F. The identification and localization of speaker using fusion techniques and machine learning techniques. Evol. Intel. 17, 133–149 (2024). https://doi.org/10.1007/s12065-020-00560-z
DOI: https://doi.org/10.1007/s12065-020-00560-z