Bangla Speech-Based Person Identification Using LSTM Networks

Khan, Rahad; Hossain, Saddam; Hossain, Akbor; Siddiqui, Fazlul Hasan; Noor, Sabah Binte

doi:10.1007/978-3-031-34619-4_29

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 490)

Included in the following conference series:

International Conference on Machine Intelligence and Emerging Technologies

Abstract

The iris, face, fingerprints, behavior, voice, and other traits of the human body are used for security and identification because they are unique to each person. Biometric systems are popular and widely used in Bangladesh to identify people, e.g. in cybercrime cases. Beyond these systems, a person can also be identified by voice, since each person's speech has a distinct timbre, vocal pattern, and frequency spectrogram. A human can easily recognize the voice of a known person, but this is difficult for a machine. As a result, researchers are interested in processing human voices and recognizing them automatically. Various traditional machine learning (ML) models, such as GMM, HMM, SVM, and MLP, have been used to recognize human voices. Voice data is a complex time-series signal, and massive datasets are required to train ML models; as a result, traditional ML models have low accuracy and take a long time to train. In contrast, LSTM neural networks, a class of deep learning models within ML, require less time to train a model with high accuracy. This paper focuses on an LSTM network for identifying a person from Bangla speech, because the Bangla alphabet has 50 letters whose pronunciation differs from that of other languages such as English and Chinese. We extracted features from Bangla voice recordings using Mel-frequency cepstral coefficients (MFCCs). The proposed model's performance is measured using K-fold cross-validation, accuracy, precision, recall, and F1 score. In experiments, the proposed model achieved a recognition accuracy of 99.98%.
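The abstract says features were extracted with MFCCs but gives no parameters. As a minimal sketch of the standard single-frame MFCC pipeline (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II), assuming a 16 kHz sample rate, a 512-sample frame, 26 mel filters, and 13 coefficients (all illustrative choices, not the authors' settings), plain Python looks like this. A naive DFT is used instead of an FFT to keep it dependency-free; the function names are hypothetical.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale: mel = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame, n_fft):
    # Hamming-window the frame, then take a naive DFT (frames shorter than n_fft
    # are implicitly zero-padded). Returns |X[k]|^2 / n_fft for k = 0..n_fft/2.
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(frame)]
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n_fft) for t, x in enumerate(win))
        im = sum(x * math.sin(2 * math.pi * k * t / n_fft) for t, x in enumerate(win))
        spec.append((re * re + im * im) / n_fft)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mel_points = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    fbank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            row[k] = (right - k) / (right - center)
        fbank.append(row)
    return fbank

def mfcc_frame(frame, sample_rate=16000, n_fft=512, n_filters=26, n_coeffs=13):
    spec = power_spectrum(frame, n_fft)
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in fbank]
    # Floor the energies so log() never sees zero (e.g. in silent frames)
    log_e = [math.log(max(e, 1e-10)) for e in energies]
    # DCT-II decorrelates the log filterbank energies; keep the first n_coeffs
    return [sum(le * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, le in enumerate(log_e))
            for c in range(n_coeffs)]

# Usage: 32 ms of a 440 Hz tone at 16 kHz
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(512)]
coeffs = mfcc_frame(tone, sr)
print(len(coeffs))  # 13
```

For a whole utterance, the same function would be applied to overlapping frames (commonly around 25 ms long with a 10 ms hop), giving a sequence of MFCC vectors suitable as input to a recurrent model.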
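The abstract does not describe the network architecture. As a hedged illustration of what an LSTM layer computes at each time step, the sketch below implements one LSTM cell in plain Python with the standard forget, input, candidate, and output gate equations. The weight layout, the dimensions, and the names `lstm_step`, `run_lstm`, and `random_params` are illustrative assumptions, not the authors' model. In a speaker-identification setting, each step's input would be one frame's MFCC vector.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.
    params maps each gate name ('f', 'i', 'g', 'o') to (W, U, b), where
    W is hidden x input, U is hidden x hidden, and b has length hidden."""
    pre = {}
    for gate, (W, U, b) in params.items():
        pre[gate] = [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h_prev), b)]
    f = [sigmoid(z) for z in pre["f"]]    # forget gate
    i = [sigmoid(z) for z in pre["i"]]    # input gate
    g = [math.tanh(z) for z in pre["g"]]  # candidate cell state
    o = [sigmoid(z) for z in pre["o"]]    # output gate
    c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def run_lstm(sequence, params, hidden_size):
    # Feed a sequence of feature vectors through the cell; return the final hidden state
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h

def random_params(input_size, hidden_size, seed=0):
    # Small random weights and zero biases (hypothetical initialization, untrained)
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in ("f", "i", "g", "o")}

# Usage: 20 frames of 13 stand-in "MFCC" values -> final 8-dimensional hidden state
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
final_h = run_lstm(frames, random_params(13, 8), hidden_size=8)
print(len(final_h))  # 8
```

A trained identifier would learn these weights by backpropagation through time and place a softmax layer over the enrolled speakers on top of the final hidden state; neither step is shown here.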
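The abstract reports K-fold cross-validation together with accuracy, precision, recall, and F1 score, but does not state K or how per-speaker scores are averaged. A minimal sketch, assuming macro averaging over speaker labels and a plain, unshuffled split into contiguous folds (in practice the data would usually be shuffled or stratified by speaker first); the helper names `k_fold_indices` and `classification_metrics` are hypothetical:

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds; yield (train, test) index lists."""
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per label
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = tp[label] / pred_count[label] if pred_count[label] else 0.0
        rec = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Usage: speaker labels for five test utterances
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C"]
print(classification_metrics(y_true, y_pred)["accuracy"])  # 0.8
```

In a full K-fold run, the model would be retrained on each `train` split, scored on the matching `test` split, and the per-fold metrics averaged across the K folds.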

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Dhaka University of Engineering and Technology, Gazipur, 1707, Bangladesh
Rahad Khan, Saddam Hossain, Akbor Hossain, Fazlul Hasan Siddiqui & Sabah Binte Noor

Corresponding author

Correspondence to Rahad Khan.

Editor information

Editors and Affiliations

Noakhali Science and Technology University, Noakhali, Bangladesh
Md. Shahriare Satu
The University of Queensland, St. Lucia, QLD, Australia
Mohammad Ali Moni
Jahangirnagar University, Dhaka, Bangladesh
M. Shamim Kaiser
Daffodil International University, Dhaka, Bangladesh
Mohammad Shamsul Arefin

About this paper

Cite this paper

Khan, R., Hossain, S., Hossain, A., Siddiqui, F.H., Noor, S.B. (2023). Bangla Speech-Based Person Identification Using LSTM Networks. In: Satu, M.S., Moni, M.A., Kaiser, M.S., Arefin, M.S. (eds) Machine Intelligence and Emerging Technologies. MIET 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 490. Springer, Cham. https://doi.org/10.1007/978-3-031-34619-4_29

DOI: https://doi.org/10.1007/978-3-031-34619-4_29
Published: 11 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34618-7
Online ISBN: 978-3-031-34619-4
eBook Packages: Computer Science, Computer Science (R0)
