Bangla Speech-Based Person Identification Using LSTM Networks

  • Conference paper
Machine Intelligence and Emerging Technologies (MIET 2022)

Abstract

Physical and behavioral traits such as the iris, face, fingerprints, behavior, and voice are used for security and identification because they are unique to each person. Biometric systems are popular and widely used in Bangladesh to identify people, e.g. in cybercrime cases. A person can also be identified by voice, since each person’s speech has a distinct timbre, vocal pattern, and frequency spectrogram. A human can easily identify the voice of a known person, but the task is difficult for a machine. Researchers are therefore interested in processing human voices and recognizing them automatically. Various traditional machine learning (ML) models, such as GMM, HMM, SVM, and MLP, have been used to recognize speakers. Voice data is a complex time-series signal, and massive datasets are required to train ML models. As a result, traditional ML has low accuracy and takes a long time to train. In contrast, LSTM neural networks, a branch of ML, require less time to train a model with high accuracy. This paper focuses on an LSTM network for identifying a person from Bangla speech, because the Bangla alphabet has 50 letters whose pronunciation differs from that of other languages such as English and Chinese. We extracted features from Bangla voice recordings using Mel-frequency cepstral coefficients (MFCCs). The performance of the proposed model is measured using K-fold cross-validation, accuracy, precision, recall, and F1 score. In our experiments, the proposed model achieved a high recognition accuracy of 99.98%.
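The evaluation protocol named in the abstract (K-fold validation with accuracy, precision, recall, and F1 score) can be illustrated with a minimal sketch. The abstract does not state the number of folds, how folds are drawn, or how per-speaker scores are averaged, so the contiguous folds, the macro averaging, and the helper names `k_fold_indices` and `classification_scores` below are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (assumed scheme)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted speaker gains a false positive
            fn[truth] += 1  # true speaker gains a false negative
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

In such a setup, a model would be trained on each `train` split and scored on the matching `test` split, with the per-fold scores averaged at the end; the speaker labels would be the classes passed to `classification_scores`.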

Author information

Corresponding author

Correspondence to Rahad Khan.

Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Khan, R., Hossain, S., Hossain, A., Siddiqui, F.H., Noor, S.B. (2023). Bangla Speech-Based Person Identification Using LSTM Networks. In: Satu, M.S., Moni, M.A., Kaiser, M.S., Arefin, M.S. (eds) Machine Intelligence and Emerging Technologies. MIET 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 490. Springer, Cham. https://doi.org/10.1007/978-3-031-34619-4_29

  • DOI: https://doi.org/10.1007/978-3-031-34619-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34618-7

  • Online ISBN: 978-3-031-34619-4

  • eBook Packages: Computer Science, Computer Science (R0)
