Human Speaker Recognition Based Database Method

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2020)

Abstract

During the last few years, many attempts have been made in the field of human sound and speech processing to build speaker identification systems. These systems rest on different basic approaches, but in every case the accuracy of the final identification result depends on a variety of factors. Several methods, in both the time domain and the frequency domain, are used to extract features from human speech. The popular Linear Predictive Coding (LPC) technique is used to parameterize voices, and these parameters are later used in functions that separate voiced from unvoiced segments during extraction. For comparison, direct classical methods are also used to extract the characteristics of human speech. Human voices vary so much over time that a single short recording can never identify a speaker with close to 100% accuracy.
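The abstract does not spell out the exact pipeline, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one common classical arrangement: the signal is cut into non-overlapping frames, each frame is flagged as voiced or unvoiced from its short-time energy and zero-crossing rate, and LPC coefficients are computed for voiced frames from the autocorrelation sequence via the Levinson-Durbin recursion. The function names, frame length, and thresholds (`energy_threshold`, `zcr_threshold`) are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve for LPC predictor coefficients a[1..p] in x[n] ~ sum_k a[k] * x[n - k].

    Returns (coefficients, final prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the predictor coefficients
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: stop the recursion
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error shrinks at each stage
    return a[1:], err


def short_time_energy(frame: List[float]) -> float:
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for i in range(1, len(frame)) if (frame[i - 1] >= 0) != (frame[i] >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame: List[float], energy_threshold: float, zcr_threshold: float) -> bool:
    """Hypothetical rule: voiced speech has high energy and a low zero-crossing rate."""
    return short_time_energy(frame) > energy_threshold and zero_crossing_rate(frame) < zcr_threshold


def lpc_features(signal: List[float], frame_len: int, order: int,
                 energy_threshold: float = 0.01, zcr_threshold: float = 0.3) -> List[List[float]]:
    """Return one LPC coefficient vector per voiced, non-overlapping frame."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if is_voiced(frame, energy_threshold, zcr_threshold):
            coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
            features.append(coeffs)
    return features
```

For example, a 100 Hz tone sampled at 8 kHz is flagged as voiced, while a rapidly alternating signal (high zero-crossing rate) and silence (no energy) are not, so only the tone contributes an LPC vector. Real systems typically add overlapping windowed frames and pre-emphasis; those steps are omitted here to keep the sketch short.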

Author information

Corresponding author

Correspondence to Mohammed A. Fadhel.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hatem, A.S., Adulredhi, M.J., Abdulrahman, A.M., Fadhel, M.A. (2021). Human Speaker Recognition Based Database Method. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_106
