
Cross B-HUB Based RNN with Random Aural-Feature Extraction for Enhanced Speaker Extraction and Speaker Recognition

Published in Wireless Personal Communications

Abstract

Speaker recognition is the identification of a person from the characteristics of his or her voice. Various robust training and classification algorithms for automatic voice recognition have been presented previously, but their performance degrades under the contaminating acoustic noise encountered in real-world conditions. Hence, a novel Random-Aural Feature Extraction method is presented to extract robust acoustic features from voice signals, in which the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT) are applied to the log of the filter bank output in order to amplify stochastic acoustic characteristics and improve feature extraction accuracy. However, optimizing feature selection and classification raises the issue of long-term dependencies. Therefore, a Cross B-HUB based RNN is proposed: the Cross B-HUB algorithm provides opposition-based initialization and reduces computational overhead, while RNN-based classification performs speaker verification and identification without the long-term dependency issues. The proposed approach achieves the highest sensitivity, specificity, and accuracy of 97%, 99%, and 97%, respectively.
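The abstract does not spell out the front end, but the DFT, filter bank, log, and DCT steps it names match the standard MFCC-style pipeline. Below is a minimal sketch under that assumption; the function names (aural_features, mel_filter_bank), the mel-scale filter bank, and all parameter values are illustrative assumptions, not the paper's actual Random-Aural algorithm.

```python
# Minimal MFCC-style front end (DFT -> mel filter bank -> log -> DCT),
# offered as one plausible reading of the abstract's DFT/DCT chain.
# All names and parameter values are illustrative assumptions.
import numpy as np
from scipy.fftpack import dct

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Standard triangular mel filter bank (textbook construction)."""
    def hz_to_mel(hz):
        return 2595.0 * np.log10(1.0 + hz / 700.0)
    def mel_to_hz(mel):
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                  # rising edge
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                 # falling edge
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def aural_features(frame, sample_rate=16000, n_fft=512, n_filters=26, n_ceps=13):
    """One feature vector per windowed frame: |DFT|^2 -> filter bank -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    energies = mel_filter_bank(n_filters, n_fft, sample_rate) @ spectrum
    return dct(np.log(energies + 1e-10), type=2, norm="ortho")[:n_ceps]
```

Likewise, the Cross B-HUB initializer is described here only as opposition-based. In generic opposition-based learning, each random candidate x drawn from [lower, upper] is paired with its opposite, lower + upper − x, and the fitter of the two is kept; the following is a hypothetical sketch of that generic scheme, not the paper's Cross B-HUB algorithm.

```python
def opposition_init(pop_size, dim, lower, upper, seed=0):
    """Generic opposition-based initialization (assumed, not the paper's
    Cross B-HUB algorithm): return each candidate with its opposite point."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, size=(pop_size, dim))
    return x, (lower + upper) - x   # caller keeps the fitter of each pair
```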



Availability of Data and Material

Not applicable.

Code Availability

Not applicable.


Funding

Not applicable.

Author information


Corresponding author

Correspondence to P. S. Subhashini Pedalanka.

Ethics declarations

Conflicts of interest

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Pedalanka, P.S.S., Ram, M.S.S. & Rao, D.S. Cross B-HUB Based RNN with Random Aural-Feature Extraction for Enhanced Speaker Extraction and Speaker Recognition. Wireless Pers Commun 129, 2239–2268 (2023). https://doi.org/10.1007/s11277-022-10096-3

