A Feature Level Fusion Scheme for Robust Speaker Identification

  • Conference paper
Big Data, Cloud and Applications (BDCA 2018)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 872))

Abstract

In speaker identification, features are first extracted from a speech signal and then compared with those of the training set to find the closest match. Finding effective and robust features for classifying speakers therefore improves overall identification performance, especially in the presence of noise. This paper proposes a feature extraction method based on feature fusion: Gammatone Frequency Cepstral Coefficients (GFCC) and wavelet components are extracted and fused, and the fused features are used to train and test a Support Vector Machine (SVM) classifier. The proposed scheme is validated and compared with conventional GFCC on clean and noise-corrupted signals from the Voxforge database. The experimental results show that the proposed algorithm achieves higher identification accuracy than the baseline GFCC.
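
As a rough illustration of what feature-level fusion can look like, the sketch below concatenates a GFCC vector with wavelet subband log-energies for a single frame. The abstract does not state the wavelet family, the number of decomposition levels, or the fusion rule, so the Haar wavelet, the three levels, and plain concatenation are all assumptions made here for illustration. The GFCC values are made-up placeholders standing in for a real gammatone front end, and the names `haar_dwt`, `wavelet_log_energies`, and `fuse` are hypothetical. Classifier training is not shown.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists. An odd-length
    input drops its last sample.
    """
    s = 1.0 / math.sqrt(2.0)
    n = len(signal) - len(signal) % 2
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def wavelet_log_energies(frame, levels=3, eps=1e-12):
    """Log-energy of each detail subband, then of the final approximation.

    Assumption: subband log-energies stand in for the paper's
    "wavelet components", which the abstract does not specify.
    """
    feats = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(math.log(sum(d * d for d in detail) + eps))
    feats.append(math.log(sum(a * a for a in approx) + eps))
    return feats


def fuse(gfcc_vector, wavelet_vector):
    """Feature-level fusion by concatenation (an assumed fusion rule)."""
    return list(gfcc_vector) + list(wavelet_vector)


# Hypothetical inputs: a 64-sample synthetic frame and a placeholder
# 4-dimensional GFCC vector (not produced by a real gammatone filterbank).
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
gfcc = [0.1, -0.3, 0.7, 0.2]

fused = fuse(gfcc, wavelet_log_energies(frame, levels=3))
print(len(fused))
```

Each fused vector of this kind would then be one training or test sample for the classifier. In practice the two feature sets usually have different scales, so normalizing each dimension before training is a common extra step.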

Author information

Corresponding author

Correspondence to Sara Sekkate.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Sekkate, S., Khalil, M., Adib, A. (2018). A Feature Level Fusion Scheme for Robust Speaker Identification. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_23

  • DOI: https://doi.org/10.1007/978-3-319-96292-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96291-7

  • Online ISBN: 978-3-319-96292-4

  • eBook Packages: Computer Science, Computer Science (R0)
