A Feature Level Fusion Scheme for Robust Speaker Identification

Sekkate, Sara; Khalil, Mohammed; Adib, Abdellah

doi:10.1007/978-3-319-96292-4_23

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 872))

Included in the following conference series:

International Conference on Big Data, Cloud and Applications

Abstract

In speaker identification, features are extracted from a test utterance and compared with those of the training set to find the closest match. Effective and robust features are therefore central to identification performance, especially in the presence of noise. This paper proposes a feature extraction method based on feature-level fusion: Gammatone Frequency Cepstral Coefficients (GFCC) and wavelet components are extracted and fused to train and test a Support Vector Machine (SVM) classifier. The proposed scheme is evaluated against conventional GFCC on clean and noise-corrupted signals from the VoxForge database. In these experiments, the fused features achieve higher identification accuracy than the GFCC baseline.
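As a rough illustration of what feature-level fusion means here, the sketch below concatenates a per-frame cepstral vector with wavelet sub-band features. It is not the authors' implementation: the abstract does not state which wavelet, how many decomposition levels, or which wavelet-derived quantities are used, so the Haar wavelet, the log sub-band energies, the function names, and the placeholder "GFCC" values are all assumptions made for the example. In the scheme described above, the fused vectors would then be passed to an SVM classifier, which is omitted here.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (assumed wavelet).
    Returns (approximation, detail); expects an even-length sequence."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_log_energies(frame, levels=3):
    """Log energy of each detail sub-band plus the final approximation band.
    The frame length must be divisible by 2**levels."""
    feats = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(math.log(sum(d * d for d in detail) + 1e-12))
    feats.append(math.log(sum(a * a for a in approx) + 1e-12))
    return feats

def fuse(gfcc_vector, wavelet_vector):
    """Feature-level fusion: concatenate the two per-frame feature vectors."""
    return list(gfcc_vector) + list(wavelet_vector)

# Toy 16-sample frame and a hypothetical 4-dimensional stand-in for GFCC;
# a real system would compute GFCC with a gammatone filterbank front end.
frame = [math.sin(2 * math.pi * 3 * n / 16) for n in range(16)]
gfcc_placeholder = [0.1, -0.2, 0.3, 0.05]

fused = fuse(gfcc_placeholder, wavelet_log_energies(frame, levels=3))
print(len(fused))  # 4 placeholder values + 3 detail bands + 1 approximation band
```

Concatenation keeps both feature streams in a single vector, so one classifier sees them jointly. In practice the two parts usually sit on different scales, so normalising each dimension before training is a common additional step.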

Author information

Authors and Affiliations

Team Networks, Telecoms & Multimedia, LIM@II-FSTM, B.P. 146, 20650, Mohammedia, Morocco
Sara Sekkate, Mohammed Khalil & Abdellah Adib

Corresponding author

Correspondence to Sara Sekkate.

Editor information

Editors and Affiliations

Abdelmalek Essaâdi University, Tétouan, Morocco
Youness Tabii
Abdelmalek Essaâdi University, Tétouan, Morocco
Mohamed Lazaar
Abdelmalek Essaâdi University, Tétouan, Morocco
Mohammed Al Achhab
Université Ibn-Tofail, Tétouan, Morocco
Nourddine Enneya

About this paper

Cite this paper

Sekkate, S., Khalil, M., Adib, A. (2018). A Feature Level Fusion Scheme for Robust Speaker Identification. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_23

DOI: https://doi.org/10.1007/978-3-319-96292-4_23
Published: 14 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science, Computer Science (R0)
