Abstract
Recent advances in speaker recognition have produced systems that substantially outperform earlier algorithms. However, this performance degrades when only limited data are available. This paper presents examples of how Support Vector Machines (SVM) can improve speaker recognition for short utterances. The main contribution of this approach is the use of new vectors when training and testing data are limited. We show how different SVM kernel functions can be used to validate the new approach with different speakers from different databases.
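As a rough illustration of what "different kernel functions" means for an SVM, the sketch below computes three commonly used kernels (linear, polynomial, and RBF) on a handful of toy vectors. It is not the authors' system: the feature vectors, the `gamma`, `degree`, and `coef0` values, and all function names are assumptions made for illustration only, and the abstract does not say which kernels or features the paper actually uses.

```python
import math

def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree (parameter values are assumed)."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2) (gamma is assumed)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(vectors, kernel):
    """Pairwise kernel values; an SVM is trained on this matrix, not on raw vectors."""
    return [[kernel(u, v) for v in vectors] for u in vectors]

# Hypothetical utterance-level feature vectors (toy values, not real speech data):
# the first two stand in for one speaker, the third for a different speaker.
utterances = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name, gram_matrix(utterances, kernel))
```

Each kernel defines a different notion of similarity between utterance vectors, so swapping the kernel changes the decision boundary an SVM can learn from the same data. With the toy vectors above, the RBF kernel rates the first two utterances as far more similar to each other than either is to the third.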
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chakroun, R., Frikha, M. (2020). A New Text Independent Speaker Recognition System with Short Utterances Using SVM. In: Themistocleous, M., Papadaki, M., Kamal, M.M. (eds) Information Systems. EMCIS 2020. Lecture Notes in Business Information Processing, vol 402. Springer, Cham. https://doi.org/10.1007/978-3-030-63396-7_38
DOI: https://doi.org/10.1007/978-3-030-63396-7_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63395-0
Online ISBN: 978-3-030-63396-7
eBook Packages: Computer Science, Computer Science (R0)