ABSTRACT
For any system, reliability and construction cost have always been two major determinants of whether it can be used in daily life. In the field of voiceprint recognition, users are often forced to choose between accuracy and convenience. This paper examines the performance of two speaker verification models in different environments and asks whether a balance can be struck between cost and result. The Gaussian mixture model with a universal background model (GMM-UBM) and a deep-learning method are selected to represent two common approaches to speaker verification. Comparing the two models, we find that the deep-learning method depends more heavily on large training datasets: it performs worse than the GMM-UBM model when both are trained on the same small dataset containing only a few samples, whereas both methods reach nearly 100% accuracy when given a sufficiently large training set. Moreover, despite attempts to raise accuracy by tuning the configuration of both models, excellent performance appears only when large amounts of training data are available and little noise is present.
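To make the GMM-UBM approach mentioned above concrete, the following is a minimal, hedged sketch of the standard pipeline: train a universal background model on pooled background speech features, adapt it to a target speaker with a simplified MAP mean adaptation, and score a trial utterance by the log-likelihood ratio between the adapted model and the UBM. The synthetic Gaussian "features" stand in for MFCC frames, and the component count and relevance factor are illustrative assumptions, not values from the paper.

```python
# GMM-UBM verification sketch (illustrative, not the paper's implementation).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for MFCC feature frames (rows = frames, cols = coefficients).
background = rng.normal(0.0, 1.0, size=(2000, 13))    # pooled "many speakers"
target_enroll = rng.normal(0.8, 1.0, size=(300, 13))  # enrollment data, one speaker
trial = rng.normal(0.8, 1.0, size=(100, 13))          # test utterance (same speaker)

# 1. Train the universal background model on the pooled background data.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background)

# 2. Simplified MAP adaptation: shift each component mean toward the
#    enrollment data, weighted by that component's soft occupancy count.
resp = ubm.predict_proba(target_enroll)        # responsibilities (frames x components)
n_k = resp.sum(axis=0) + 1e-10                 # soft count per component
e_k = resp.T @ target_enroll / n_k[:, None]    # per-component enrollment mean
relevance = 16.0                               # assumed relevance factor
alpha = (n_k / (n_k + relevance))[:, None]

speaker = GaussianMixture(n_components=8, covariance_type="diag")
speaker.weights_ = ubm.weights_                # weights and covariances kept from UBM
speaker.covariances_ = ubm.covariances_
speaker.precisions_cholesky_ = ubm.precisions_cholesky_
speaker.means_ = alpha * e_k + (1 - alpha) * ubm.means_  # adapted means only

# 3. Verification score: mean per-frame log-likelihood ratio.
llr = speaker.score(trial) - ubm.score(trial)
print(f"LLR = {llr:.3f}")  # positive score -> accept the claimed identity
```

Because the trial frames come from the same distribution as the enrollment data, the adapted model should assign them a higher likelihood than the UBM does, giving a positive score; in practice the accept/reject threshold would be tuned on held-out trials.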