Abstract
In this paper we investigate the role of frame length in the computation of MFCC acoustic parameters in a text-dependent speaker recognition system. Since the vocal characteristics of a speaker may vary over time, the related information conveyed by the MFCCs usually causes a significant degradation in recognition performance. In our experiment we tested different frame lengths for feature extraction in the training and recognition phases, for a set of speakers whose speech productions spanned 3 months. Results show that a suitable combination of frame lengths for the training and testing phases can improve recognition performance by reducing the false rejection rate. An expert system designed to search for the combination of frame lengths that maximizes the performance of the HMM engine may help decrease the number of false rejections.
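The search over frame-length combinations described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the helper names `split_into_frames` and `best_frame_length_pair`, the candidate frame lengths, the frame shift, and the scoring callback are all hypothetical, and the abstract does not say which values or search strategy the authors used. The callback stands in for a full train-and-test run of the recognizer that returns a false rejection rate.

```python
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple


def split_into_frames(samples: Sequence[float], sample_rate: int,
                      frame_ms: float, shift_ms: float) -> List[Sequence[float]]:
    """Cut a signal into overlapping frames of frame_ms, advancing by shift_ms.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame and shift must be at least one sample long")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, shift)]


def best_frame_length_pair(
        frame_lengths_ms: Iterable[float],
        false_rejection_rate: Callable[[float, float], float],
) -> Tuple[float, float]:
    """Try every (training, testing) frame-length pair exhaustively.

    false_rejection_rate is a hypothetical callback: given a training and a
    testing frame length, it is assumed to train the models, run the
    recognition trials, and return the observed false rejection rate.
    """
    candidates = list(frame_lengths_ms)
    return min(product(candidates, repeat=2),
               key=lambda pair: false_rejection_rate(pair[0], pair[1]))


if __name__ == "__main__":
    # One second of dummy audio at 16 kHz, 25 ms frames with a 10 ms shift.
    frames = split_into_frames([0.0] * 16000, 16000, frame_ms=25, shift_ms=10)
    print(len(frames), len(frames[0]))

    # Toy scorer standing in for a real recognition experiment (assumption).
    toy_frr = lambda train_ms, test_ms: abs(train_ms - 25) + abs(test_ms - 20)
    print(best_frame_length_pair([20, 25, 30], toy_frr))
```

An exhaustive search is used here only because it is the simplest strategy to state. With a handful of candidate lengths the number of pairs stays small, but every pair costs a full training and testing run, so a real system would likely cache the features extracted for each frame length.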
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Impedovo, D., Refice, M. (2009). Improving Text-Dependent Speaker Recognition Performance. In: Koutsojannis, C., Sirmakessis, S. (eds) Tools and Applications with Artificial Intelligence. Studies in Computational Intelligence, vol 166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88069-1_15
DOI: https://doi.org/10.1007/978-3-540-88069-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88068-4
Online ISBN: 978-3-540-88069-1
eBook Packages: Engineering (R0)