Improving Text-Dependent Speaker Recognition Performance

Impedovo, Donato; Refice, Mario

doi:10.1007/978-3-540-88069-1_15

Part of the book series: Studies in Computational Intelligence (SCI, volume 166)

Abstract

In this paper we investigated the role of frame length in the computation of MFCC acoustic parameters in a text-dependent speaker recognition system. Since the vocal characteristics of subjects may vary over time, the related information conveyed by the MFCCs usually causes a significant degradation in recognition performance. In our experiment we tested the use of different frame lengths for feature extraction in the training and recognition phases for a set of speakers whose speech productions spanned 3 months. Results show that a suitable combination of frame lengths for the training and testing phases can improve recognition performance by reducing the false rejection rate. An expert system designed to search for the best combination of frame lengths, in order to obtain the maximum performance level of the HMM engine, may help decrease the number of false rejections.
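
The idea of pairing one frame length for training with another for testing can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `frame_signal` and `best_frame_length_pair`, the candidate frame lengths, the hop size, and the scoring function are all assumptions introduced here for illustration. In a real system the scoring function would train the HMM engine on MFCCs extracted with the first frame length, test with the second, and return the measured false rejection rate.

```python
from itertools import product


def frame_signal(samples, sample_rate, frame_ms, hop_ms):
    """Split a sample sequence into overlapping frames (hypothetical helper).

    frame_ms is the analysis frame length and hop_ms the frame shift,
    both in milliseconds. Trailing samples that do not fill a whole
    frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop lengths must be positive")
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop_len)]


def best_frame_length_pair(frame_lengths_ms, false_rejection_rate):
    """Exhaustively search (train_ms, test_ms) pairs for the lowest error.

    false_rejection_rate is a caller-supplied function (an assumption of
    this sketch) that takes a training and a testing frame length and
    returns the error measured for that combination.
    """
    pairs = product(frame_lengths_ms, repeat=2)
    return min(pairs, key=lambda pair: false_rejection_rate(*pair))
```

With a 16 kHz signal, `frame_signal(samples, 16000, 25, 10)` yields 400-sample frames shifted by 160 samples. An exhaustive search is used only because the number of candidate frame lengths is small; the abstract does not say how the proposed expert system explores the combinations.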

Author information

Authors and Affiliations

Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, via Orabona 4, 70123, Bari, Italy
Donato Impedovo & Mario Refice

Editor information

Editors and Affiliations

University of Patras, Greece
Constantinos Koutsojannis & Spiros Sirmakessis

About this chapter

Cite this chapter

Impedovo, D., Refice, M. (2009). Improving Text-Dependent Speaker Recognition Performance. In: Koutsojannis, C., Sirmakessis, S. (eds) Tools and Applications with Artificial Intelligence. Studies in Computational Intelligence, vol 166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88069-1_15

DOI: https://doi.org/10.1007/978-3-540-88069-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88068-4
Online ISBN: 978-3-540-88069-1
eBook Packages: Engineering (R0)
