Improving Text-Dependent Speaker Recognition Performance

Chapter in: Tools and Applications with Artificial Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 166)

Abstract

In this paper we investigated the role of the frame length in the computation of MFCC acoustic parameters for a text-dependent speaker recognition system. Since the vocal characteristics of speakers may vary over time, the information conveyed by the MFCCs changes accordingly, which usually causes a significant degradation in recognition performance. In our experiment we tested different frame lengths for feature extraction in the training and recognition phases, using a set of speakers whose recordings spanned 3 months. Results show that a suitable choice of the combination of frame lengths for the training and testing phases can improve recognition performance by reducing the false rejection rate. An expert system designed to search for the combination of frame lengths that yields the maximum performance of the HMM engine may help decrease the number of false rejections.
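The search described in the abstract (trying several frame lengths for the training and recognition phases and keeping the pair that minimizes false rejections) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' expert system: `frame_signal`, `false_rejection_rate` and `best_frame_combination` are hypothetical names, an exhaustive grid search stands in for whatever strategy the expert system actually uses, and the `evaluate` callback (training and testing the HMM recognizer at the given frame lengths) is assumed to be supplied by the caller. MFCC computation itself is omitted; only the framing step whose length is varied is shown.

```python
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple


def frame_signal(samples: Sequence[float], sample_rate: int,
                 frame_ms: float, hop_ms: float) -> List[List[float]]:
    """Split a signal into (possibly overlapping) frames of frame_ms milliseconds.

    This is the framing step that precedes MFCC computation; frame_ms is the
    parameter varied between training and testing. Trailing samples that do
    not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop lengths must be positive")
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


def false_rejection_rate(genuine_scores: Iterable[float], threshold: float) -> float:
    """Fraction of genuine-speaker trials whose score falls below the threshold."""
    scores = list(genuine_scores)
    if not scores:
        raise ValueError("no genuine trials")
    return sum(1 for s in scores if s < threshold) / len(scores)


def best_frame_combination(
    train_lengths_ms: Iterable[float],
    test_lengths_ms: Iterable[float],
    evaluate: Callable[[float, float], float],
) -> Tuple[float, float, float]:
    """Exhaustively search (training, testing) frame-length pairs.

    `evaluate(train_ms, test_ms)` is a hypothetical callback that trains the
    recognizer on train_ms frames, tests it on test_ms frames and returns the
    false rejection rate. Returns (train_ms, test_ms, frr) with the lowest
    FRR; on ties the first pair encountered is kept.
    """
    best = None
    for train_ms, test_ms in product(train_lengths_ms, test_lengths_ms):
        frr = evaluate(train_ms, test_ms)
        if best is None or frr < best[2]:
            best = (train_ms, test_ms, frr)
    if best is None:
        raise ValueError("empty search space")
    return best
```

In use, `evaluate` would wrap the full pipeline (framing, MFCC extraction, HMM training on the enrollment sessions, scoring of the later sessions) and return the false rejection rate for that pair of frame lengths. Because every call may retrain the models, the grid of candidate lengths should be kept small, or models trained for a given training frame length should be cached and reused across test lengths.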

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Impedovo, D., Refice, M. (2009). Improving Text-Dependent Speaker Recognition Performance. In: Koutsojannis, C., Sirmakessis, S. (eds) Tools and Applications with Artificial Intelligence. Studies in Computational Intelligence, vol 166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88069-1_15

  • DOI: https://doi.org/10.1007/978-3-540-88069-1_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88068-4

  • Online ISBN: 978-3-540-88069-1

  • eBook Packages: Engineering, Engineering (R0)
