
Text-dependent speaker identification based on input/output HMMs: An empirical study

Neural Processing Letters

Abstract

In this paper, we explore the Input/Output HMM (IOHMM) architecture for a substantial problem: text-dependent speaker identification. For subnetworks modeled with generalized linear models, we extend the iteratively reweighted least squares (IRLS) algorithm to the M-step of the corresponding EM algorithm. Experimental results show that the improved EM algorithm trains significantly faster than the original one. In comparison with the multilayer perceptron, the dynamic programming technique, and hidden Markov models, we empirically demonstrate that the IOHMM architecture is a promising approach to text-dependent speaker identification.
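As a rough illustration of how IRLS can serve as an M-step, the sketch below fits a single binary logistic model in which each sample carries a posterior weight from an E-step. It is not the authors' algorithm: the binary logistic form, the function names (`weighted_irls`, `gradient`, `solve_linear`), and the toy data are all assumptions made for illustration. The update is written in its Newton-step form, which for a canonical link is equivalent to the IRLS weighted least-squares solve.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gradient(X, y, h, beta):
    """Gradient of the h-weighted Bernoulli log-likelihood at beta."""
    g = [0.0] * len(beta)
    for xi, yi, hi in zip(X, y, h):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        for j in range(len(beta)):
            g[j] += hi * (yi - p) * xi[j]
    return g


def weighted_irls(X, y, h, n_iter=50, tol=1e-10):
    """Hypothetical M-step: maximize sum_i h[i] * log p(y[i] | x[i], beta).

    h holds per-sample posterior weights (assumed to come from an E-step).
    """
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        hess = [[0.0] * d for _ in range(d)]
        for xi, hi in zip(X, h):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = hi * p * (1.0 - p)  # IRLS working weight, scaled by h
            for j in range(d):
                for k in range(d):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve_linear(hess, gradient(X, y, h, beta))
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy, non-separable data: intercept column plus one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 1.0, 0.0, 1.0]
h = [1.0, 0.5, 1.0, 0.5]
beta_hat = weighted_irls(X, y, h)
```

At convergence the weighted gradient vanishes, and a sample whose weight is zero has no influence on the fit, which is how an E-step can softly assign data to different sub-models.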




Cite this article

Chen, K., Xie, D. & Chi, H. Text-dependent speaker identification based on input/output HMMs: An empirical study. Neural Process Lett 3, 81–89 (1996). https://doi.org/10.1007/BF00571681


