Abstract
In this paper, we explore the Input/Output HMM (IOHMM) architecture for a substantial problem: text-dependent speaker identification. For subnetworks modeled with generalized linear models, we extend the iteratively reweighted least squares (IRLS) algorithm to the M-step of the corresponding EM algorithm. Experimental results show that the improved EM algorithm trains significantly faster than the original one. In comparison with the multilayer perceptron, the dynamic programming technique, and hidden Markov models, we empirically demonstrate that the IOHMM architecture is a promising approach to text-dependent speaker identification.
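To illustrate the kind of update the abstract refers to, the following is a minimal sketch of weighted IRLS for a single logistic generalized linear model. It is not the authors' implementation. The function names `solve` and `irls_logistic`, the `resp` argument (standing in for per-sample posterior weights that an E-step might supply), the small `ridge` term, and the toy data are all assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def irls_logistic(xs, ys, resp, n_iter=25, ridge=1e-8):
    """Weighted IRLS (Newton) updates for a logistic GLM.

    xs:   list of feature vectors (include a constant 1.0 for the bias)
    ys:   targets in [0, 1]
    resp: non-negative per-sample weights (hypothetical E-step posteriors)
    """
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        # Tiny ridge on the diagonal keeps the linear system well conditioned.
        hess = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x, y, h in zip(xs, ys, resp):
            eta = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(d):
                grad[i] += h * (y - p) * x[i]
                for j in range(d):
                    hess[i][j] += h * p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)
        w = [wi + si for wi, si in zip(w, step)]
    return w


# Toy, non-separable data: a bias term plus one feature.
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0.0, 1.0, 0.0, 1.0, 1.0]
resp = [1.0, 0.5, 1.0, 0.5, 1.0]
w = irls_logistic(xs, ys, resp)
```

For a canonical-link GLM such as this one, each IRLS step coincides with a Newton step on the weighted log-likelihood, which is why a handful of iterations typically suffices on well-behaved data. In a full EM procedure, the weights in `resp` would be recomputed in every E-step before the M-step is run again.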
Cite this article
Chen, K., Xie, D. & Chi, H. Text-dependent speaker identification based on input/output HMMs: An empirical study. Neural Process Lett 3, 81–89 (1996). https://doi.org/10.1007/BF00571681