Abstract
When training and testing environments are mismatched, statistical pattern classifiers can suffer severe performance degradation because their parameters no longer represent the testing data well. The mismatch is typically caused by interference or noise in the operating environment. This paper studies a neural network based transformation approach for handling distribution mismatches between training and testing data. The probability density functions of the statistical classifiers serve as the objective function of the neural network. The network is trained to maximize the likelihood of the data from the testing environment, which allows the network to be globally optimized when used with statistical pattern classifiers. The proposed approach is applied to automatic speech recognition of noisy distant-talking speech, where it reduces the error rate by 52.9%.
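The idea of using the classifier's density as the training objective of a transformation can be illustrated with a toy sketch. This is not the paper's method: it assumes a bias-only transformation y = x + b in place of a neural network, a fixed 1-D two-component Gaussian mixture in place of the recognizer's models, and synthetic data. All names and numbers below are hypothetical.

```python
import math
import random

# Hypothetical, fixed "classifier" density: a 1-D two-component Gaussian mixture.
WEIGHTS = [0.5, 0.5]
MEANS = [-2.0, 2.0]
VARIANCES = [1.0, 1.0]


def log_density_and_slope(y):
    """Return log p(y) and d log p(y)/dy under the fixed mixture."""
    logs = [
        math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
        for w, m, v in zip(WEIGHTS, MEANS, VARIANCES)
    ]
    top = max(logs)  # log-sum-exp for numerical stability
    log_p = top + math.log(sum(math.exp(l - top) for l in logs))
    # Component posteriors weight each component's pull on y.
    slope = sum(
        math.exp(l - log_p) * (m - y) / v
        for l, m, v in zip(logs, MEANS, VARIANCES)
    )
    return log_p, slope


def mean_log_likelihood(data, bias):
    """Average log-likelihood of the shifted data under the fixed mixture."""
    return sum(log_density_and_slope(x + bias)[0] for x in data) / len(data)


def fit_bias(data, learning_rate=0.1, steps=200):
    """Gradient ascent on the mean log-likelihood of the transformed data."""
    bias = 0.0
    for _ in range(steps):
        grad = sum(log_density_and_slope(x + bias)[1] for x in data) / len(data)
        bias += learning_rate * grad
    return bias


# Synthetic "testing environment": training-like samples shifted by +1.0.
random.seed(0)
test_data = [random.gauss(random.choice(MEANS), 1.0) + 1.0 for _ in range(400)]

bias = fit_bias(test_data)
before = mean_log_likelihood(test_data, 0.0)
after = mean_log_likelihood(test_data, bias)
print(f"learned bias: {bias:.3f}")
print(f"mean log-likelihood before: {before:.3f}, after: {after:.3f}")
```

The classifier parameters stay fixed and only the transformation is updated, so the gradient of the likelihood flows through the density into the transformation parameter. A bias-only transformation is chosen here to keep the sketch well behaved; a more flexible transformation trained on likelihood alone could, in a toy setting like this one, collapse the data onto a mode of the density.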
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yook, D. (2002). Hidden Markov Model and Neural Network Hybrid. In: Shafazand, H., Tjoa, A.M. (eds) EurAsia-ICT 2002: Information and Communication Technology. EurAsia-ICT 2002. Lecture Notes in Computer Science, vol 2510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36087-5_23
DOI: https://doi.org/10.1007/3-540-36087-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00028-0
Online ISBN: 978-3-540-36087-2
eBook Packages: Springer Book Archive