Abstract
A key concept in pattern recognition is that a pattern recognizer should be designed so as to minimize the errors it makes in classifying patterns. In this article, we review a recent, promising approach for minimizing the error rate of a classifier and describe a particular application to a simple, prototype-based speech recognizer. The key idea is to define a smooth, differentiable loss function that incorporates all adaptable classifier parameters and that approximates the actual performance error rate. Gradient descent can then be used to minimize this loss. This approach allows but does not require the use of explicitly probabilistic models. Furthermore, minimum error training does not involve the estimation of probability distributions that are difficult to obtain reliably. This new method has been applied to a variety of pattern recognition problems, with good results. Here we describe a particular application in which a relatively simple distance-based classifier is trained to minimize errors in speech recognition tasks. The loss function is defined so as to reflect errors at the level of the final, grammar-driven recognition output. Thus, minimization of this loss directly optimizes the overall system performance.
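As a sketch of what such a loss can look like (a standard minimum classification error formulation; the notation here is illustrative and not taken verbatim from the paper): let g_k(x; \Lambda) be the discriminant score of class k for input x under classifier parameters \Lambda, e.g., minus the distance from x to the nearest prototype of class k in a prototype-based classifier. One can then define a misclassification measure, a smooth loss, and a gradient-descent update as

d_k(x; \Lambda) = -g_k(x; \Lambda) + \frac{1}{\eta} \ln\!\Big[ \frac{1}{M-1} \sum_{j \ne k} \exp\big(\eta\, g_j(x; \Lambda)\big) \Big],
\qquad
\ell_k(x; \Lambda) = \frac{1}{1 + \exp\big(-\gamma\, d_k(x; \Lambda) + \theta\big)},
\qquad
\Lambda_{t+1} = \Lambda_t - \epsilon_t \, \nabla_{\Lambda}\, \ell_k(x_t; \Lambda_t),

where M is the number of classes, \eta and \gamma > 0 control the smoothness of the competitor pooling and of the sigmoid, \theta is an offset, and \epsilon_t is a step size. As \eta \to \infty, d_k approaches the gap between the strongest competing score and the correct-class score, so \ell_k approximates a 0-1 error count while remaining differentiable for finite \eta and \gamma.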
Cite this article
McDermott, E., Katagiri, S. Prototype-based minimum error training for speech recognition. Appl Intell 4, 245–256 (1994). https://doi.org/10.1007/BF00872091