
Prototype-based minimum error training for speech recognition

  • Published in: Applied Intelligence

Abstract

A key concept in pattern recognition is that a pattern recognizer should be designed so as to minimize the errors it makes in classifying patterns. In this article, we review a recent, promising approach for minimizing the error rate of a classifier and describe a particular application to a simple, prototype-based speech recognizer. The key idea is to define a smooth, differentiable loss function that incorporates all adaptable classifier parameters and that approximates the actual performance error rate. Gradient descent can then be used to minimize this loss. This approach allows but does not require the use of explicitly probabilistic models. Furthermore, minimum error training does not involve the estimation of probability distributions that are difficult to obtain reliably. This new method has been applied to a variety of pattern recognition problems, with good results. Here we describe a particular application in which a relatively simple distance-based classifier is trained to minimize errors in speech recognition tasks. The loss function is defined so as to reflect errors at the level of the final, grammar-driven recognition output. Thus, minimization of this loss directly optimizes the overall system performance.
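The training scheme the abstract describes can be sketched numerically. The following is a minimal illustration, not the article's implementation: a two-class, two-prototype classifier on toy Gaussian data, where the discriminant for each class is the negative squared distance to that class's prototype, a sigmoid of the misclassification measure serves as the smooth surrogate for the 0/1 error, and per-sample gradient descent (GPD-style) pulls the correct prototype toward each sample while pushing the best competing prototype away. All data, initial prototype positions, and parameter values (`alpha`, `lr`, epoch count) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 2-D Gaussian classes (illustrative only).
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)),
               rng.normal(+1.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# One prototype per class, deliberately mis-placed so a noticeable
# fraction of samples start out misclassified.
protos = np.array([[0.5, 0.5], [1.5, 1.5]])

def discriminants(x):
    """g_k(x) = -squared Euclidean distance to the prototype of class k."""
    return -np.sum((protos - x) ** 2, axis=1)

def error_rate():
    preds = np.array([np.argmax(discriminants(x)) for x in X])
    return float(np.mean(preds != y))

def train(epochs=50, lr=0.1, alpha=1.0):
    for _ in range(epochs):
        for x, c in zip(X, y):
            g = discriminants(x)
            g_others = g.copy()
            g_others[c] = -np.inf
            j = int(np.argmax(g_others))            # best competing class
            d = -g[c] + g[j]                        # misclassification measure
            ell = 1.0 / (1.0 + np.exp(-alpha * d))  # smooth 0/1-loss surrogate
            scale = alpha * ell * (1.0 - ell)       # sigmoid slope at d
            # Chain rule through g_k = -||p_k - x||^2: gradient descent on
            # ell pulls the correct prototype toward x and pushes the
            # competing prototype away from x.
            protos[c] -= lr * scale * 2.0 * (protos[c] - x)
            protos[j] += lr * scale * 2.0 * (protos[j] - x)
```

Because the sigmoid flattens for samples far from the decision boundary, updates concentrate on samples near the boundary, which is what ties this surrogate to the actual classification error rate rather than to, say, a squared-error fit.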




Cite this article

McDermott, E., Katagiri, S. Prototype-based minimum error training for speech recognition. Appl Intell 4, 245–256 (1994). https://doi.org/10.1007/BF00872091
