Continuous speech recognition with the connectionist viterbi training procedure: A summary of recent work

Franzini, Michael; Waibel, Alex; Lee, Kai-Fu

doi:10.1007/BFb0035914

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 540)

Included in the following conference series:

International Workshop on Artificial Neural Networks

Abstract

Hybrid methods that combine hidden Markov models (HMMs) and connectionist techniques take advantage of what are believed to be the strong points of each approach: the powerful discrimination-based learning of connectionist networks and the time-alignment capability of HMMs. Connectionist Viterbi Training (CVT) is a simple variation of Viterbi training that uses a back-propagation network to represent the output distributions associated with the transitions in the HMM. The work reported here represents the culmination of three years of investigation into various means by which HMMs and neural networks (NNs) can be combined for continuous speech recognition. This paper describes the CVT procedure, discusses the factors most important to its design, and reports its recognition performance. Several changes made to the system over the past year are reported, including: (1) the change from recurrent to non-recurrent NNs, (2) the change from Sphinx-style phone-based HMMs to word-based HMMs, (3) the addition of a corrective training procedure, and (4) the addition of an alternate model for every word. The CVT system, incorporating these changes, achieves 99.1% word accuracy and 98.0% string accuracy on the TI/NBS Connected Digits task (“TI Digits”).
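The abstract describes CVT only at a high level: Viterbi training in which a back-propagation network supplies the HMM output scores. The following is a minimal NumPy sketch of that loop as we read it, not the authors' implementation. It rests on several assumptions not stated in the abstract: each HMM state (rather than each transition) owns one network output, transitions are strictly left-to-right with equal probabilities so only the network scores matter, the network is a one-hidden-layer softmax MLP whose outputs are used directly as emission scores, and the first pass uses a uniform segmentation as a flat start. All function and class names (`MLP`, `viterbi_align`, `uniform_segmentation`, `cvt_train`) and the synthetic data are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class MLP:
    """One-hidden-layer softmax network standing in for the HMM output distributions.
    Assumption: one output unit per HMM state (a simplification of per-transition outputs)."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        return H, softmax(H @ self.W2 + self.b2)

    def train_step(self, X, targets, lr):
        """One back-propagation step on frame-level cross-entropy."""
        H, P = self.forward(X)
        n = len(X)
        dZ2 = P.copy()
        dZ2[np.arange(n), targets] -= 1.0  # gradient of cross-entropy w.r.t. logits
        dZ2 /= n
        dH = (dZ2 @ self.W2.T) * (1.0 - H ** 2)  # back through tanh (before W2 changes)
        self.W2 -= lr * (H.T @ dZ2)
        self.b2 -= lr * dZ2.sum(axis=0)
        self.W1 -= lr * (X.T @ dH)
        self.b1 -= lr * dH.sum(axis=0)

def uniform_segmentation(n_frames, states):
    """Flat start (assumption): split the frames evenly across the model's states."""
    idx = (np.arange(n_frames) * len(states)) // n_frames
    return np.array([states[i] for i in idx])

def viterbi_align(log_probs, states):
    """Best left-to-right path (stay, or advance one state, per frame).
    Returns the network-output index assigned to every frame."""
    n_frames, n_states = log_probs.shape[0], len(states)
    assert n_frames >= n_states, "need at least one frame per state"
    score = np.full((n_frames, n_states), -np.inf)
    back = np.zeros((n_frames, n_states), dtype=int)
    score[0, 0] = log_probs[0, states[0]]  # path must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1, s]
            advance = score[t - 1, s - 1] if s > 0 else -np.inf
            back[t, s] = s if stay >= advance else s - 1
            score[t, s] = max(stay, advance) + log_probs[t, states[s]]
    path = [n_states - 1]  # path must end in the last state
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t, path[-1]])
    path.reverse()
    return np.array([states[s] for s in path])

def cvt_train(net, utterances, n_iters, lr):
    """Alternate Viterbi alignment (network fixed) with back-prop (alignment fixed)."""
    for it in range(n_iters):
        for X, states in utterances:
            if it == 0:
                targets = uniform_segmentation(len(X), states)
            else:
                _, P = net.forward(X)
                targets = viterbi_align(np.log(P + 1e-12), states)
            net.train_step(X, targets, lr)
    return net

# Toy usage with synthetic data (hypothetical): three states, 4-dim features,
# each state emitting frames near its own random prototype vector.
rng = np.random.default_rng(0)
protos = rng.normal(0.0, 1.0, (3, 4))

def make_utterance(lengths):
    X = np.vstack([protos[s] + 0.1 * rng.normal(size=(n, 4))
                   for s, n in enumerate(lengths)])
    return X, [0, 1, 2]

utterances = [make_utterance(rng.integers(3, 8, size=3)) for _ in range(20)]
net = cvt_train(MLP(4, 8, 3, rng), utterances, n_iters=10, lr=0.5)
X, states = utterances[0]
_, P = net.forward(X)
print(viterbi_align(np.log(P + 1e-12), states))
```

Each pass holds one half of the model fixed while updating the other: the frozen network scores every frame so Viterbi can assign a state label to it, and back-propagation then treats those labels as frame-level targets. For a word-based model such as a digit HMM, `states` would be the concatenated state indices of the words in the known transcription. Dividing the network outputs by state priors before decoding is a common alternative to using them directly.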

Author information

Authors and Affiliations

Michael Franzini: Telefónica Investigación y Desarrollo, Emilio Vargas 6, 28043 Madrid, Spain
Alex Waibel: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Kai-Fu Lee: Apple Computer Corporation, Cupertino, CA 95014, USA

Editor information

Alberto Prieto

About this paper

Cite this paper

Franzini, M., Waibel, A., Lee, KF. (1991). Continuous speech recognition with the connectionist viterbi training procedure: A summary of recent work. In: Prieto, A. (eds) Artificial Neural Networks. IWANN 1991. Lecture Notes in Computer Science, vol 540. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0035914

DOI: https://doi.org/10.1007/BFb0035914
Published: 22 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54537-8
Online ISBN: 978-3-540-38460-1
eBook Packages: Springer Book Archive