Abstract
Computer speech recognition has been very successful in limited domains and for isolated word recognition. However, widespread use of large-vocabulary continuous-speech recognizers is limited by the speed of current recognizers, which cannot reach acceptable error rates while running in real time. This paper shows how to harness shared-memory multiprocessors, which are becoming increasingly common, to significantly increase the speed, and therefore the accuracy or vocabulary size, of a speech recognizer. To cover the necessary background, we begin with a tutorial on speech recognition. We then describe the parallelization of an existing high-quality speech recognizer, achieving speedups of factors of 3, 5, and 6 on 4, 8, and 12 processors, respectively, for the benchmark North American Business News (NAB) recognition task.
Cite this article
Phillips, S., Rogers, A. Parallel Speech Recognition. International Journal of Parallel Programming 27, 257–288 (1999). https://doi.org/10.1023/A:1018741730355