Abstract
We propose a technique for regulating the decoding speed of large-vocabulary continuous speech recognition (LVCSR) based on a proportional-integral-derivative (PID) controller, a model widely used in automatic control theory. Our experiments show that such a controller can maintain a given decoding speed despite fluctuations in computer performance, difficult acoustic conditions, or speech material outside the scope of the language model, without notable deterioration in overall recognition quality.
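The abstract does not say which decoder parameter the controller drives or how speed is measured, so the following is only a minimal sketch under stated assumptions. It assumes the measured variable is the real-time factor (RTF, processing time divided by audio duration; lower means faster) and that the actuator is a beam width. The names `next_beam` and `simulate`, the gains, the beam limits and the plant model inside `simulate` are all hypothetical illustration values, not the authors' settings. The controller itself is the textbook discrete PID law u_k = K_p e_k + K_i Σ_j e_j Δt + K_d (e_k - e_{k-1}) / Δt, with e_k = target - measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    """Textbook discrete PID controller (positional form)."""
    kp: float
    ki: float
    kd: float
    setpoint: float            # target value, here an assumed target RTF
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measurement: float, dt: float) -> float:
        if dt <= 0:
            raise ValueError("dt must be positive")
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative on the first sample: there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_beam(pid: PIDController, measured_rtf: float, dt: float,
              base_beam: float = 12.0, min_beam: float = 6.0,
              max_beam: float = 20.0) -> float:
    """Hypothetical actuator: map the PID output onto a clamped beam width.

    Positive error (decoder faster than the target) widens the beam;
    negative error (decoder too slow) narrows it.
    """
    beam = base_beam + pid.update(measured_rtf, dt)
    return min(max(beam, min_beam), max_beam)


def simulate(load_profile, target_rtf: float = 0.5, dt: float = 1.0) -> float:
    """Toy closed loop; returns the last measured RTF.

    The plant model rtf = 0.04 * beam * load is a made-up assumption in which
    decoding cost grows with the beam and with machine load. The derivative
    gain is set to zero here, so this run exercises only the P and I terms.
    """
    pid = PIDController(kp=2.0, ki=5.0, kd=0.0, setpoint=target_rtf)
    beam = 12.0
    rtf = 0.0
    for load in load_profile:
        rtf = 0.04 * beam * load       # hypothetical speed measurement
        beam = next_beam(pid, rtf, dt)  # controller adjusts the beam for the next chunk
    return rtf


if __name__ == "__main__":
    # Machine load doubles halfway through, e.g. another process competes for CPU.
    print(simulate([1.0] * 50 + [2.0] * 100))
```

In this toy run the integral term is what lets the loop return to the target RTF after the load doubles: a proportional term alone would leave a steady offset, because a narrower beam is needed permanently. A real decoder would also need anti-windup for when the beam hits its limits, which this sketch omits.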
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zatvornitsky, A., Romanenko, A., Korenevsky, M. (2014). Proportional-Integral-Derivative Control of Automatic Speech Recognition Speed. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_45
DOI: https://doi.org/10.1007/978-3-319-11581-8_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)