Abstract
We propose an approach to improving the real-time usability of an automatic speech recognition system. We introduce the concept of an “uncertainty area” (UA): the time span within which the current recognition result may still change. By fixing the length of the UA, we make it possible to start editing the recognized text without waiting for the phrase to end. We control the length of the UA by regularly pruning hypotheses using additional criteria. The approach was implemented in a software-hardware system for closed captioning of live Russian TV broadcasts.
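The idea of an uncertainty area can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the pruning criteria, so the rule used here (keep only hypotheses that agree with the best-scoring one on every word ending at or before `now - max_ua`) is an assumption made for illustration. The names `Hypothesis`, `stable_prefix_len`, `uncertainty_area`, `prune_to_bound_ua`, and `max_ua` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Word = Tuple[str, float]  # (word, end time in seconds)


@dataclass
class Hypothesis:
    words: List[Word]
    score: float  # log-probability, higher is better


def stable_prefix_len(hyps: List[Hypothesis]) -> int:
    """Number of leading words shared by all active hypotheses."""
    n = min(len(h.words) for h in hyps)
    first = hyps[0].words
    for i in range(n):
        if any(h.words[i] != first[i] for h in hyps):
            return i
    return n


def uncertainty_area(hyps: List[Hypothesis], now: float) -> Tuple[float, float]:
    """Span (start, now) in which the recognition result may still change."""
    k = stable_prefix_len(hyps)
    start = hyps[0].words[k - 1][1] if k > 0 else 0.0
    return start, now


def prune_to_bound_ua(hyps: List[Hypothesis], now: float, max_ua: float) -> List[Hypothesis]:
    """Assumed criterion: drop hypotheses that disagree with the best one
    on any word ending at or before now - max_ua, so those words become final."""
    best = max(hyps, key=lambda h: h.score)
    cutoff = now - max_ua
    frozen = [w for w in best.words if w[1] <= cutoff]
    return [h for h in hyps if h.words[: len(frozen)] == frozen]


hyps = [
    Hypothesis([("hello", 0.5), ("world", 1.0), ("today", 1.8)], -10.0),
    Hypothesis([("hello", 0.5), ("word", 1.0), ("to", 1.4), ("day", 1.8)], -12.0),
    Hypothesis([("yellow", 0.5), ("world", 1.0)], -15.0),
]
before = uncertainty_area(hyps, now=2.0)
kept = prune_to_bound_ua(hyps, now=2.0, max_ua=1.2)
after = uncertainty_area(kept, now=2.0)
```

In this sketch the words before the start of the UA are identical in every surviving hypothesis, so an editor could begin correcting them while recognition of the rest of the phrase continues. Note that the rule freezes words at word boundaries, so the UA can slightly exceed `max_ua` when a long word straddles the cutoff.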
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Merkin, N., Medennikov, I., Romanenko, A., Zatvornitskiy, A. (2014). Controlling the Uncertainty Area in the Real Time LVCSR Application. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_19
DOI: https://doi.org/10.1007/978-3-319-11581-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science (R0)