
Improving speech transcription by exploiting user feedback and word repetition

Published in Multimedia Tools and Applications

Abstract

Speech transcription is important for video/audio retrieval and many other applications. In automatic speech transcription, recognition errors are inevitable, which makes user feedback such as manual error correction necessary. In this paper, an approach is proposed to improve the accuracy of speech transcription by exploiting user feedback and word repetition. The method aims at learning from user feedback and the recognition results of preceding utterances, and then correcting errors when repeated words are falsely recognized in subsequent utterances. An interaction scheme for user feedback is proposed, which facilitates error correction through candidate lists and provides a new kind of feedback, referred to as word indication, to extend error correction from repeated words to repeated phrases. For template extraction and matching, a representation of word templates and recognition results based on the syllable confusion network (SCN) is proposed. During transcription, SCN-based templates of multi-syllable words/phrases are extracted from user feedback and the N-best lattice, and then matched against the SCNs of subsequent utterances to yield a new candidate list when repeated words are detected. Experimental results show that considerable error reduction is achieved in the newly generated candidate lists.
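As a rough illustration of the template matching described in the abstract, the sketch below matches the syllable template of a user-corrected word against a toy syllable confusion network to flag likely repetitions. This is a minimal sketch, not the authors' implementation: the SCN data structure, the averaged-posterior score, and the threshold are assumptions made for illustration only.

```python
# Minimal sketch (illustrative assumptions, not the paper's actual algorithm):
# detect a repeated word by matching its syllable template against a
# syllable confusion network (SCN) of a later utterance.

from typing import Dict, List, Tuple

SCNSlot = Dict[str, float]   # candidate syllable -> posterior probability
SCN = List[SCNSlot]          # one slot per syllable position in the utterance
Template = List[str]         # syllable sequence of a user-corrected word/phrase


def match_template(template: Template, scn: SCN,
                   threshold: float = 0.3) -> List[Tuple[int, float]]:
    """Slide the template over the SCN; return (start_slot, score) for spans whose
    average posterior of the template syllables exceeds the (assumed) threshold."""
    matches = []
    span = len(template)
    for start in range(len(scn) - span + 1):
        # Posterior of each template syllable in its aligned slot (0.0 if absent).
        slot_scores = [scn[start + i].get(syl, 0.0) for i, syl in enumerate(template)]
        score = sum(slot_scores) / span
        if score >= threshold:
            matches.append((start, score))
    return matches


if __name__ == "__main__":
    # Toy SCN for a later utterance: each slot holds competing syllables.
    scn = [
        {"yu": 0.55, "wu": 0.45},
        {"yin": 0.50, "ying": 0.50},
        {"shi": 0.70, "si": 0.30},
    ]
    # Template of a two-syllable word corrected earlier via user feedback.
    template = ["yu", "yin"]
    for start, score in match_template(template, scn):
        # A match marks a likely repetition; the corrected word would then be
        # promoted into the candidate list covering slots [start, start + len(template)).
        print(f"repeated word detected at slot {start} (score {score:.2f})")
```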


Author information


Corresponding author

Correspondence to Xiangdong Wang.


About this article


Cite this article

Wang, X., Yang, Y., Liu, H. et al. Improving speech transcription by exploiting user feedback and word repetition. Multimed Tools Appl 76, 20359–20376 (2017). https://doi.org/10.1007/s11042-017-4714-x


