Abstract
This paper analyzes the behavior of forward-based and backward-based decoders used for speech transcription. Experiments show that backward-based decoding achieves recognition performance similar to that of forward-based decoding, which is consistent with the fact that both systems handle similar information through the acoustic, lexical and language models. However, because of heuristics, the search algorithms used in decoding explore only a limited portion of the search space. As forward-based and backward-based approaches do not process the speech signal in the same temporal order, they explore different portions of the search space, leading to complementary systems that can be efficiently combined using the ROVER approach. The speech transcription results achieved by combining forward-based and backward-based systems are significantly better than the results obtained by combining the same number of forward-only or backward-only systems. This confirms the complementarity of the forward and backward approaches and thus the usefulness of their combination.
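To illustrate the voting idea behind a ROVER-style combination, here is a minimal sketch in Python. It is not the paper's implementation and not the NIST tool. It assumes the hypotheses have already been aligned slot by slot (the alignment step of ROVER is omitted), it uses plain frequency voting without confidence scores, and the function name `vote`, the `NULL` placeholder, and the toy word sequences are all hypothetical.

```python
from collections import Counter

# Hypothetical convention: an empty string marks "no word" in an aligned slot.
NULL = ""


def vote(aligned_hyps):
    """Return the majority word of each slot across pre-aligned hypotheses.

    Ties are broken in favor of the earliest system in the list.
    Slots won by NULL produce no output word.
    """
    n_slots = len(aligned_hyps[0])
    if any(len(hyp) != n_slots for hyp in aligned_hyps):
        raise ValueError("hypotheses must be aligned to the same number of slots")

    output = []
    for slot in zip(*aligned_hyps):
        counts = Counter(slot)
        best_count = max(counts.values())
        # First word, in system order, that reaches the highest count.
        winner = next(word for word in slot if counts[word] == best_count)
        if winner != NULL:
            output.append(winner)
    return output


# Toy data (invented for illustration): two forward and two backward outputs
# that make their errors in different slots.
forward_1 = ["the", "cat", "sat", "on", "the", "mat"]
forward_2 = ["the", "cat", "sat", "in", "the", "mat"]
backward_1 = ["the", "cat", "sat", "on", "a", "mat"]
backward_2 = ["the", "hat", "sat", "on", "the", "mat"]

combined = vote([forward_1, forward_2, backward_1, backward_2])
print(" ".join(combined))
```

In this toy case each system errs in a different slot, so the vote recovers the intended sentence. Systems that tend to make the same errors in the same slots would gain less from voting, which is the intuition behind combining decoders that explore different portions of the search space.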
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jouvet, D., Fohr, D. (2013). Analysis and Combination of Forward and Backward Based Decoders for Improved Speech Transcription. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_12
DOI: https://doi.org/10.1007/978-3-642-40585-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)