The Vocapia Research ASR Systems for Evalita 2011

Despres, Julien; Lamel, Lori; Gauvain, Jean-Luc; Vieru, Bianca; Woehrling, Cécile; Le, Viet Bac; Oparin, Ilya

doi:10.1007/978-3-642-35828-9_31


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7689)

Included in the following conference series:

International Workshop on Evaluation of Natural Language and Speech Tool for Italian


Abstract

This document describes the automatic speech-to-text transcription systems used by Vocapia Research in the Evalita 2011 evaluation for the open unconstrained automatic speech recognition (ASR) task. The aim of this evaluation was to transcribe audio recordings of parliament sessions in Italian. About 30 hours of untranscribed audio data and one year of minutes from parliament sessions were provided as the training corpus. This corpus was used to carry out an unsupervised adaptation of Vocapia’s Italian broadcast speech transcription system. Transcriptions produced by two systems were submitted. The primary system has a single decoding pass and was optimized to run in real time. The contrastive system, developed in collaboration with Limsi-CNRS, has two decoding passes and runs in about 5×RT (five times real time). The case-insensitive word error rates (WER) of the primary and contrastive systems are 10.2% and 9.3%, respectively, on the Evalita development data, and 6.4% and 5.4% on the evaluation data.
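
For readers unfamiliar with the metric, the following is a minimal sketch of how a case-insensitive WER and the relative gap between two systems can be computed. It assumes whitespace tokenization and a plain Levenshtein alignment over words. The abstract does not say which scoring tool or text normalization the evaluation used, so the function names `word_error_rate` and `relative_wer_reduction` and the Italian example sentence are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Case-insensitive WER: word-level edit distance / reference length.

    Assumption: words are separated by whitespace and no other text
    normalization is applied (the official scoring rules may differ).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the improved system lowers the baseline WER."""
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical example: one of five reference words is dropped.
    wer = word_error_rate("Il presidente apre la seduta",
                          "il presidente apre seduta")
    print(f"WER: {wer:.1%}")

    # Figures from the abstract: 6.4% (primary) vs 5.4% (contrastive)
    # on the evaluation data.
    print(f"Relative reduction: {relative_wer_reduction(6.4, 5.4):.1%}")
```

Applied to the evaluation-data figures quoted above, the two-pass contrastive system lowers the WER by 1.0 point absolute, which is a relative reduction of roughly 16% over the real-time primary system.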



Author information

Authors and Affiliations

Vocapia Research, 3 rue Jean Rostand, 91400, Orsay, France
Julien Despres, Lori Lamel, Jean-Luc Gauvain, Bianca Vieru, Cécile Woehrling & Viet Bac Le
CNRS-LIMSI, 91403, Orsay, France
Lori Lamel, Jean-Luc Gauvain & Ilya Oparin


Editor information

Editors and Affiliations

Fondazione Bruno Kessler, Via Sommarive 18, 38123, Povo, TN, Italy
Bernardo Magnini
University of Naples, Via Cinthia, 80126, Napoli, NA, Italy
Francesco Cutugno
Fondazione Ugo Bordoni, Viale del Policlinico, 161, Roma, Italy
Mauro Falcone
CELCT, Via alla Cascata, 38123, Povo, TN, Italy
Emanuele Pianta


About this paper

Cite this paper

Despres, J. et al. (2013). The Vocapia Research ASR Systems for Evalita 2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_31


DOI: https://doi.org/10.1007/978-3-642-35828-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35827-2
Online ISBN: 978-3-642-35828-9
eBook Packages: Computer Science, Computer Science (R0)
