The Vocapia Research ASR Systems for Evalita 2011

  • Conference paper
Evaluation of Natural Language and Speech Tools for Italian (EVALITA 2012)

Abstract

This paper describes the automatic speech-to-text transcription systems used by Vocapia Research in the open unconstrained automatic speech recognition (ASR) task of the Evalita 2011 evaluation. The task was to transcribe audio recordings of Italian parliament sessions. About 30 hours of untranscribed audio data and one year of minutes from parliament sessions were provided as a training corpus, which was used for unsupervised adaptation of Vocapia’s Italian broadcast speech transcription system. Transcriptions produced by two systems were submitted. The primary system has a single decoding pass and was optimized to run in real time. The contrastive system, developed in collaboration with Limsi-CNRS, has two decoding passes and runs in about 5×RT (five times real time). The case-insensitive word error rates (WER) of the primary and contrastive systems are 10.2% and 9.3%, respectively, on the Evalita development data, and 6.4% and 5.4% on the evaluation data.
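The two figures of merit quoted in the abstract, case-insensitive WER and the real-time factor, can be illustrated with a minimal sketch. This is not the scoring tool used in the evaluation; the function names `word_error_rate` and `real_time_factor`, the whitespace tokenization, and the Italian sample sentences are assumptions made purely for illustration. WER is taken here as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and the real-time factor as processing time divided by audio duration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Case-insensitive WER: word-level edit distance / number of reference words.

    Illustrative only: tokens are split on whitespace and lowercased;
    no other text normalization is applied.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: 1.0 means real time, 5.0 means 5xRT."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion over five words.
    wer = word_error_rate("Il Senato approva la legge", "il senato approvato legge")
    print(f"WER = {wer:.1%}")
    print(f"RTF = {real_time_factor(300.0, 60.0):.1f}")
```

Because both strings are lowercased before alignment, differences in capitalization alone do not count as errors, which is what "case-insensitive" means for the rates reported above.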

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

Cite this paper

Despres, J. et al. (2013). The Vocapia Research ASR Systems for Evalita 2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_31

  • DOI: https://doi.org/10.1007/978-3-642-35828-9_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35827-2

  • Online ISBN: 978-3-642-35828-9

  • eBook Packages: Computer Science (R0)
