
TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation

  • Conference paper
Speech and Computer (SPECOM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Abstract

In this paper, we present the TED-LIUM release 3 corpus (available at https://lium.univ-lemans.fr/ted-lium3/), dedicated to speech recognition in English, which more than doubles the amount of data available for training acoustic models compared with TED-LIUM 2. We describe recent developments in Automatic Speech Recognition (ASR) systems and compare them against the two previous releases of the TED-LIUM corpus, from 2012 and 2014. We demonstrate that increasing the transcribed speech training data from 207 to 452 h benefits end-to-end ASR systems considerably more than state-of-the-art HMM-based ones. This holds even though the HMM-based system still outperforms the end-to-end system at 452 h of audio training data, with Word Error Rates (WER) of 6.7% and 13.7%, respectively. Finally, we propose two repartitions of the TED-LIUM release 3 corpus: the legacy repartition, identical to that of release 2, and a new repartition, calibrated and designed for experiments on speaker adaptation. As with the first two releases, the TED-LIUM 3 corpus is freely available to the research community.
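As a brief illustration of the Word Error Rate metric reported above: WER is the word-level edit distance between a reference transcript and an ASR hypothesis, normalised by the reference length. The sketch below is a minimal didactic implementation, not the authors' scoring pipeline (Kaldi-based systems typically score with their own alignment tools).

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + deletions + insertions) / #reference words,
    computed as a word-level Levenshtein distance via dynamic programming."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions to turn ref[:i] into an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions to build hyp[:j] from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + sub,  # match or substitution
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
            )
    return d[len(ref)][len(hyp)] / len(ref)

# 1 substitution ("sat" -> "sit") + 1 deletion ("the") over 6 reference words -> 2/6
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

A WER of 6.7% thus means that, per 100 reference words, roughly 6.7 word-level edits are needed to turn the system output into the reference transcript.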


Notes

  1. https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/steps/cleanup/segment_long_utterances.sh

  2. https://github.com/kaldi-asr/kaldi/tree/master/egs/tedlium/s5_r2

  3. https://github.com/danpovey/pocolm

  4. https://github.com/SeanNaren/deepspeech.pytorch

  5. This LM is similar to the “small” LM trained with the pocolm toolkit that is used in the Kaldi tedlium s5_r2 recipe. The only difference is that we modified the training set by adding text data from TED-LIUM 3 and removing any data present in our test and development sets (from the adaptation corpus).


Acknowledgments

This work was partially funded by the French ANR Agency through the CHIST-ERA M2CR project, under the contract number ANR-15-CHR2-0006-01, and by the Google Digital News Innovation Fund through the news.bridge project.

Author information

Correspondence to François Hernandez.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Hernandez, F., Nguyen, V., Ghannay, S., Tomashenko, N., Estève, Y. (2018). TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol. 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_21

  • DOI: https://doi.org/10.1007/978-3-319-99579-3_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99578-6

  • Online ISBN: 978-3-319-99579-3

  • eBook Packages: Computer Science; Computer Science (R0)
