Abstract
In this paper, we present the TED-LIUM release 3 corpus (available at https://lium.univ-lemans.fr/ted-lium3/), dedicated to speech recognition in English, which more than doubles the data available for training acoustic models compared with TED-LIUM 2. We describe recent developments in Automatic Speech Recognition (ASR) systems, comparing against the two previous releases of the TED-LIUM corpus from 2012 and 2014. We demonstrate that increasing the transcribed speech training data from 207 to 452 h benefits end-to-end ASR systems far more than state-of-the-art HMM-based ones, even though the HMM-based ASR system still outperforms the end-to-end system at 452 h of audio training data, with Word Error Rates (WER) of 6.7% and 13.7%, respectively. Finally, we propose two repartitions of the TED-LIUM release 3 corpus: the legacy repartition, identical to that of release 2, and a new repartition, calibrated and designed for experiments on speaker adaptation. Like the first two releases, the TED-LIUM 3 corpus is freely available to the research community.
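The WER figures quoted above are the standard word-level edit-distance metric. As a point of reference only (this sketch is not from the paper; the function name `wer` is ours), WER can be computed as the Levenshtein distance between reference and hypothesis word sequences, normalized by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,            # substitution (or match)
                          d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1)  # insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words
```

A WER of 6.7% thus means roughly one word error per fifteen reference words, on average, over the test set.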
Notes
- 5.
This LM is similar to the “small” LM trained with the pocolm toolkit that is used in the Kaldi tedlium s5_r2 recipe. The only difference is that we modified the training set by adding text data from TED-LIUM 3 and removing from it the data present in our test and development sets (from the adaptation corpus).
Acknowledgments
This work was partially funded by the French ANR Agency through the CHIST-ERA M2CR project, under the contract number ANR-15-CHR2-0006-01, and by the Google Digital News Innovation Fund through the news.bridge project.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Hernandez, F., Nguyen, V., Ghannay, S., Tomashenko, N., Estève, Y. (2018). TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science(), vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science (R0)