We describe a new approach to converting written tokens to their spoken form, which can be shared by automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems. Both ASR and TTS need to map from the written to the spoken domain, and we present an approach that enables us to share verbalization grammars between the two systems while exploiting linguistic commonalities to provide simple default verbalizations. We also describe improvements to an induction system for number names grammars. Between these shared ASR/TTS verbalizers and the improved induction system for number names grammars, we achieve significant gains in development time and scalability across languages.
Cite as: Ritchie, S., Sproat, R., Gorman, K., van Esch, D., Schallhart, C., Bampounis, N., Brard, B., Mortensen, J.F., Holt, M., Mahon, E. (2019) Unified Verbalization for Speech Recognition & Synthesis Across Languages. Proc. Interspeech 2019, 3530-3534, doi: 10.21437/Interspeech.2019-2807
@inproceedings{ritchie19_interspeech,
  author={Sandy Ritchie and Richard Sproat and Kyle Gorman and Daan van Esch and Christian Schallhart and Nikos Bampounis and Benoît Brard and Jonas Fromseier Mortensen and Millie Holt and Eoin Mahon},
  title={{Unified Verbalization for Speech Recognition \& Synthesis Across Languages}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3530--3534},
  doi={10.21437/Interspeech.2019-2807}
}