Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN

Potard, Blaise; Aylett, Matthew P.; Baude, David A.; Motlicek, Petr

doi:10.21437/Interspeech.2016-1188

This paper presents a text-to-speech (TTS) extension to Kaldi, a liberally licensed open source speech recognition system. The system, Idlak Tangle, uses recent deep neural network (DNN) methods for modelling speech, the Idlak XML-based text processing system as the front end, and a newly released open source mixed excitation MLSA vocoder included in Idlak. The system has none of the licensing restrictions of current freely available HMM-style systems, such as the HTS toolkit. To date, no alternative open source DNN systems are available. Tangle combines the Idlak front end and vocoder with two DNNs that model unit durations and acoustic parameters respectively, providing a fully functional end-to-end TTS system.
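The data flow described above (front end, duration DNN, acoustic DNN, vocoder) can be illustrated with a minimal, hypothetical sketch. Everything in it is an assumption made for illustration: the function names, the letter-based "front end", the feature encodings, the 5 ms frame shift, the 16 kHz sample rate and the fixed weights are invented, and none of it is the Idlak or Kaldi API. The two "DNNs" are single fixed-weight layers rather than trained networks, and the vocoder is a sinusoid placeholder, not an MLSA vocoder. Appending a within-unit position feature when upsampling unit features to frames is a common practice in DNN synthesis, assumed here rather than taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

FRAME_SHIFT_S = 0.005   # assumed 5 ms frame shift (illustrative)
SAMPLE_RATE = 16000     # assumed output sample rate (illustrative)


@dataclass
class Unit:
    """One linguistic unit produced by the (toy) front end."""
    phone: str
    features: List[float]  # hypothetical numeric linguistic features


def toy_front_end(text: str) -> List[Unit]:
    """Hypothetical stand-in for a real front end: one unit per letter.

    A real front end would normalise text, look up pronunciations and
    extract context features; this only encodes each letter numerically.
    """
    units = []
    for ch in text.lower():
        if ch.isalpha():
            code = (ord(ch) - ord("a")) / 25.0
            units.append(Unit(ch, [code, 1.0 if ch in "aeiou" else 0.0]))
    return units


def dense(x: Sequence[float], w: Sequence[Sequence[float]],
          b: Sequence[float]) -> List[float]:
    """One fully connected layer with tanh activation (a toy 'DNN')."""
    return [math.tanh(sum(xi * wij for xi, wij in zip(x, row)) + bj)
            for row, bj in zip(w, b)]


def duration_dnn(unit: Unit) -> int:
    """Toy duration model: maps unit features to a frame count (>= 1)."""
    h = dense(unit.features, [[0.5, 1.0], [-0.3, 0.8]], [0.1, 0.0])
    return max(1, round(10 + 5 * sum(h)))


def frame_features(units: List[Unit], durations: List[int]) -> List[List[float]]:
    """Repeat each unit's features for its predicted number of frames,
    appending the frame's relative position within the unit."""
    frames = []
    for unit, n in zip(units, durations):
        for i in range(n):
            frames.append(unit.features + [(i + 0.5) / n])
    return frames


def acoustic_dnn(frame: List[float]) -> float:
    """Toy acoustic model: predicts only log-F0 for one frame.

    A real acoustic model would predict a full vector of vocoder
    parameters per frame; a single value keeps the sketch short.
    """
    h = dense(frame, [[0.2, 0.1, -0.1]], [0.0])
    return math.log(120.0) + 0.3 * h[0]


def toy_vocoder(log_f0: List[float]) -> List[float]:
    """Placeholder vocoder (NOT MLSA): a sinusoid that follows the F0 track."""
    samples_per_frame = round(SAMPLE_RATE * FRAME_SHIFT_S)
    out, phase = [], 0.0
    for lf0 in log_f0:
        f0 = math.exp(lf0)
        for _ in range(samples_per_frame):
            phase += 2 * math.pi * f0 / SAMPLE_RATE
            out.append(0.3 * math.sin(phase))
    return out


def synthesise(text: str) -> List[float]:
    """Front end -> duration model -> frame features -> acoustic model -> vocoder."""
    units = toy_front_end(text)
    durations = [duration_dnn(u) for u in units]
    frames = frame_features(units, durations)
    log_f0 = [acoustic_dnn(f) for f in frames]
    return toy_vocoder(log_f0)


if __name__ == "__main__":
    audio = synthesise("Hello world")
    print(f"{len(audio) / SAMPLE_RATE:.3f} seconds of audio")
```

The point of the ordering is that the duration model runs first: its per-unit frame counts fix how many frames the acoustic model must generate, so the two networks can be trained and swapped independently.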

Experimental results using the freely available SLT speaker from CMU ARCTIC reveal that the speech output was rated in a MUSHRA test as significantly more natural than the output of HTS-demo, the only other free-to-download HMM system available with no commercially restricted or proprietary IP. The tools, audio database and recipe required to reproduce the results presented in this paper are fully available online.

Cite as: Potard, B., Aylett, M.P., Baude, D.A., Motlicek, P. (2016) Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN. Proc. Interspeech 2016, 2293-2297, doi: 10.21437/Interspeech.2016-1188

@inproceedings{potard16_interspeech,
  author={Blaise Potard and Matthew P. Aylett and David A. Baude and Petr Motlicek},
  title={{Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2293--2297},
  doi={10.21437/Interspeech.2016-1188}
}