Code-switching in Indic Speech Synthesisers

Thomas, Anju Leela; Prakash, Anusha; Baby, Arun; Murthy, Hema

doi:10.21437/Interspeech.2018-1178

Most Indians are inherently bilingual or multilingual owing to the diverse linguistic culture in India. As a result, code-switching is quite common in conversational speech. The objective of this work is to train good-quality text-to-speech (TTS) synthesisers that can seamlessly handle code-switching. To achieve this, bilingual TTS systems that are capable of handling phonotactic variations across languages are trained using combinations of monolingual data in a unified framework. In addition to segmenting Indic speech data using signal processing cues in tandem with a hidden Markov model-deep neural network (HMM-DNN), we propose to segment Indian English data using the same approach after NIST syllabification. Bilingual HTS-STRAIGHT-based systems are then trained by randomising the order of the data so that the systematic interactions between the two languages are better captured. Experiments are conducted on three language pairs: Hindi+English, Tamil+English and Hindi+Tamil. The code-switched systems are evaluated on monolingual, code-mixed and code-switched texts. The degradation mean opinion score (DMOS) for monolingual sentences shows only marginal degradation relative to that of an equivalent monolingual TTS system, while the DMOS for bilingual sentences is significantly better than that of the corresponding monolingual TTS systems.
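
The data-ordering step described in the abstract, pooling two monolingual corpora and randomising their order before training, might look roughly like the sketch below. This is only an illustration under stated assumptions: the function name `build_bilingual_training_list`, the language tags, the utterance IDs and the fixed seed are all hypothetical and are not taken from the paper, and the actual HTS-STRAIGHT training pipeline is not shown.

```python
import random
from typing import List, Tuple

# (language tag, utterance id); both fields are hypothetical placeholders.
Utterance = Tuple[str, str]


def build_bilingual_training_list(
    lang_a: List[str],
    lang_b: List[str],
    tag_a: str,
    tag_b: str,
    seed: int = 0,
) -> List[Utterance]:
    """Pool two monolingual utterance lists and shuffle them together.

    Assumption: interleaving the two languages at random stands in for the
    "randomising the order of data" step; the paper's exact procedure may differ.
    """
    pooled = [(tag_a, utt) for utt in lang_a] + [(tag_b, utt) for utt in lang_b]
    rng = random.Random(seed)  # local generator so the order is reproducible
    rng.shuffle(pooled)        # in-place shuffle of the pooled list
    return pooled


# Toy monolingual corpora (made-up IDs, not real data).
hindi = [f"hin_{i:04d}" for i in range(5)]
english = [f"eng_{i:04d}" for i in range(5)]

train_list = build_bilingual_training_list(hindi, english, "hi", "en", seed=13)
for tag, utt in train_list:
    print(tag, utt)
```

A seeded local generator is used here so that the same pooled ordering can be regenerated across runs; whether the original experiments fixed a seed is not stated in the abstract.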

Cite as: Thomas, A.L., Prakash, A., Baby, A., Murthy, H. (2018) Code-switching in Indic Speech Synthesisers. Proc. Interspeech 2018, 1948-1952, doi: 10.21437/Interspeech.2018-1178

@inproceedings{thomas18_interspeech,
  author={Anju Leela Thomas and Anusha Prakash and Arun Baby and Hema Murthy},
  title={{Code-switching in Indic Speech Synthesisers}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1948--1952},
  doi={10.21437/Interspeech.2018-1178}
}