This paper presents a methodology for articulatory synthesis of running speech in American English, driven by mid-sagittal vocal-tract data from real-time magnetic resonance imaging (rtMRI). At the core of the methodology is a time-domain simulation of sound propagation in the vocal tract developed previously by Maeda. The first step of the methodology is the automatic derivation of air-tissue boundaries from the rtMRI data. These articulatory outlines are then systematically modified to render consonantal vocal-tract constrictions with additional precision. Other elements of the methodology include a previously reported set of empirical rules for setting the time-varying characteristics of the glottis and the velopharyngeal port, and a revised sagittal-to-area conversion. Results are promising for the development of a full-fledged text-to-speech synthesis system that leverages directly observed vocal-tract dynamics.
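The sagittal-to-area step mentioned above is conventionally modeled in the articulatory-synthesis literature as a power law mapping mid-sagittal width to cross-sectional area. The sketch below illustrates that general idea only; the coefficients `alpha` and `beta` are hypothetical placeholders (in practice they are fit per vocal-tract region), and this is not the paper's revised conversion.

```python
# Illustrative sagittal-to-area conversion using the classic power law
# A = alpha * d**beta. The coefficients below are hypothetical; real
# models fit alpha and beta separately for regions such as the lips,
# oral cavity, and pharynx.

def sagittal_to_area(d_cm, alpha=1.8, beta=1.5):
    """Map a mid-sagittal width d (cm) to a cross-sectional area (cm^2)."""
    return alpha * d_cm ** beta

# Widths sampled along the tract yield the area function that would
# drive a time-domain acoustic simulation.
widths = [0.2, 0.8, 1.5]
area_function = [sagittal_to_area(d) for d in widths]
```

Applied along a grid of cross-sections from glottis to lips, such a mapping turns the 2-D mid-sagittal outlines into the 1-D area function the acoustic simulation requires.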
Cite as: Toutios, A., Sorensen, T., Somandepalli, K., Alexander, R., Narayanan, S.S. (2016) Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data. Proc. Interspeech 2016, 1492-1496, doi: 10.21437/Interspeech.2016-596
@inproceedings{toutios16_interspeech,
  author={Asterios Toutios and Tanner Sorensen and Krishna Somandepalli and Rachel Alexander and Shrikanth S. Narayanan},
  title={{Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data}},
  year={2016},
  booktitle={Proc. Interspeech 2016},
  pages={1492--1496},
  doi={10.21437/Interspeech.2016-596}
}