Abstract:
A machine learning technique is used to match reconstructed tongue contours in 30-frame-per-second ultrasound images to speaker vocal tract parameters obtained from a synchronized audio track. Speech synthesized using the learned parameters, with noise as an activation function, displays many of the time- and frequency-domain characteristics of the original audio and, for isolated passages, is remarkably clear, although no articulators other than the tongue are included.
Date of Conference: 17-21 May 2004
Date Added to IEEE Xplore: 30 August 2004
Print ISBN: 0-7803-8484-9
Print ISSN: 1520-6149