Estimating articulatory movements from speech acoustic features is known as acoustic-to-articulatory inversion (AAI). A large amount of parallel acoustic-articulatory data is required to train an AAI model in a subject-dependent manner, referred to as subject-dependent AAI (SD-AAI). Electromagnetic articulography (EMA) is a promising technology for recording such parallel data, but it is expensive, time-consuming, and tiring for a subject. In order to reduce the demand for parallel acoustic-articulatory data in the AAI task for a subject, we, in this work, propose a subject-adaptive AAI method (SA-AAI) that adapts an existing AAI model trained using a large amount of parallel data from a fixed set of subjects. Experiments are performed with 30 subjects' acoustic-articulatory data, and AAI is trained using a BLSTM network to examine the amount of data needed from a new target subject for the SA-AAI to achieve an AAI performance equivalent to that of SD-AAI. Experimental results reveal that the proposed SA-AAI performs similarly to SD-AAI with ∼62.5% less training data. Among the different articulators, the SA-AAI performance for the tongue articulators matches the corresponding SD-AAI performance with only ∼12.5% of the data used for SD-AAI training.
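A minimal sketch of the kind of BLSTM-based inversion network the abstract describes, assuming PyTorch and illustrative feature dimensions (39-dimensional acoustic features such as MFCCs, 12-dimensional EMA trajectories); the actual architecture and dimensions in the paper may differ:

```python
import torch
import torch.nn as nn

class BLSTMInversion(nn.Module):
    """Maps a sequence of acoustic feature vectors to articulatory
    trajectories, frame by frame (hypothetical dimensions)."""

    def __init__(self, acoustic_dim=39, articulatory_dim=12, hidden=128):
        super().__init__()
        # Bi-directional LSTM reads each utterance forwards and backwards.
        self.blstm = nn.LSTM(acoustic_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        # Linear read-out from both directions to articulatory coordinates.
        self.out = nn.Linear(2 * hidden, articulatory_dim)

    def forward(self, x):           # x: (batch, frames, acoustic_dim)
        h, _ = self.blstm(x)
        return self.out(h)          # (batch, frames, articulatory_dim)

model = BLSTMInversion()
acoustics = torch.randn(4, 100, 39)   # 4 utterances, 100 frames each
trajectories = model(acoustics)
print(tuple(trajectories.shape))      # (4, 100, 12)
```

Under the subject-adaptation idea in the abstract, such a model would first be trained on the pooled multi-subject data and then fine-tuned on the small amount of parallel data available from the new target subject.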
Cite as: Illa, A., Ghosh, P.K. (2018) Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory. Proc. Interspeech 2018, 3122-3126, doi: 10.21437/Interspeech.2018-1843
@inproceedings{illa18_interspeech,
  author    = {Aravind Illa and Prasanta Kumar Ghosh},
  title     = {{Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory}},
  year      = {2018},
  booktitle = {Proc. Interspeech 2018},
  pages     = {3122--3126},
  doi       = {10.21437/Interspeech.2018-1843}
}