Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model

Tobing, Patrick Lumban; Toda, Tomoki; Kameoka, Hirokazu; Nakamura, Satoshi

doi:10.21437/Interspeech.2016-1196

A maximum likelihood parameter trajectory estimation method based on a Gaussian mixture model (GMM) has been successfully applied to acoustic-to-articulatory inversion mapping. In the conventional method, the GMM parameters are optimized by maximizing a likelihood function for joint static and dynamic features of acoustic-articulatory data. The articulatory parameter trajectories are then estimated from the given acoustic data by maximizing a likelihood function for the static features only, imposing a constraint between static and dynamic features to account for the inter-frame correlation. Because the training and mapping criteria are inconsistent, the trained GMM is not optimal for the mapping process. This inconsistency is addressed by a trajectory training framework, but that framework makes some parameters, e.g., covariance matrices and mixture component sequences, more difficult to optimize. In this paper, we propose an inversion mapping method based on a latent trajectory GMM (LT-GMM) as another way to overcome the inconsistency. The proposed method makes it possible to optimize the LT-GMM parameters with a well-formulated algorithm such as the expectation-maximization (EM) algorithm, which is not feasible in traditional trajectory training. Experimental results demonstrate that the proposed method yields higher inversion mapping accuracy than the conventional GMM-based method.
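To make the conventional mapping step concrete, here is a minimal NumPy sketch of maximum likelihood trajectory estimation under the static-dynamic constraint. If y is the static trajectory, W the matrix that maps it to stacked static and delta features, and μ and D the per-frame conditional means and covariances of those features, the trajectory maximizing N(Wy; μ, D) solves (WᵀD⁻¹W) y = WᵀD⁻¹μ. The sketch assumes a single articulatory dimension, a delta window of [-0.5, 0, 0.5] with zero padding at the utterance edges, diagonal covariances, and means and variances already obtained from the GMM for a fixed mixture component sequence. The helper names `build_window_matrix` and `mlpg` are hypothetical. It illustrates the conventional GMM-based method, not the proposed LT-GMM.

```python
import numpy as np


def build_window_matrix(num_frames: int) -> np.ndarray:
    """Build W (2T x T) mapping a static trajectory y to [static, delta] features.

    Assumptions (illustrative only): one static dimension per frame and the
    delta window [-0.5, 0.0, 0.5]; frames outside [0, T) are treated as zero.
    """
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0                   # static row: y_t
        if t - 1 >= 0:
            W[2 * t + 1, t - 1] = -0.5      # delta row: -0.5 * y_{t-1}
        if t + 1 < num_frames:
            W[2 * t + 1, t + 1] = 0.5       # delta row: +0.5 * y_{t+1}
    return W


def mlpg(means: np.ndarray, variances: np.ndarray) -> np.ndarray:
    """Return the static trajectory y maximizing N(W y; mu, D).

    means, variances: shape (T, 2), rows holding the [static, delta]
    conditional means and diagonal variances for each frame (hypothetically
    taken from the mixture component selected at that frame).
    """
    num_frames = means.shape[0]
    W = build_window_matrix(num_frames)
    mu = means.reshape(-1)                  # interleaved [s_0, d_0, s_1, d_1, ...]
    precision = 1.0 / variances.reshape(-1) # diagonal of D^{-1}
    Wt_Dinv = W.T * precision               # W^T D^{-1}, shape (T, 2T)
    A = Wt_Dinv @ W                         # W^T D^{-1} W, symmetric positive definite
    b = Wt_Dinv @ mu                        # W^T D^{-1} mu
    return np.linalg.solve(A, b)
```

Because the static rows of W form an identity, A is always invertible. When the static and delta targets are mutually consistent (μ = Wy for some y), the solver returns that y exactly; when they disagree, it returns the trajectory that balances the two according to their variances, which is how the delta constraint captures inter-frame correlation. For long utterances, A is banded, so a banded solver would be the more efficient choice.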

Cite as: Tobing, P.L., Toda, T., Kameoka, H., Nakamura, S. (2016) Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model. Proc. Interspeech 2016, 953-957, doi: 10.21437/Interspeech.2016-1196

@inproceedings{tobing16_interspeech,
  author={Patrick Lumban Tobing and Tomoki Toda and Hirokazu Kameoka and Satoshi Nakamura},
  title={{Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={953--957},
  doi={10.21437/Interspeech.2016-1196}
}