On The Performance of EMA-Synchronized Speech and Stand-alone Speech in Speech Recognition and Acoustic-to-Articulatory Inversion
Pages 162–166
Abstract
Synchronized acoustic-articulatory data underpins a range of applications, such as exploring the fundamental mechanisms of speech production, acoustic-to-articulatory inversion (AAI), and articulatory-to-acoustic mapping. Most studies in these fields train their models directly on EMA-synchronized speech, whereas the inputs or outputs in real applications are stand-alone speech. However, the recording conditions of EMA-synchronized speech and stand-alone speech differ, which may make the two signals differ and degrade the performance of downstream tasks. It is therefore necessary to establish whether EMA-synchronized speech and stand-alone speech actually differ and, if so, how the difference affects models trained on synchronized acoustic-articulatory data. In this study, we examine the difference between EMA-synchronized speech and stand-alone speech through the lens of speech recognition, and measure its influence on AAI performance. The results show that the phone error rate rises from 7.8% on stand-alone speech to 37.4% on EMA-synchronized speech, and that for an AAI model trained on EMA-synchronized speech, the RMSE increases from 0.71 mm to 3.07 mm when the input is switched from EMA-synchronized speech to stand-alone speech.
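As a concrete illustration of the two metrics reported above, the following Python sketch (not the authors' code; aai_model and the feature arrays are hypothetical placeholders) computes phone error rate as a length-normalized edit distance between phone sequences, and articulatory RMSE in millimetres between predicted and measured EMA trajectories:

    # Minimal sketch of the evaluation described in the abstract, assuming
    # hypothetical model and data objects; only the two metrics are real.
    import numpy as np

    def per(ref: list[str], hyp: list[str]) -> float:
        """Phone error rate: Levenshtein distance / reference length."""
        d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
        d[:, 0] = np.arange(len(ref) + 1)
        d[0, :] = np.arange(len(hyp) + 1)
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                d[i, j] = min(
                    d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution or match
                    d[i - 1, j] + 1,                               # deletion
                    d[i, j - 1] + 1,                               # insertion
                )
        return d[-1, -1] / len(ref)

    def rmse_mm(predicted: np.ndarray, reference: np.ndarray) -> float:
        """RMSE over all frames and articulator channels, coordinates in mm.

        Both arrays have shape (frames, channels).
        """
        assert predicted.shape == reference.shape
        return float(np.sqrt(np.mean((predicted - reference) ** 2)))

    # Hypothetical usage: aai_model was trained on EMA-synchronized speech;
    # ema_sync_feats and standalone_feats are acoustic features of the same
    # utterance under the two recording conditions, ema_ref the measured EMA.
    # matched    = rmse_mm(aai_model.predict(ema_sync_feats), ema_ref)    # ~0.71 mm in the paper
    # mismatched = rmse_mm(aai_model.predict(standalone_feats), ema_ref)  # ~3.07 mm in the paper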
Published In
GAIIS 2024: Proceedings of the 2024 International Conference on Generative Artificial Intelligence and Information Security
May 2024
439 pages
ISBN: 9798400709562
DOI: 10.1145/3665348
Copyright © 2024 ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 03 July 2024
Qualifiers
- Research-article
- Research
- Refereed limited
Conference
GAIIS 2024
GAIIS 2024: 2024 International Conference on Generative Artificial Intelligence and Information Security
May 10–12, 2024
Kuala Lumpur, Malaysia