Abstract
In this paper, we evaluate the contribution of dynamic articulatory information using an articulatory speech recognition system. Electromagnetic articulography (EMA) datasets are relatively small and difficult to record compared with the popular speech corpora used in modern speech research. We used articulatory data to study how much each observation channel of the vocal tract contributes to speech recognition within a deep neural network (DNN) framework. We also analyzed the recognition results for each phoneme according to speech production rules. The contribution rate of each articulator can be interpreted as how crucial that articulator is to each phoneme in speech production. Furthermore, the results indicate that the contribution of each observation point does not depend on the specific recognition method. The contribution pattern of each sensor is consistent with the rules of Japanese phonology. We also evaluated the compensation effect between different channels and found that crucial points are harder to compensate for than non-crucial points. The proposed method can help identify the crucial points of each phoneme during speech. These results can contribute to the study of speech production and articulatory-based speech recognition.
Acknowledgements
This work was supported in part by grants from the National Natural Science Foundation of China (General Program No. 61471259, and Key Program No. 61233009) and in part by NSFC of Tianjin (No. 16JCZDJC35400).
Cite this article
Wei, J., Ji, Y., Zhang, J. et al. Study of articulators’ contribution and compensation during speech by articulatory speech recognition. Multimed Tools Appl 77, 18849–18864 (2018). https://doi.org/10.1007/s11042-018-5667-4
DOI: https://doi.org/10.1007/s11042-018-5667-4