
Study of articulators’ contribution and compensation during speech by articulatory speech recognition

Multimedia Tools and Applications

Abstract

In this paper, the contribution of dynamic articulatory information is evaluated using an articulatory speech recognition system. Electromagnetic articulographic (EMA) datasets are relatively small and difficult to record compared with the popular speech corpora used in modern speech research. We used articulatory data to study the contribution of each observation channel of the vocal tract to speech recognition within a DNN framework, and we analyzed the recognition results for each phoneme according to speech production rules. The contribution rate of each articulator can be interpreted as how crucial that articulator is for each phoneme in speech production. Furthermore, the results indicate that the contribution of each observation point does not depend on the specific recognition method, and the tendency of each sensor's contribution is consistent with the rules of Japanese phonology. We also evaluated the compensation effect between different channels and found that crucial points are harder to compensate for than non-crucial points. The proposed method can help identify the crucial points of each phoneme during speech, and the results can contribute to the study of speech production and articulatory-based speech recognition.
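The paper's implementation details are not reproduced on this page, so the following is only a minimal sketch of the two analyses the abstract describes: per-channel contribution (how much recognition accuracy drops when one articulatory channel is neutralised at test time) and compensation (how much the remaining channels recover when a channel is removed and the recogniser is retrained). Everything in the sketch is an assumption made for illustration: synthetic frames stand in for the EMA recordings, a softmax classifier stands in for the paper's DNN, and the six sensor channels, feature dimensions, and phoneme-class count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not the paper's configuration): 6 EMA sensor channels,
# 2 coordinates each -> 12 features per frame; 4 phoneme classes.
n_channels, coords, n_classes = 6, 2, 4
n_feat = n_channels * coords

# Synthetic frames: each phoneme class has its own mean articulatory posture.
class_means = rng.normal(0.0, 2.0, size=(n_classes, n_feat))
X = np.vstack([rng.normal(class_means[c], 1.0, size=(500, n_feat))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), 500)
Y = np.eye(n_classes)[y]          # one-hot targets

def train(Xtr):
    """Softmax classifier fit by gradient descent (stand-in for the DNN)."""
    W = np.zeros((Xtr.shape[1], n_classes))
    b = np.zeros(n_classes)
    for _ in range(300):
        logits = Xtr @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= 0.1 * Xtr.T @ (p - Y) / len(Xtr)
        b -= 0.1 * (p - Y).mean(axis=0)
    return W, b

def accuracy(Xe, W, b):
    return float(np.mean((Xe @ W + b).argmax(axis=1) == y))

W, b = train(X)
base = accuracy(X, W, b)

for ch in range(n_channels):
    cols = np.arange(ch * coords, (ch + 1) * coords)
    # Contribution: accuracy lost when the channel is neutralised
    # (replaced by its global mean) at test time.
    Xa = X.copy()
    Xa[:, cols] = X[:, cols].mean(axis=0)
    contrib = base - accuracy(Xa, W, b)
    # Compensation: retrain without the channel; if accuracy recovers
    # past the ablated level, the remaining channels compensate for it.
    keep = np.setdiff1d(np.arange(n_feat), cols)
    Wc, bc = train(X[:, keep])
    recovery = accuracy(X[:, keep], Wc, bc) - (base - contrib)
    print(f"channel {ch}: contribution={contrib:.3f}  recovery={recovery:.3f}")
```

In this sketch, a channel whose removal both costs accuracy (high contribution) and resists recovery after retraining (low recovery) would correspond to what the abstract calls a crucial point; a channel with high contribution but high recovery is one the other articulators can compensate for.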



Acknowledgements

This work was supported in part by grants from the National Natural Science Foundation of China (General Program No. 61471259 and Key Program No. 61233009) and in part by the NSFC of Tianjin (No. 16JCZDJC35400).

Author information


Corresponding author

Correspondence to Wenhuan Lu.


About this article


Cite this article

Wei, J., Ji, Y., Zhang, J. et al. Study of articulators’ contribution and compensation during speech by articulatory speech recognition. Multimed Tools Appl 77, 18849–18864 (2018). https://doi.org/10.1007/s11042-018-5667-4

