A novel neural-based model for acoustic-articulatory inversion mapping

  • Original Article
  • Neural Computing and Applications

Abstract

In this paper, a new bidirectional neural network is proposed for improved acoustic-articulatory inversion mapping. The model is motivated by the parallel structure of the human brain, which processes information through both forward and reverse connections. In other words, there is feedback from the articulatory system to the acoustic signals it produces. Inspired by this mechanism, a new bidirectional model is developed to map speech representations to articulatory features. Attractor dynamics are first formed in the bidirectional model by training it with the reference speaker's subspace as the continuous attractor. The model is then used to recognize the other speaker's speech. The structure and training of the model are designed so that the network learns to denoise the signal step by step, using the properties of the attractors it has formed. In this work, the performance of a nonlinear feedforward network is compared with that of the same network augmented with a bidirectional connection. The bidirectional model increases phone recognition accuracy by approximately 3 percentage points (from 62.09% to 64.91%).
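The idea of step-by-step denoising toward a continuous attractor can be illustrated with a small toy sketch. This is not the authors' network: a fixed linear subspace stands in for the trained reference-speaker attractor, and the names `forward`, `reverse`, `refine`, `BASIS`, and the `rate` parameter are all hypothetical. Each iteration passes the input through a forward mapping and back through a reverse mapping, then moves the input part of the way toward that reconstruction, so the off-attractor component shrinks at every step.

```python
import math

def forward(x, basis):
    """Stand-in for the forward pass: coordinates of x along each basis vector."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]

def reverse(z, basis):
    """Stand-in for the reverse pass: rebuild an input-space point from coordinates."""
    dim = len(basis[0])
    return [sum(z_k * b[i] for z_k, b in zip(z, basis)) for i in range(dim)]

def residual(x, basis):
    """Distance from x to its forward-reverse reconstruction (the attractor)."""
    back = reverse(forward(x, basis), basis)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, back)))

def refine(x, basis, steps=8, rate=0.5):
    """Repeatedly nudge x toward its reconstruction; record the residual per step."""
    history = []
    for _ in range(steps):
        back = reverse(forward(x, basis), basis)
        x = [x_i + rate * (b_i - x_i) for x_i, b_i in zip(x, back)]
        history.append(residual(x, basis))
    return x, history

# Assumption: the attractor is the plane spanned by the first two axes
# (orthonormal basis vectors), and the third coordinate is pure noise.
BASIS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
noisy = [0.8, -1.2, 0.64]

refined, history = refine(noisy, BASIS)
print([round(h, 4) for h in history])
print([round(v, 4) for v in refined])
```

In this sketch the in-plane coordinates are left unchanged while the noise coordinate is halved on every pass. A trained nonlinear network would replace the two linear maps, and its attractor would generally be a curved manifold rather than a plane.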

Acknowledgments

We would like to thank Alan Wrench, Mark Hasegawa-Johnson, and Sarah Borys for useful discussions and other help along the way.

Corresponding author

Correspondence to Hossein Behbood.

About this article

Cite this article

Behbood, H., Seyyedsalehi, S.A., Tohidypour, H.R. et al. A novel neural-based model for acoustic-articulatory inversion mapping. Neural Comput & Applic 21, 935–943 (2012). https://doi.org/10.1007/s00521-011-0563-0

  • DOI: https://doi.org/10.1007/s00521-011-0563-0
