Abstract
This paper proposes a new bidirectional neural network for acoustic-articulatory inversion mapping. The model is motivated by the parallel structure of the human brain, which processes information through forward and reverse connections; in speech, this corresponds to feedback from the articulatory system to the acoustic signal it produces. Inspired by this mechanism, the bidirectional model maps speech representations to articulatory features. Attractor dynamics are first formed in the model by training the reference speaker's subspace as a continuous attractor. The trained model is then used to recognize other speakers' speech. The structure and training of the model are designed so that the network learns to denoise the signal step by step, using the properties of the attractors it has formed. A nonlinear feedforward network is compared with the same network augmented with a bidirectional connection. The bidirectional model improves phone recognition accuracy by approximately 3 percentage points (from 62.09% to 64.91%).
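The step-by-step denoising idea can be illustrated with a toy sketch. This is not the network described in the paper: it assumes a linear forward map (projection onto a single direction), a linear reverse map, a straight line standing in for the continuous attractor, and a hypothetical mixing rate `alpha`. All function names are illustrative. Each forward-reverse pass moves the input part-way toward its reconstruction, so the component lying off the attractor shrinks at every step.

```python
import math

def forward(x, basis):
    """Toy 'acoustic -> articulatory' map: project x onto the basis direction."""
    return sum(xi * bi for xi, bi in zip(x, basis))

def reverse(z, basis):
    """Toy 'articulatory -> acoustic' map: rebuild a point on the attractor line."""
    return [z * bi for bi in basis]

def refine(x, basis, alpha=0.5, steps=5):
    """Repeat forward and reverse passes, moving x part-way toward
    its reconstruction on each pass. Returns every intermediate point."""
    history = [list(x)]
    for _ in range(steps):
        x_hat = reverse(forward(x, basis), basis)
        x = [xi + alpha * (hi - xi) for xi, hi in zip(x, x_hat)]
        history.append(x)
    return history

def distance_to_attractor(x, basis):
    """Euclidean distance between x and its reconstruction on the line."""
    x_hat = reverse(forward(x, basis), basis)
    return math.sqrt(sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat)))

# Assumed toy data: a unit direction, and a noisy point whose offset from
# the line is orthogonal to that direction.
basis = [0.6, 0.8]
noisy = [3.8, 3.4]  # the point [3.0, 4.0] on the line plus the offset [0.8, -0.6]

history = refine(noisy, basis)
distances = [distance_to_attractor(x, basis) for x in history]
```

In this linear setting the off-attractor distance is multiplied by `1 - alpha` on each pass, while the coordinate along the attractor is left unchanged. A trained nonlinear network would not behave this simply; the sketch only shows why repeated forward-reverse passes can act as gradual denoising.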








Acknowledgments
We would like to thank Alan Wrench, Mark Hasegawa-Johnson, and Sarah Borys for the useful discussions and other help along the way.
Cite this article
Behbood, H., Seyyedsalehi, S.A., Tohidypour, H.R. et al. A novel neural-based model for acoustic-articulatory inversion mapping. Neural Comput & Applic 21, 935–943 (2012). https://doi.org/10.1007/s00521-011-0563-0
DOI: https://doi.org/10.1007/s00521-011-0563-0