A novel neural-based model for acoustic-articulatory inversion mapping

Behbood, Hossein; Seyyedsalehi, Seyyed Ali; Tohidypour, Hamid Reza; Najafi, Mojtaba; Gharibzadeh, Shahriar

doi:10.1007/s00521-011-0563-0

Original Article
Published: 15 March 2011

Volume 21, pages 935–943, (2012)

Neural Computing and Applications

Abstract

In this paper, a new bidirectional neural network for improved acoustic-articulatory inversion mapping is proposed. The model is motivated by the parallel structure of the human brain, which processes information through forward and reverse connections; in other words, there would be feedback from the articulatory system to the acoustic signals it emits. Inspired by this mechanism, a new bidirectional model is developed to map speech representations to articulatory features. Attractor dynamics are first formed in the model by training the reference speaker's subspace as a continuous attractor. The trained model is then used to recognize other speakers' speech. The structure and training of the bidirectional model are designed so that the network learns to denoise the signal step by step, using the properties of the attractors it has formed. In this work, the performance of a nonlinear feedforward network is compared with that of the same network with a bidirectional connection added. The bidirectional model raises phone recognition accuracy by about 2.8 percentage points (from 62.09% to 64.91%).
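The step-by-step denoising idea can be illustrated with a toy sketch. This is not the authors' network: it assumes a linear forward path and a linear reverse path, and it stands in for the trained continuous attractor with a hypothetical straight line in a 2-D "acoustic" space. All names (`DIRECTION`, `forward`, `reverse`, `refine`) and the update rate are assumptions made for illustration only.

```python
import math

# Assumption: the reference speaker subspace is a line through the origin,
# spanned by this unit vector. The real model learns a nonlinear attractor.
DIRECTION = (math.cos(0.5), math.sin(0.5))


def forward(acoustic):
    """Toy forward path: 2-D acoustic vector -> 1-D 'articulatory' code."""
    return acoustic[0] * DIRECTION[0] + acoustic[1] * DIRECTION[1]


def reverse(code):
    """Toy reverse path: 1-D code -> acoustic vector on the attractor line."""
    return (code * DIRECTION[0], code * DIRECTION[1])


def distance_to_attractor(acoustic):
    """Euclidean distance from a point to the attractor line."""
    on_line = reverse(forward(acoustic))
    return math.hypot(acoustic[0] - on_line[0], acoustic[1] - on_line[1])


def refine(acoustic, steps=10, rate=0.5):
    """Move the input a fraction `rate` toward its reconstruction each step.

    Returns every intermediate point, starting with the unmodified input.
    """
    x = tuple(acoustic)
    history = [x]
    for _ in range(steps):
        target = reverse(forward(x))
        x = (x[0] + rate * (target[0] - x[0]),
             x[1] + rate * (target[1] - x[1]))
        history.append(x)
    return history


if __name__ == "__main__":
    noisy = (1.0, 2.0)  # arbitrary off-attractor input
    for step, point in enumerate(refine(noisy, steps=5)):
        print(step, round(distance_to_attractor(point), 4))
```

In this toy setting, each pass through the forward and reverse paths shrinks the off-attractor component of the input by the factor `1 - rate` while leaving the articulatory code unchanged. That is one simple way to picture a network denoising its input toward a learned attractor.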

Acknowledgments

We would like to thank Alan Wrench, Mark Hasegawa-Johnson, and Sarah Borys for useful discussions and other help along the way.

Author information

Authors and Affiliations

Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran
Hossein Behbood, Seyyed Ali Seyyedsalehi, Hamid Reza Tohidypour & Shahriar Gharibzadeh
Azad University, South Tehran Branch, Tehran, Iran
Mojtaba Najafi

Corresponding author

Correspondence to Hossein Behbood.

About this article

Cite this article

Behbood, H., Seyyedsalehi, S.A., Tohidypour, H.R. et al. A novel neural-based model for acoustic-articulatory inversion mapping. Neural Comput & Applic 21, 935–943 (2012). https://doi.org/10.1007/s00521-011-0563-0

Received: 15 June 2010
Accepted: 07 February 2011
Published: 15 March 2011
Issue Date: July 2012
DOI: https://doi.org/10.1007/s00521-011-0563-0
