Vowel Imitation Using Vocal Tract Model and Recurrent Neural Network

  • Conference paper

Neural Information Processing (ICONIP 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4985)

Abstract

A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate the speech sounds produced by adults, whose vocal tracts have physical properties, and hence articulatory motions, that differ from those of infants. To solve this problem, a model based on the motor theory of speech perception was constructed. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds, i.e., sounds that the system has never actually generated. The system was implemented using a Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model, the Maeda model. Experimental results demonstrated that the system was sufficiently robust to individual differences in speech sounds and could imitate unexperienced vowel sounds.
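
As a rough illustration of the recurrent-network half of the architecture described above, the sketch below shows the forward pass of a minimal RNNPB-style network: a Jordan-type recurrent network whose hidden layer receives the current acoustic frame, a context (recurrent) vector, and a small parametric-bias (PB) vector. All names, layer sizes, and the use of NumPy are illustrative assumptions, not the authors' implementation; in the actual system the weights would be trained offline (e.g., with backpropagation through time), and the PB values for a newly heard, unexperienced vowel would presumably be found by minimising this prediction error with the weights held fixed, then linked to articulatory parameters for the Maeda model.

```python
# Minimal sketch of an RNNPB-style forward pass (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_PB = 12, 30, 2   # e.g. 12 spectral coefficients per frame, 2 PB nodes

# Randomly initialised weights stand in for weights that would be learned
# from babbling/imitation sequences in the actual system.
W_in  = rng.normal(0.0, 0.1, (N_HID, N_IN))
W_ctx = rng.normal(0.0, 0.1, (N_HID, N_HID))
W_pb  = rng.normal(0.0, 0.1, (N_HID, N_PB))
W_out = rng.normal(0.0, 0.1, (N_IN, N_HID))

def rnnpb_forward(sequence, pb):
    """Predict frame x[t+1] from x[t] over a whole sequence, given a fixed PB vector."""
    context = np.zeros(N_HID)
    predictions = []
    for x_t in sequence:
        hidden = np.tanh(W_in @ x_t + W_ctx @ context + W_pb @ pb)
        predictions.append(W_out @ hidden)   # predicted next frame
        context = hidden                     # recurrent (context) feedback
    return np.array(predictions)

# Dummy 20-frame "vowel" sequence and one candidate PB vector.
seq = rng.normal(size=(20, N_IN))
pb  = np.array([0.3, -0.7])
pred = rnnpb_forward(seq[:-1], pb)
print("mean squared prediction error:", np.mean((pred - seq[1:]) ** 2))
```

The PB layer is deliberately low-dimensional: in RNNPB work such as Tani and Ito (2003), sequences with similar dynamics tend to self-organise onto nearby PB values after training, which is the kind of property that would make such an encoding robust to individual differences between speakers.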


References

  1. Liberman, A.M., Cooper, F.S., et al.: A motor theory of speech perception. In: Proc. Speech Communication Seminar, Paper-D3, Stockholm (1962)

  2. Tani, J., Ito, M.: Self-organization of behavioral primitives as multiple attractor dynamics: A robot experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part A 33(4), 481–488 (2003)

  3. Minematsu, N., Nishimura, T., Nishinari, K., Sakuraba, K.: Theorem of the invariant structure and its derivation of speech gestalt. In: Proc. Int. Workshop on Speech Recognition and Intrinsic Variations, pp. 47–52 (2006)

  4. Fadiga, L., Craighero, L., Buccino, G., Rizzolatti, G.: Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience 15, 399–402 (2002)

  5. Hickok, G., Buchsbaum, B., Humphries, C., Muftuler, T.: Auditory-motor interaction revealed by fMRI: Speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience 15(5), 673–682 (2003)

  6. Yokoya, R., Ogata, T., Tani, J., Komatani, K., Okuno, H.G.: Experience-based imitation using RNNPB. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2006) (2006)

  7. Maeda, S.: Compensatory articulation during speech: Evidence from the analysis and synthesis of vocal tract shapes using an articulatory model. In: Speech Production and Speech Modelling, pp. 131–149. Kluwer Academic Publishers, Dordrecht (1990)

  8. Kitawaki, N., Itakura, F., Saito, S.: Optimum coding of transmission parameters in the PARCOR speech analysis-synthesis system. Transactions of the Institute of Electronics and Communication Engineers of Japan (IEICE) J61-A(2), 119–126 (1978)

  9. Kawahara, H.: Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited. In: Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 1303–1306 (1997)

  10. Jordan, M.I.: Attractor dynamics and parallelism in a connectionist sequential machine. In: Proc. Eighth Annual Conference of the Cognitive Science Society, pp. 513–546. Erlbaum, Hillsdale, NJ (1986)

  11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. MIT Press, Cambridge (1986)

  12. Atal, B.S.: Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. Journal of the Acoustical Society of America 55, 1304–1312 (1974)

Author information

Authors: Kanda, H., Ogata, T., Komatani, K., Okuno, H.G.

Editor information

Editors: Masumi Ishikawa, Kenji Doya, Hiroyuki Miyamoto, Takeshi Yamakawa

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kanda, H., Ogata, T., Komatani, K., Okuno, H.G. (2008). Vowel Imitation Using Vocal Tract Model and Recurrent Neural Network. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_24

  • DOI: https://doi.org/10.1007/978-3-540-69162-4_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69159-4

  • Online ISBN: 978-3-540-69162-4

  • eBook Packages: Computer Science
