Vowel Imitation Using Vocal Tract Model and Recurrent Neural Network

  • Conference paper

Neural Information Processing (ICONIP 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4985)

Abstract

A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate the speech sounds produced by adults, whose vocal tracts have physical properties, and hence articulatory motions, that differ from those of infants. To solve this problem, a model based on the motor theory of speech perception was constructed. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds, i.e., sounds that the system has never actually generated. The system was implemented using a Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model, the Maeda model. Experimental results demonstrated that the system was sufficiently robust to individual differences in speech sounds and could imitate unexperienced vowel sounds.
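
As a rough illustration of the recurrent-network half of the architecture described above, the sketch below shows the forward pass of a minimal RNNPB-style network: a Jordan-type recurrent network whose hidden layer receives the current acoustic frame, a context (recurrent) vector, and a small parametric-bias (PB) vector. All names, layer sizes, and the use of NumPy are illustrative assumptions, not the authors' implementation; in the actual system the weights would be trained offline (e.g., with backpropagation through time), and the PB values for a newly heard, unexperienced vowel would presumably be found by minimising this prediction error with the weights held fixed, then linked to articulatory parameters for the Maeda model.

```python
# Minimal sketch of an RNNPB-style forward pass (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_PB = 12, 30, 2   # e.g. 12 spectral coefficients per frame, 2 PB nodes

# Randomly initialised weights stand in for weights that would be learned
# from babbling/imitation sequences in the actual system.
W_in  = rng.normal(0.0, 0.1, (N_HID, N_IN))
W_ctx = rng.normal(0.0, 0.1, (N_HID, N_HID))
W_pb  = rng.normal(0.0, 0.1, (N_HID, N_PB))
W_out = rng.normal(0.0, 0.1, (N_IN, N_HID))

def rnnpb_forward(sequence, pb):
    """Predict frame x[t+1] from x[t] over a whole sequence, given a fixed PB vector."""
    context = np.zeros(N_HID)
    predictions = []
    for x_t in sequence:
        hidden = np.tanh(W_in @ x_t + W_ctx @ context + W_pb @ pb)
        predictions.append(W_out @ hidden)   # predicted next frame
        context = hidden                     # recurrent (context) feedback
    return np.array(predictions)

# Dummy 20-frame "vowel" sequence and one candidate PB vector.
seq = rng.normal(size=(20, N_IN))
pb  = np.array([0.3, -0.7])
pred = rnnpb_forward(seq[:-1], pb)
print("mean squared prediction error:", np.mean((pred - seq[1:]) ** 2))
```

The PB layer is deliberately low-dimensional: in RNNPB work such as Tani and Ito (2003), sequences with similar dynamics tend to self-organise onto nearby PB values after training, which is the kind of property that would make such an encoding robust to individual differences between speakers.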


References

  1. Liberman, A.M., Cooper, F.S., et al.: A motor theory of speech perception. In: Proc. Speech Communication Seminar, Paper-D3, Stockholm (1962)

  2. Tani, J., Ito, M.: Self-organization of behavioral primitives as multiple attractor dynamics: A robot experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part A 33(4), 481–488 (2003)

  3. Minematsu, N., Nishimura, T., Nishinari, K., Sakuraba, K.: Theorem of the invariant structure and its derivation of speech gestalt. In: Proc. Int. Workshop on Speech Recognition and Intrinsic Variations, pp. 47–52 (2006)

  4. Fadiga, L., Craighero, L., Buccino, G., Rizzolatti, G.: Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience 15, 399–402 (2002)

  5. Hickok, G., Buchsbaum, B., Humphries, C., Muftuler, T.: Auditory-motor interaction revealed by fMRI: Speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience 15(5), 673–682 (2003)

  6. Yokoya, R., Ogata, T., Tani, J., Komatani, K., Okuno, H.G.: Experience-based imitation using RNNPB. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2006) (2006)

  7. Maeda, S.: Compensatory articulation during speech: Evidence from the analysis and synthesis of vocal tract shapes using an articulatory model. In: Speech Production and Speech Modelling, pp. 131–149. Kluwer Academic Publishers, Dordrecht (1990)

  8. Kitawaki, N., Itakura, F., Saito, S.: Optimum coding of transmission parameters in the PARCOR speech analysis-synthesis system. Transactions of the Institute of Electronics and Communication Engineers of Japan (IEICE) J61-A(2), 119–126 (1978)

  9. Kawahara, H.: Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited. In: Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 1303–1306 (1997)

  10. Jordan, M.I.: Attractor dynamics and parallelism in a connectionist sequential machine. In: Proc. Eighth Annual Conference of the Cognitive Science Society, pp. 513–546. Erlbaum, Hillsdale, NJ (1986)

  11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. MIT Press, Cambridge (1986)

  12. Atal, B.S.: Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. Journal of the Acoustical Society of America 55, 1304–1312 (1974)

Author information

Authors: Kanda, H., Ogata, T., Komatani, K., Okuno, H.G.

Editor information

Editors: Masumi Ishikawa, Kenji Doya, Hiroyuki Miyamoto, Takeshi Yamakawa

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kanda, H., Ogata, T., Komatani, K., Okuno, H.G. (2008). Vowel Imitation Using Vocal Tract Model and Recurrent Neural Network. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_24

  • DOI: https://doi.org/10.1007/978-3-540-69162-4_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69159-4

  • Online ISBN: 978-3-540-69162-4

  • eBook Packages: Computer Science
