
Effectiveness of A Direct Speech Transform Method Using Inductive Learning from Laryngectomee Speech to Normal Speech

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2903)

Abstract

This paper proposes and evaluates a new method that directly transforms laryngectomee speech into normal speech at the waveform level. Most conventional speech recognition and other speech processing systems cannot handle laryngectomee speech with satisfactory results. One major cause is the difficulty of preparing corpora: it is very hard to record a large amount of clear, intelligible utterance data, because acoustic quality depends strongly on each speaker's individual condition.

We focus on the acoustic characteristics of speech waveforms produced by laryngectomees and transform them directly into normal speech. The proposed method handles esophageal and alaryngeal speech with the same algorithm. It works by learning transform rules that capture acoustic correspondences between laryngectomee speech and normal speech. Results of several fundamental experiments indicate that the method is promising for practical use.
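The abstract does not say how a transform rule is represented or matched, so the following is only a minimal sketch under stated assumptions: a rule is taken to be a pair of feature segments (source, target), and an input segment is mapped to the target of the closest stored source. Dynamic time warping is swapped in here as a generic alignment measure; `Segment`, `RuleTable`, `learn`, and `transform` are hypothetical names, not the authors' implementation.

```python
from typing import List, Tuple

Frame = List[float]      # one acoustic feature vector (hypothetical representation)
Segment = List[Frame]    # a short run of consecutive frames


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(s: Segment, t: Segment) -> float:
    """Dynamic time warping cost between two segments of possibly different length."""
    inf = float("inf")
    n, m = len(s), len(t)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(s[i - 1], t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class RuleTable:
    """Assumed rule store: (laryngectomee segment, normal-speech segment) pairs."""

    def __init__(self) -> None:
        self.rules: List[Tuple[Segment, Segment]] = []

    def learn(self, source: Segment, target: Segment) -> None:
        # Record one observed acoustic correspondence as a rule.
        self.rules.append((source, target))

    def transform(self, segment: Segment) -> Segment:
        # Return the target side of the rule whose source is closest under DTW.
        if not self.rules:
            raise ValueError("no rules learned yet")
        best = min(self.rules, key=lambda rule: dtw_distance(segment, rule[0]))
        return best[1]
```

A table built from paired recordings could then be queried one segment at a time, for example `table.transform(segment)`. A nearest-match lookup keeps the sketch short; it says nothing about how the paper generalizes rules through inductive learning.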



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Murakami, K., Araki, K., Hiroshige, M., Tochinai, K. (2003). Effectiveness of A Direct Speech Transform Method Using Inductive Learning from Laryngectomee Speech to Normal Speech. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science (LNAI), vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_59

  • DOI: https://doi.org/10.1007/978-3-540-24581-0_59

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20646-0

  • Online ISBN: 978-3-540-24581-0

  • eBook Packages: Springer Book Archive
