Effectiveness of A Direct Speech Transform Method Using Inductive Learning from Laryngectomee Speech to Normal Speech

Murakami, Koji; Araki, Kenji; Hiroshige, Makoto; Tochinai, Koji

doi:10.1007/978-3-540-24581-0_59

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2903)

Abstract

This paper proposes and evaluates a new method that directly transforms laryngectomee speech waveforms into normal speech. Almost no conventional speech recognition or other speech processing systems handle laryngectomee speech satisfactorily. One major cause is the difficulty of preparing corpora: it is very hard to record a large amount of clear, intelligible utterance data, because acoustic quality depends strongly on each speaker's individual condition.

We focus on the acoustic characteristics of speech waveforms produced by laryngectomees and transform them directly into normal speech. The proposed method handles esophageal and alaryngeal speech with the same algorithm. It works by learning transform rules that capture acoustic correspondences between laryngectomee and normal speech. Results of several fundamental experiments indicate promising performance for actual transformation.

Author information

Authors and Affiliations

Graduate school of Engineering, Hokkaido University, North 13, West 8, Kita-ku, Sapporo, 060-8628, Japan
Koji Murakami, Kenji Araki & Makoto Hiroshige
Graduate school of Business Administration, Hokkai Gakuen University, Asahimachi 4-1-40, Toyohira-ku, Sapporo, 062-8605, Japan
Koji Tochinai

Editor information

Editors and Affiliations

Department of Computer Science, Australian National University, ACT 0200, Acton, Australia
Tamás (Tom) Domonkos Gedeon
Murdoch University,
Lance Chun Che Fung

About this paper

Cite this paper

Murakami, K., Araki, K., Hiroshige, M., Tochinai, K. (2003). Effectiveness of A Direct Speech Transform Method Using Inductive Learning from Laryngectomee Speech to Normal Speech. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_59

DOI: https://doi.org/10.1007/978-3-540-24581-0_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive
