Speech recognition and synthesis technology development at NTT for telecommunications services

Hakoda, Kazuo; Kitai, Mikio; Sagayama, Shigeki

doi:10.1007/BF02208826

Published: December 1997

International Journal of Speech Technology, Volume 2, pages 145–153 (1997)

Abstract

This paper describes recent developments at NTT in speech recognition, speech synthesis, and interactive voice systems as they relate to telecommunications applications. Speaker-independent large-vocabulary speech recognition based on context-dependent phone models and an LR parser, and high-quality text-to-speech (TTS) conversion using the waveform concatenation method, both realized as software, have enabled interactive voice systems for fast and easy prototyping of telephone-based applications. Practical applications are discussed with examples.

Author information

Authors and Affiliations

NTT Human Interface Laboratories, Japan
Kazuo Hakoda, Mikio Kitai & Shigeki Sagayama

About this article

Cite this article

Hakoda, K., Kitai, M. & Sagayama, S. Speech recognition and synthesis technology development at NTT for telecommunications services. Int J Speech Technol 2, 145–153 (1997). https://doi.org/10.1007/BF02208826

Received: 15 August 1996
Accepted: 31 March 1997
Issue Date: December 1997
DOI: https://doi.org/10.1007/BF02208826
