
Automatic corpus-based tone and break-index prediction using K-ToBI representation

Authors:
Jin-Seok Lee, KOSCOM, Seoul, South Korea
Byeongchang Kim, Uiduk University, Kyongju, South Korea
Gary Geunbae Lee, Pohang University of Science & Technology, Pohang, South Korea

ACM Transactions on Asian Language Information Processing, Volume 1, Issue 3, pp. 207–224. https://doi.org/10.1145/772755.772757

Published: 01 September 2002

Abstract

In this article we present a prosody generation architecture based on the K-ToBI (Korean Tone and Break Index) representation. ToBI is a multitier, linguistically motivated system for transcribing prosodic events in an utterance. TTS (Text-To-Speech) systems that adopt ToBI as an intermediate representation are known to exhibit greater flexibility, modularity, and domain/task portability than TTS systems that generate prosody directly. For practical-level performance, however, corpus preparation is very expensive: the ToBI-labeled corpus must be constructed manually by many prosody experts, and statistical prosody modeling normally requires large amounts of such data. Unlike previous ToBI-based systems, the method proposed in this article transcribes K-ToBI labels in Korean speech fully automatically. We develop automatic corpus-based K-ToBI labeling tools, together with prediction methods that use several lexico-syntactic linguistic features for decision-tree induction. We demonstrate the performance of F0 generation from automatically predicted K-ToBI labels and confirm that it is reasonably comparable to state-of-the-art direct prosody generation methods and to previous ToBI-based methods.
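The abstract states that K-ToBI labels are predicted by decision-tree induction over lexico-syntactic features, but does not spell out the algorithm here. As a rough illustration only, the Python sketch below induces an ID3-style tree (information-gain splits on categorical features) that maps per-word (eojeol) features to a break-index label. The feature names `final_pos` and `punct`, the toy training rows, and the functions `build_tree` and `predict` are hypothetical, invented for this example; they are not the authors' feature set, data, or implementation, which may use a different induction algorithm, stopping rule, or pruning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(examples, feature):
    """Entropy reduction from splitting (features, label) pairs on one feature."""
    labels = [y for _, y in examples]
    groups = {}
    for x, y in examples:
        groups.setdefault(x[feature], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(examples, features):
    """ID3-style induction.

    Returns either a leaf label or a tuple (feature, branches, majority_label).
    """
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(examples, f))
    if information_gain(examples, best) <= 0:
        return majority  # no split reduces entropy
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches, majority)


def predict(tree, x):
    """Follow branches; an unseen value falls back to that node's majority label."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if x.get(feature) not in branches:
            return majority
        tree = branches[x[feature]]
    return tree


# Hypothetical toy data: one row per word (eojeol), label = K-ToBI break index.
# final_pos: coarse POS of the word-final morpheme; punct: punctuation follows.
train = [
    ({"final_pos": "eomi", "punct": "yes"}, "3"),
    ({"final_pos": "josa", "punct": "yes"}, "3"),
    ({"final_pos": "noun", "punct": "yes"}, "3"),
    ({"final_pos": "eomi", "punct": "no"}, "2"),
    ({"final_pos": "josa", "punct": "no"}, "2"),
    ({"final_pos": "noun", "punct": "no"}, "1"),
]

tree = build_tree(train, ["final_pos", "punct"])
print(predict(tree, {"final_pos": "josa", "punct": "no"}))  # 2
```

On this toy data the root splits on `punct`, because it separates the break-index-3 rows perfectly, and the `no` branch then splits on `final_pos`. Falling back to a node's majority label for unseen feature values is only one simple choice; a practical predictor would also need pruning and evaluation on held-out speech.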

Published in
ACM Transactions on Asian Language Information Processing, Volume 1, Issue 3 (September 2002), 106 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/772755

Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
K-ToBI
intonation
phrase break
pitch
prosodic phrase
prosody
text-to-speech system

Article Metrics
- Total Citations: 6
- Total Downloads: 653
- Downloads (Last 12 months): 4
- Downloads (Last 6 weeks): 0
