Morpheme-based grapheme to phoneme conversion using phonetic patterns and morphophonemic connectivity information

Published: 01 March 2002
Abstract

Dictionary-based and rule-based methods for grapheme-to-phoneme conversion each have their own advantages and limitations. For example, the dictionary-based method requires a large phonetic dictionary and complex morphophonemic rules, while the LTS (letter-to-sound) rule-based method cannot by itself model the complete morphophonemic constraints.

This paper describes a grapheme-to-phoneme conversion method for Korean that combines the two approaches, using a phonetic pattern dictionary and CCV (consonant consonant vowel) LTS rules. The phonetic pattern dictionary, which represents the dictionary-based method, contains entries that pair a morpheme pattern with its phonetic pattern. The patterns represent candidate phonological changes at the left and right boundaries of morphemes. The CCV LTS rules, which represent the rule-based method, handle grapheme-to-phoneme conversion within morphemes.

The conversion method consists of two main steps, morpheme-to-phoneme conversion and a morphophonemic connectivity check, preceded by two preprocessing steps, phrase break prediction and morpheme normalization. Phrase break prediction estimates phrase breaks with a stochastic method based on part-of-speech (POS) information. Morpheme normalization replaces non-Korean symbols with their corresponding standard Korean graphemes. In the morpheme-phoneticizing module, each morpheme in the phrase is converted into phonetic patterns by looking it up in the phonetic pattern dictionary. Graphemes within a morpheme are grouped into CCV units and converted into phonemes by the CCV LTS rules. The morphophonemic connectivity table is then used to check whether two adjacent phonetic morphemes can grammatically co-occur.

In experiments on a corpus of 4,973 sentences free of non-Korean symbols, the method correctly converted 99.98% of the graphemes and 99.0% of the sentences. On a broadcast news corpus of 621 sentences, 99.7% of the graphemes and 86.6% of the sentences were correctly converted. A full Korean TTS (text-to-speech) system is now being implemented using this conversion method.
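
The two main steps described above (dictionary lookup of candidate phonetic patterns, followed by a connectivity check between adjacent morphemes) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `PATTERN_DICT`, `CONNECTIVITY`, and `convert`, the romanized toy entries, and the boundary labels are hypothetical and do not reproduce the paper's actual dictionary, table, or search procedure. The two toy cases reflect standard Korean pronunciation: the stem 먹 (meok) followed by 는 (neun) is nasalized to [멍는], and followed by 고 (go) triggers tensification, giving [먹꼬].

```python
from itertools import product

# Hypothetical phonetic pattern dictionary (toy, romanized).
# morpheme -> list of candidates: (phonetic form, left boundary, right boundary)
PATTERN_DICT = {
    "meok": [("meok", "m", "k"), ("meong", "m", "ng")],
    "neun": [("neun", "n", "n")],
    "go": [("go", "g", "o"), ("kko", "kk", "o")],
}

# Hypothetical connectivity table: pairs of
# (right boundary of the left morpheme, left boundary of the right morpheme)
# that are allowed to be adjacent.
CONNECTIVITY = {("ng", "n"), ("k", "kk")}


def convert(morphemes):
    """Return the first candidate sequence whose adjacent pairs all pass
    the connectivity check, or None if no combination passes."""
    candidates = [PATTERN_DICT[m] for m in morphemes]
    for combo in product(*candidates):
        pairs = zip(combo, combo[1:])
        if all((left[2], right[1]) in CONNECTIVITY for left, right in pairs):
            return [form for form, _, _ in combo]
    return None


print(convert(["meok", "neun"]))  # nasalization case
print(convert(["meok", "go"]))    # tensification case
```

The exhaustive search over candidate combinations is chosen only for brevity; a real system would likely prune candidates left to right, and conversion inside each morpheme would be delegated to the CCV LTS rules rather than stored whole in the dictionary.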
