Abstract
Grapheme-phoneme aligned data is crucial for grapheme-to-phoneme conversion systems. Although manual alignment is possible, it is tedious and time-consuming, so unsupervised alignment algorithms have been proposed to reduce the alignment cost. Several efficient algorithms rely on the assumption that patterns are continuous, but this assumption does not hold for Thai. Applying these algorithms to Thai grapheme-to-phoneme alignment therefore requires pre-processing steps to handle discontinuous patterns. We propose an algorithm for aligning Thai graphemes and phonemes that incorporates discontinuous patterns directly. Experiments show that the proposed algorithm achieves substantially higher precision than conventional alignment with only continuous patterns, but lower recall. As a result, the proposed algorithm achieves an F1 score similar to that of the conventional algorithm.
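The trade-off between precision and recall can be made concrete with a small sketch. The Python code below is illustrative only and is not taken from the paper: it assumes an alignment is represented as a set of (grapheme chunk, phoneme chunk) links, and the names `precision_recall_f1`, `gold`, `system_a`, and `system_b` as well as the toy links are hypothetical. It shows how one system can gain precision and lose recall while its F1 stays in the same range.

```python
from typing import Set, Tuple

# Hypothetical representation: one alignment link pairs a grapheme chunk
# with a phoneme chunk. The paper's actual data format may differ.
Link = Tuple[str, str]


def precision_recall_f1(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Score predicted alignment links against gold-standard links."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data with placeholder symbols (not real Thai alignments or results).
gold = {("g1", "p1"), ("g2", "p2"), ("g3", "p3"), ("g4", "p4")}

# System A proposes many links: 3 correct, 3 incorrect.
system_a = {("g1", "p1"), ("g2", "p2"), ("g3", "p3"),
            ("g1", "p9"), ("g2", "p8"), ("g4", "p7")}

# System B proposes few links: 2 correct, none incorrect.
system_b = {("g1", "p1"), ("g2", "p2")}

scores_a = precision_recall_f1(system_a, gold)
scores_b = precision_recall_f1(system_b, gold)
print("A: P=%.2f R=%.2f F1=%.2f" % scores_a)
print("B: P=%.2f R=%.2f F1=%.2f" % scores_b)
```

Because F1 is the harmonic mean of precision and recall, a gain in one can be offset by a loss in the other, which is consistent with the outcome summarised in the abstract.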
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wanvarie, D. (2014). Thai Grapheme-Phoneme Alignment: Many-to-Many Alignment with Discontinuous Patterns. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8397. Springer, Cham. https://doi.org/10.1007/978-3-319-05476-6_7
DOI: https://doi.org/10.1007/978-3-319-05476-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05475-9
Online ISBN: 978-3-319-05476-6
eBook Packages: Computer Science (R0)