
A Minimum Cluster-Based Trigram Statistical Model for Thai Syllabification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Abstract

Syllabification is the process of extracting syllables from a word. Syllabification problems are mainly caused by unknown and ambiguous words. This research aims to resolve these problems for the Thai language by exploiting relationships among the characters in a word. A character clustering scheme is proposed to generate units smaller than a syllable, called Thai Minimum Clusters (TMCs), from a word. The TMCs are then merged into syllables using a trigram statistical model. Experimental evaluations assess the effectiveness of the proposed technique on a standard data set of 77,303 words. The results show that the technique yields 97.61% accuracy.
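The merging step can be pictured with a small sketch. It is not the authors' implementation: the abstract does not give the TMC clustering rules, the smoothing scheme, or the exact form of the trigram model. The sketch therefore assumes that a word has already been split into clusters, that trigrams are taken over syllables, that probabilities are maximum-likelihood estimates from a pre-syllabified corpus, and that unseen trigrams receive a fixed penalty. The names `train_trigrams`, `syllabify`, `BOS`, `unseen`, and `max_merge` are all hypothetical, and the clusters are ASCII placeholders rather than real Thai TMCs.

```python
import math
from collections import Counter
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple

Trigram = Tuple[str, str, str]
BOS = "<s>"  # hypothetical padding symbol marking the start of a word


def train_trigrams(corpus: Sequence[Sequence[str]]) -> Dict[Trigram, float]:
    """Maximum-likelihood log P(s3 | s1, s2) from words already split into syllables."""
    tri: Counter = Counter()
    ctx: Counter = Counter()
    for syllables in corpus:
        padded = [BOS, BOS] + list(syllables)
        for k in range(2, len(padded)):
            tri[(padded[k - 2], padded[k - 1], padded[k])] += 1
            ctx[(padded[k - 2], padded[k - 1])] += 1
    return {t: math.log(c / ctx[t[:2]]) for t, c in tri.items()}


def syllabify(clusters: Sequence[str], logp: Dict[Trigram, float],
              unseen: float = -20.0, max_merge: int = 4) -> List[str]:
    """Group consecutive clusters into syllables, maximizing total trigram log-probability.

    Assumption: a syllable spans at most `max_merge` clusters, and any trigram
    missing from `logp` scores the fixed penalty `unseen` (no real smoothing).
    """
    clusters = tuple(clusters)
    n = len(clusters)

    @lru_cache(maxsize=None)
    def best(i: int, s1: str, s2: str) -> Tuple[float, Tuple[str, ...]]:
        # Best (score, syllables) for clusters[i:], given the two preceding syllables.
        if i == n:
            return 0.0, ()
        result: Tuple[float, Tuple[str, ...]] = (-math.inf, ())
        for j in range(i + 1, min(n, i + max_merge) + 1):
            syllable = "".join(clusters[i:j])  # merge clusters i..j-1 into one syllable
            head = logp.get((s1, s2, syllable), unseen)
            tail_score, tail = best(j, s2, syllable)
            if head + tail_score > result[0]:
                result = (head + tail_score, (syllable,) + tail)
        return result

    return list(best(0, BOS, BOS)[1])


if __name__ == "__main__":
    model = train_trigrams([["ab", "c"], ["ab", "c"], ["a", "bc"]])
    print(syllabify(["a", "b", "c"], model))
```

Here the toy corpus makes `ab` followed by `c` the more frequent reading, so the three placeholder clusters are merged as `['ab', 'c']`. Memoizing on the position and the two previous syllables keeps the search polynomial in the number of clusters instead of enumerating every grouping.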

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jucksriporn, C., Sornil, O. (2011). A Minimum Cluster-Based Trigram Statistical Model for Thai Syllabification. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_41

  • DOI: https://doi.org/10.1007/978-3-642-19437-5_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19436-8

  • Online ISBN: 978-3-642-19437-5

  • eBook Packages: Computer Science, Computer Science (R0)
