
Romanization of Thai Proper Names Based on Popularity of Usages

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Abstract

The lack of a standard for the Romanization of Thai proper names makes searching a challenging task. This matters particularly when searching for people-related documents by the orthographic representation of their names, using solely the Thai or the English alphabet. Romanization based directly on a name's pronunciation often fails to deliver the exact English spelling, because the mapping from Thai to English spelling is not one-to-one and because spellings reflect personal preferences. This paper proposes a Romanization approach that takes popularity of usage into consideration. Thai names are parsed into sequences of grams, units that are syllable-sized or larger and are governed by pronunciation and spelling constraints in both the Thai and English writing systems. A Gram lexicon is constructed from a corpus of more than 130,000 names, and statistical models are trained on this lexicon. The proposed method significantly outperformed the current Romanization approach: coverage of the correct English spellings rises from approximately 46% to 75% as the number of proposed hypotheses increases from 1 to 15.
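
The pipeline outlined in the abstract (parse a name into grams, look up each gram's English spellings, rank whole-name hypotheses, and measure how often the correct spelling appears among the top n) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon and its counts are invented, the function names are hypothetical, and the scoring is a simple product of per-gram relative frequencies rather than the statistical models the paper actually trains.

```python
# Hypothetical toy Gram lexicon: Thai gram -> {English spelling: usage count}.
# The grams and counts are invented for illustration; they are not data
# from the paper's corpus of names.
GRAM_LEXICON = {
    "สม": {"som": 95, "sum": 5},
    "ชาย": {"chai": 80, "chay": 15, "shai": 5},
    "พร": {"porn": 60, "phon": 30, "pon": 10},
}


def segmentations(name, lexicon):
    """Yield every way to split `name` into grams present in `lexicon`."""
    if not name:
        yield []
        return
    for end in range(1, len(name) + 1):
        head = name[:end]
        if head in lexicon:
            for rest in segmentations(name[end:], lexicon):
                yield [head] + rest


def romanize(name, lexicon, n=5):
    """Return up to `n` (spelling, score) pairs, best first.

    Assumed scoring: each gram spelling is weighted by its relative
    frequency in the lexicon, and a hypothesis scores the product of
    its gram weights (a unigram popularity model).
    """
    scores = {}
    for grams in segmentations(name, lexicon):
        hypotheses = [("", 1.0)]
        for gram in grams:
            total = sum(lexicon[gram].values())
            hypotheses = [
                (prefix + spelling, score * count / total)
                for prefix, score in hypotheses
                for spelling, count in lexicon[gram].items()
            ]
        for spelling, score in hypotheses:
            # Keep the best score if two segmentations give the same spelling.
            scores[spelling] = max(scores.get(spelling, 0.0), score)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


def coverage_at_n(test_pairs, lexicon, n):
    """Fraction of (thai, english) pairs whose English spelling is in the top n."""
    hits = 0
    for thai, english in test_pairs:
        top = [spelling for spelling, _ in romanize(thai, lexicon, n)]
        if english in top:
            hits += 1
    return hits / len(test_pairs)


if __name__ == "__main__":
    pairs = [("สมชาย", "somchai"), ("สมพร", "somphon")]
    print(romanize("สมชาย", GRAM_LEXICON, n=3))
    for n in (1, 2):
        print(n, coverage_at_n(pairs, GRAM_LEXICON, n))
```

With this toy lexicon the less popular spelling "somphon" is missed when only one hypothesis is proposed and recovered when two are, which mirrors, on a very small scale, why coverage grows with the number of hypotheses.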

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tangverapong, A., Suchato, A., Punyabukkana, P. (2009). Romanization of Thai Proper Names Based on Popularity of Usages. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_56

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)
