An automatic Thai lexical acquisition from text

  • Text Analysis (Summarization, Morphological Analysis)
  • Conference paper
PRICAI’98: Topics in Artificial Intelligence (PRICAI 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1531)

Abstract

The Thai writing system has no explicit marks to indicate word or sentence boundaries. This has motivated much machine learning research, including work on automatic indexing in information retrieval, where keywords must be identified for searching. A new method for constructing lexicons from corpus text is presented. The method is based on basic Thai morphology and on the concept of Bayesian networks, which here rests on the well-known minimum description length (MDL) principle. The MDL concepts allow us to construct Thai lexicons, which are then used to segment Thai text. The segmentation effectiveness in terms of recall/precision is 59/51, compared with 71/54 for the dictionary-based procedure. However, the new algorithm does not require any lexicon patterns for training.
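
The abstract reports segmentation quality as recall/precision but does not say how the scores were computed. As a rough illustration only, the sketch below assumes word-level scoring: a predicted word counts as correct when its character span matches a word in the reference segmentation exactly. The function names `word_spans` and `recall_precision` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the (start, end) character offsets of its words."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def recall_precision(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Score a predicted segmentation against a reference segmentation.

    Assumption (not from the paper): a predicted word is correct only if
    both of its boundaries match a reference word exactly.
    """
    if "".join(gold) != "".join(predicted):
        raise ValueError("both segmentations must cover the same text")
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    recall = correct / len(gold_spans) if gold_spans else 0.0
    precision = correct / len(pred_spans) if pred_spans else 0.0
    return recall, precision


# Toy example with placeholder strings instead of Thai text.
gold = ["ab", "cd", "ef"]
predicted = ["ab", "c", "d", "ef"]
recall, precision = recall_precision(gold, predicted)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In the toy example, two of the three reference words are recovered and two of the four predicted words are correct, so over-segmentation lowers precision more than recall.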

Editor information

Hing-Yan Lee, Hiroshi Motoda

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jaruskulchai, C. (1998). An automatic Thai lexical acquisition from text. In: Lee, HY., Motoda, H. (eds) PRICAI’98: Topics in Artificial Intelligence. PRICAI 1998. Lecture Notes in Computer Science, vol 1531. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095290

  • DOI: https://doi.org/10.1007/BFb0095290

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65271-7

  • Online ISBN: 978-3-540-49461-4

  • eBook Packages: Springer Book Archive
