Thai Syllable-Based Information Extraction Using Hidden Markov Models

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Abstract

Information Extraction (IE) automatically analyzes text and retrieves significant segments, or fields, for insertion into tables or databases. In this paper, we apply a statistical model to IE and propose Thai syllable-based information extraction using Hidden Markov Models (HMMs). Our system takes a non-dictionary-based approach that relies on a rule-based system for syllable segmentation. We employ the Viterbi algorithm, a dynamic-programming method for finding the most likely HMM state sequence, for learning and testing on our corpus, and extract the required fields from the corpus text.
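To illustrate the general idea, the sketch below labels each syllable with a hidden "field" state and uses Viterbi decoding to recover the most likely labels, then groups consecutive labeled syllables into extracted fields. It is a minimal sketch under stated assumptions, not the authors' implementation: the field names (`name`, `date`, `other`), the romanized toy corpus, the supervised count-based training with add-one smoothing, and the helper names `train_hmm`, `viterbi`, and `extract_fields` are all hypothetical. It also assumes the input has already been split into syllables, because the paper's rule-based segmenter is not described here.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: pre-segmented, romanized syllables, each tagged
# with an illustrative field state. Not data from the paper.
TRAIN = [
    [("khun", "other"), ("som", "name"), ("chai", "name"), ("ma", "other"),
     ("wan", "date"), ("jan", "date")],
    [("khun", "other"), ("ma", "name"), ("li", "name"), ("ma", "other"),
     ("wan", "date"), ("ang", "date"), ("khan", "date")],
]


def train_hmm(tagged_sentences, k=1.0):
    """Estimate log start/transition/emission probabilities by counting,
    with add-k smoothing (an assumed training scheme)."""
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for sent in tagged_sentences:
        prev = None
        for syl, state in sent:
            vocab.add(syl)
            emit[state][syl] += 1
            if prev is None:
                start[state] += 1
            else:
                trans[prev][state] += 1
            prev = state
    states = sorted(emit)
    n = len(states)
    v = len(vocab) + 1  # one extra slot reserves mass for unseen syllables
    log_start = {s: math.log((start[s] + k) / (len(tagged_sentences) + k * n))
                 for s in states}
    log_trans = {}
    for p in states:
        total = sum(trans[p].values())
        log_trans[p] = {s: math.log((trans[p][s] + k) / (total + k * n))
                        for s in states}
    emit_total = {s: sum(emit[s].values()) for s in states}

    def log_emit(state, syl):
        return math.log((emit[state].get(syl, 0.0) + k) / (emit_total[state] + k * v))

    return states, log_start, log_trans, log_emit


def viterbi(syllables, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence (computed in log space)."""
    if not syllables:
        return []
    scores = [{s: log_start[s] + log_emit(s, syllables[0]) for s in states}]
    backptr = [{}]
    for t in range(1, len(syllables)):
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: scores[t - 1][p] + log_trans[p][s])
            col[s] = scores[t - 1][best] + log_trans[best][s] + log_emit(s, syllables[t])
            ptr[s] = best
        scores.append(col)
        backptr.append(ptr)
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(syllables) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return path[::-1]


def extract_fields(syllables, path, background="other"):
    """Join runs of consecutive syllables that share a non-background state."""
    fields, prev = [], background
    for syl, state in zip(syllables, path):
        if state != background:
            if state == prev:
                fields[-1][1].append(syl)
            else:
                fields.append((state, [syl]))
        prev = state
    return [(state, "".join(syls)) for state, syls in fields]


if __name__ == "__main__":
    model = train_hmm(TRAIN)
    sentence = ["khun", "som", "chai", "ma", "wan", "jan"]
    labels = viterbi(sentence, *model)
    print(labels)
    print(extract_fields(sentence, labels))
```

Decoding in log space avoids numerical underflow on long syllable sequences, and syllables are joined without spaces because Thai is written without spaces between words.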

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Narupiyakul, L., Thomas, C., Cercone, N., Sirinaovakul, B. (2004). Thai Syllable-Based Information Extraction Using Hidden Markov Models. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol. 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_67

  • DOI: https://doi.org/10.1007/978-3-540-24630-5_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21006-1

  • Online ISBN: 978-3-540-24630-5

  • eBook Packages: Springer Book Archive
