Abstract
Information Extraction (IE) analyzes information and automatically retrieves significant segments or fields for insertion into tables or databases. In this paper, we employ a statistical model for an IE system: we propose Thai syllable-based information extraction using Hidden Markov Models (HMMs). Our system is a non-dictionary-based method that relies on a rule-based system for syllable segmentation. We employ the Viterbi algorithm in training and testing on our corpus, and use it to extract the required fields from the information in the corpus.
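As a rough illustration of the decoding step named above, the sketch below runs Viterbi decoding over an already segmented syllable sequence and keeps the syllables labeled as belonging to a field. It is a minimal sketch under stated assumptions, not the system described in the paper: the two states (`OTHER`, `FIELD`), the placeholder syllables (`syl_a`, `syl_b`, `syl_x`, `syl_y`), and every probability are hypothetical values chosen for the example, and the rule-based syllable segmenter is assumed to have run beforehand.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable state sequence for a list of observations.

    Probabilities are combined in log space; unseen emissions are floored
    at `floor` (a hypothetical smoothing choice) to avoid log(0).
    """
    if not observations:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               state that preceded s on that path)
    first = observations[0]
    best = [{s: (lp(start_p[s]) + lp(emit_p[s].get(first, 0.0)), None)
             for s in states}]

    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + lp(emit_p[s].get(obs, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: all names and numbers are illustrative assumptions.
states = ["OTHER", "FIELD"]
start_p = {"OTHER": 0.8, "FIELD": 0.2}
trans_p = {
    "OTHER": {"OTHER": 0.7, "FIELD": 0.3},
    "FIELD": {"OTHER": 0.4, "FIELD": 0.6},
}
emit_p = {
    "OTHER": {"syl_a": 0.5, "syl_b": 0.4, "syl_x": 0.05, "syl_y": 0.05},
    "FIELD": {"syl_a": 0.05, "syl_b": 0.05, "syl_x": 0.5, "syl_y": 0.4},
}

# Output of an assumed upstream syllable segmenter.
syllables = ["syl_a", "syl_x", "syl_y", "syl_b"]

labels = viterbi(syllables, states, start_p, trans_p, emit_p)
extracted = [syl for syl, label in zip(syllables, labels) if label == "FIELD"]
print(labels)     # ['OTHER', 'FIELD', 'FIELD', 'OTHER']
print(extracted)  # ['syl_x', 'syl_y']
```

In a trained system, the start, transition, and emission tables would be estimated from a labeled corpus rather than written by hand, and there would typically be one state (or several) per target field.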
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Narupiyakul, L., Thomas, C., Cercone, N., Sirinaovakul, B. (2004). Thai Syllable-Based Information Extraction Using Hidden Markov Models. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_67
DOI: https://doi.org/10.1007/978-3-540-24630-5_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive