Thai Syllable-Based Information Extraction Using Hidden Markov Models

Narupiyakul, Lalita; Thomas, Calvin; Cercone, Nick; Sirinaovakul, Booncharoen

doi:10.1007/978-3-540-24630-5_67

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Information Extraction (IE) automatically analyzes text and retrieves significant segments or fields for insertion into tables or databases. In this paper, we employ a statistical model for an IE system: our proposed method is Thai syllable-based information extraction using Hidden Markov Models (HMMs). Our system uses a non-dictionary-based approach that relies on a rule-based system for syllable segmentation. We employ the Viterbi algorithm, a dynamic-programming method for finding the most probable sequence of HMM states, in training and testing on our corpus, and use it to extract the required fields from the corpus documents.
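The abstract names the components (syllable input, an HMM, Viterbi decoding) but not the model topology or how parameters are estimated. As a rough illustration only, the sketch below assumes one hidden state per field label (a hypothetical background label `BG` plus a hypothetical `NAME` field), input that has already been split into syllables, parameters estimated by supervised counting with add-one smoothing, and a hypothetical `<unk>` emission for syllables unseen in training. Contiguous runs of a non-background label are returned as extracted fields. The function names, labels, and toy data are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

UNK = "<unk>"  # hypothetical token that absorbs syllables unseen in training


def train_hmm(tagged_docs, alpha=1.0):
    """Supervised HMM estimation by counting, with add-alpha smoothing.

    tagged_docs: list of documents; each document is a list of
    (syllable, field_label) pairs, i.e. input that is already segmented.
    Returns (states, log_start, log_trans, log_emit) as log-probabilities.
    """
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    states, vocab = set(), {UNK}
    for doc in tagged_docs:
        prev = None
        for syl, label in doc:
            states.add(label)
            vocab.add(syl)
            emit[label][syl] += 1
            if prev is None:
                start[label] += 1  # label of the first syllable
            else:
                trans[prev][label] += 1  # label-to-label transition
            prev = label
    states = sorted(states)

    def smoothed_log(counts, keys):
        # Add alpha to every key so unseen events keep a non-zero probability.
        total = sum(counts[k] for k in keys) + alpha * len(keys)
        return {k: math.log((counts[k] + alpha) / total) for k in keys}

    log_start = smoothed_log(start, states)
    log_trans = {s: smoothed_log(trans[s], states) for s in states}
    log_emit = {s: smoothed_log(emit[s], vocab) for s in states}
    return states, log_start, log_trans, log_emit


def viterbi(syllables, states, log_start, log_trans, log_emit):
    """Return the most probable label sequence for a syllable sequence."""
    if not syllables:
        return []

    def emission(s, syl):
        table = log_emit[s]
        return table.get(syl, table[UNK])

    # score[s]: best log-probability of any label path ending in state s.
    score = {s: log_start[s] + emission(s, syllables[0]) for s in states}
    backpointers = []
    for syl in syllables[1:]:
        new_score, pointer = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + emission(s, syl)
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


def extract_fields(syllables, labels, background="BG"):
    """Join contiguous runs of non-background labels into field strings."""
    fields = defaultdict(list)
    run_label, run = None, []
    for syl, label in zip(syllables, labels):
        if label == run_label:
            run.append(syl)
            continue
        if run_label not in (None, background):
            fields[run_label].append("".join(run))
        run_label, run = label, [syl]
    if run_label not in (None, background):
        fields[run_label].append("".join(run))
    return dict(fields)


if __name__ == "__main__":
    # Hypothetical toy data, hand-segmented and hand-labelled:
    # "ผู้พูด" (speaker) followed by a person's name.
    train = [
        [("ผู้", "BG"), ("พูด", "BG"), ("สม", "NAME"), ("ชาย", "NAME")],
        [("ผู้", "BG"), ("พูด", "BG"), ("สม", "NAME"), ("ศักดิ์", "NAME")],
    ]
    model = train_hmm(train)
    test = ["ผู้", "พูด", "สม", "ใจ"]  # "ใจ" never appears in training
    labels = viterbi(test, *model)
    print(labels)                        # ['BG', 'BG', 'NAME', 'NAME']
    print(extract_fields(test, labels))  # {'NAME': ['สมใจ']}
```

Working in log-probabilities avoids numerical underflow on long syllable sequences. The `<unk>` fallback matters in this kind of setup because a dictionary-free system will inevitably meet syllables that never occurred in the training corpus.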

Author information

Authors and Affiliations

Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, NS, Canada, B3H 1W5
Lalita Narupiyakul, Calvin Thomas & Nick Cercone
King Mongkut’s University of Technology Thonburi, 91 Pracha Uthit, Thungkru, Bangkok, Thailand, 10140
Booncharoen Sirinaovakul

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Narupiyakul, L., Thomas, C., Cercone, N., Sirinaovakul, B. (2004). Thai Syllable-Based Information Extraction Using Hidden Markov Models. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_67

DOI: https://doi.org/10.1007/978-3-540-24630-5_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive
