Abstract
Due to the limitations of language-processing tools for the Thai language, pattern-based information extraction from Thai documents requires supplementary techniques. Based on sliding-window rule application and extraction filtering, we present a framework for extracting semantic information from medical-symptom phrases with unknown boundaries in Thai free-text information entries. A supervised rule learning algorithm is employed for automatic construction of information extraction rules from hand-tagged training symptom phrases. Two filtering components are introduced: one uses a classification model to predict whether a rule has been applied across a symptom-phrase boundary, and the other uses extraction distances observed during rule learning to resolve conflicts arising from overlapping-frame extractions. In our experimental study, we focus on two basic types of symptom phrasal descriptions: one concerns abnormal characteristics of some observable entities, and the other concerns human-body locations at which symptoms appear. The experimental results show that the filtering components improve precision while preserving recall satisfactorily.
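The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of sliding-window rule application followed by distance-based resolution of overlapping extractions. Everything in it is an assumption made for illustration: the names `Extraction`, `pair_rule`, `apply_sliding_window` and `resolve_overlaps`, the single hand-written symptom/location rule (the paper learns its rules from tagged data), the definition of "distance" as the token gap between two slot fillers, and the English placeholder tokens standing in for segmented Thai words. The classifier-based boundary filter is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    start: int      # index of the first token covered
    end: int        # index one past the last token covered
    frame: tuple    # (slot, value) pairs
    distance: int   # assumed definition: token gap between the slot fillers


def pair_rule(chunk, symptoms, locations):
    """Hypothetical rule: pair each symptom word with every later location word."""
    hits = []
    for i, tok in enumerate(chunk):
        if tok not in symptoms:
            continue
        for j in range(i + 1, len(chunk)):
            if chunk[j] in locations:
                hits.append((i, j))
    return hits


def apply_sliding_window(tokens, window, symptoms, locations):
    """Apply the rule inside every window position; duplicates are merged."""
    found = set()
    last_start = max(0, len(tokens) - window)
    for start in range(last_start + 1):
        chunk = tokens[start:start + window]
        for i, j in pair_rule(chunk, symptoms, locations):
            frame = (("symptom", chunk[i]), ("location", chunk[j]))
            found.add(Extraction(start + i, start + j + 1, frame, j - i))
    return sorted(found, key=lambda e: (e.start, e.end))


def resolve_overlaps(extractions, typical_distance):
    """Greedy filter: prefer distances close to the one seen in training,
    and drop any candidate that overlaps an already accepted one."""
    kept = []
    ranked = sorted(extractions,
                    key=lambda e: (abs(e.distance - typical_distance), e.start))
    for cand in ranked:
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda e: e.start)


tokens = ["pain", "in", "chest", "and", "rash", "on", "arm"]
candidates = apply_sliding_window(
    tokens, window=7, symptoms={"pain", "rash"}, locations={"chest", "arm"}
)
frames = resolve_overlaps(candidates, typical_distance=2)
for f in frames:
    print(dict(f.frame))
```

In this toy input the rule produces three candidates, one of which (pain/arm) spans both phrases; the distance-based filter discards it because its gap is far from the assumed typical distance, which is the kind of precision gain the abstract attributes to filtering.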
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Intarapaiboon, P., Nantajeewarawat, E., Theeramunkong, T. (2008). Extracting Semantic Frames from Thai Medical-Symptom Phrases with Unknown Boundaries. In: Domingue, J., Anutariya, C. (eds) The Semantic Web. ASWC 2008. Lecture Notes in Computer Science, vol 5367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89704-0_27
DOI: https://doi.org/10.1007/978-3-540-89704-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89703-3
Online ISBN: 978-3-540-89704-0
eBook Packages: Computer Science, Computer Science (R0)