Abstract
This paper presents a methodology for automatically learning ontologies from Thai text corpora by extracting terms and relations. A shallow parser chunks the texts, and taxonomic relations are then identified in the chunked text with the help of two kinds of cues: lexico-syntactic patterns and item lists. The main advantage of the approach is that it simplifies the task of labeling concepts and relations, since the cues help identify the ontological concepts and hint at the relations between them. However, these techniques pose certain problems, namely cue word ambiguity, item list identification, and numerous candidate terms. We also propose a method for solving these problems that uses lexicon and co-occurrence features weighted by information gain. The precision, recall and F-measure of the system are 0.74, 0.78 and 0.76, respectively.
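As a quick consistency check on the reported scores, and assuming the F-measure is the balanced F1 score (the harmonic mean of precision P and recall R, which the abstract does not state explicitly), the three figures agree with one another:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.74 \times 0.78}{0.74 + 0.78}
    = \frac{1.1544}{1.52}
    \approx 0.76
```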



Notes
Thai National Agricultural Information Coordinating Center (http://www.thaiagris.lib.ku.ac.th/).
Acknowledgments
The authors would like to express their deep thanks to Michael Zock and Mathieu Lafourcade for their patience in reviewing this work. The work described in this paper was supported by NECTEC grant No. NT-B-22-14-12-46-06. It was also funded in part by the Kasetsart University Research and Development Institute (KURDI).
Cite this article
Imsombut, A., Kawtrakul, A. Automatic building of an ontology on the basis of text corpora in Thai. Lang Resources & Evaluation 42, 137–149 (2008). https://doi.org/10.1007/s10579-007-9045-5
DOI: https://doi.org/10.1007/s10579-007-9045-5