
Automatic building of an ontology on the basis of text corpora in Thai

Language Resources and Evaluation

Abstract

This paper presents a methodology for automatically learning ontologies from Thai text corpora by extracting terms and relations. A shallow parser chunks the texts, and taxonomic relations are then identified in the chunks with the help of two kinds of cues: lexico-syntactic patterns and item lists. The main advantage of the approach is that it simplifies the task of labeling concepts and relations, since the cues help identify the ontological concepts and hint at the relations between them. However, these techniques pose certain problems, namely cue word ambiguity, item list identification, and numerous candidate terms. We also propose a method for solving these problems that uses lexicon and co-occurrence features and weights them by information gain. The precision, recall and F-measure of the system are 0.74, 0.78 and 0.76, respectively.
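As a reading aid, the two quantitative notions in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the binary features `in_lexicon` and `cooccurs`, and the toy labels are all hypothetical. Information gain is computed in its standard form (the reduction in label entropy after splitting on a feature), and the F-measure is assumed to be the balanced harmonic mean of precision and recall.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = candidate term is the correct hypernym.
labels = [1, 1, 0, 0]
in_lexicon = [1, 1, 0, 0]  # hypothetical feature that separates the classes
cooccurs = [1, 0, 1, 0]    # hypothetical feature unrelated to the classes

# Features with higher information gain would receive larger weights.
weights = {
    "in_lexicon": information_gain(in_lexicon, labels),
    "cooccurs": information_gain(cooccurs, labels),
}
print(weights)

# Consistency check on the figures reported in the abstract.
print(round(f_measure(0.74, 0.78), 2))
```

Under the harmonic-mean assumption, the reported precision of 0.74 and recall of 0.78 are consistent with the reported F-measure of 0.76.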

Notes

  1. http://www.fao.org/agrovoc/

  2. Thai National Agricultural Information Coordinating Center (http://www.thaiagris.lib.ku.ac.th/).

Acknowledgments

The authors would like to express their deep thanks to Michael Zock and Mathieu Lafourcade for their patience in reviewing this work. The work described in this paper was supported by NECTEC grant No. NT-B-22-14-12-46-06. It was also funded in part by KURDI, the Kasetsart University Research and Development Institute.

Author information

Corresponding author

Correspondence to Aurawan Imsombut.

About this article

Cite this article

Imsombut, A., Kawtrakul, A. Automatic building of an ontology on the basis of text corpora in Thai. Lang Resources & Evaluation 42, 137–149 (2008). https://doi.org/10.1007/s10579-007-9045-5

  • DOI: https://doi.org/10.1007/s10579-007-9045-5
