Automatic building of an ontology on the basis of text corpora in Thai

Imsombut, Aurawan; Kawtrakul, Asanee

doi:10.1007/s10579-007-9045-5

Published: 05 December 2007

Language Resources and Evaluation, Volume 42, pages 137–149 (2008)

Abstract

This paper presents a methodology for automatically learning ontologies from Thai text corpora by extracting terms and relations. A shallow parser chunks the texts, and taxonomic relations are then identified with the help of two kinds of cues: lexico-syntactic patterns and item lists. The main advantage of the approach is that it simplifies the task of concept and relation labeling, since the cues help identify the ontological concepts and hint at their relations. However, these techniques pose certain problems, namely cue word ambiguity, item list identification, and numerous candidate terms. We also propose a method for solving these problems that uses lexicon and co-occurrence features and weights them with information gain. The precision, recall and F-measure of the system are 0.74, 0.78 and 0.76, respectively.
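
The two quantitative ideas in the abstract, the F-measure and information gain as a feature weight, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the reported F-measure is the balanced one (the harmonic mean of precision and recall) and uses the textbook definition of information gain as a reduction in entropy. The feature (`cooccurs`) and the labels (`is_hypernym`) are hypothetical toy data invented for illustration, not values from the paper.

```python
import math
from collections import Counter


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy reduction in `labels` obtained by splitting on `feature_values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Check the reported scores against the balanced F-measure formula.
reported_f = round(f_measure(0.74, 0.78), 2)

# Hypothetical toy data (an assumption, not from the paper): whether a
# candidate term co-occurs with a cue word (1/0) and whether the candidate
# is a correct hypernym (True/False).
cooccurs = [1, 1, 1, 0, 0, 0]
is_hypernym = [True, True, False, False, False, False]
toy_gain = information_gain(cooccurs, is_hypernym)
```

With the reported precision of 0.74 and recall of 0.78, `f_measure` gives 0.76 after rounding, which matches the abstract. A feature with higher information gain separates correct from incorrect candidates more cleanly, which is the usual motivation for using it as a weight.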

Notes

http://www.fao.org/agrovoc/
Thai National Agricultural Information Coordinating Center (http://www.thaiagris.lib.ku.ac.th/).

Acknowledgments

The authors would like to express their deep thanks to Michael Zock and Mathieu Lafourcade for their patience in reviewing this work. The work described in this paper was supported by NECTEC grant No. NT-B-22-14-12-46-06. It was also funded in part by KURDI, the Kasetsart University Research and Development Institute.

Author information

Authors and Affiliations

NAiST Laboratory, Kasetsart University, Bangkok, Thailand
Aurawan Imsombut & Asanee Kawtrakul

Corresponding author

Correspondence to Aurawan Imsombut.

About this article

Cite this article

Imsombut, A., Kawtrakul, A. Automatic building of an ontology on the basis of text corpora in Thai. Lang Resources & Evaluation 42, 137–149 (2008). https://doi.org/10.1007/s10579-007-9045-5

Accepted: 03 September 2007
Published: 05 December 2007
Issue Date: May 2008
DOI: https://doi.org/10.1007/s10579-007-9045-5
