Abstract
In recent years, as a means of systematizing the enormous amount of information on the Internet, ontologies, which organize knowledge through a hierarchical structure of concepts, have received considerable attention in spatiotemporal information science. However, constructing an ontology manually requires a great deal of time and deep knowledge of the target field, so automating ontology generation from a raw text corpus is needed to meet this demand. As an initial attempt at ontology generation with a neural network, a recurrent neural network (RNN)-based method was proposed; advances in natural language processing (NLP) have since made it possible to update this architecture. In particular, transfer learning from language models pretrained on large unlabeled corpora, such as bidirectional encoder representations from transformers (BERT), has yielded a breakthrough in NLP. Inspired by these achievements, we propose a novel workflow for ontology generation that applies transfer learning of pretrained language models and consists of two-stage learning. This paper provides a quantitative comparison between the proposed method and existing methods; our best method improved accuracy by over 12.5%.
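The abstract leaves the two-stage workflow unspecified, but its core classification step, fine-tuning a pretrained BERT model to decide whether a taxonomic (is-a) relation holds between a pair of terms, can be sketched with the HuggingFace Transformers library. This is a minimal illustration under stated assumptions: the checkpoint name, the two-label scheme, and the toy term pairs are ours, not the authors' exact setup.

```python
# Minimal sketch: fine-tune a pretrained BERT model to classify the
# taxonomic relation between term pairs. The label set and training
# pairs below are illustrative assumptions, not the paper's data.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical label scheme: does term A subsume term B?
LABELS = {"no-relation": 0, "hypernym": 1}

pairs = [
    ("animal", "dog", "hypernym"),
    ("vehicle", "car", "hypernym"),
    ("dog", "vehicle", "no-relation"),
]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

# Encode each term pair as a BERT sentence pair: [CLS] A [SEP] B [SEP].
encodings = tokenizer(
    [a for a, _, _ in pairs],
    [b for _, b, _ in pairs],
    padding=True,
    return_tensors="pt",
)
labels = torch.tensor([LABELS[lbl] for _, _, lbl in pairs])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # a real run would iterate over a large labeled set
    optimizer.zero_grad()
    out = model(**encodings, labels=labels)
    out.loss.backward()
    optimizer.step()

# Inference: predict the relation for an unseen pair.
model.eval()
with torch.no_grad():
    enc = tokenizer("animal", "cat", return_tensors="pt")
    pred = model(**enc).logits.argmax(dim=-1).item()
print([name for name, idx in LABELS.items() if idx == pred][0])
```

In the paper's actual pipeline the candidate term pairs and labels would come from its first learning stage; the sketch only shows how a pretrained language model is transferred to the relation-classification task.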
Cite this paper
Oba, A., Paik, I., Kuwana, A. (2021). Automatic Classification for Ontology Generation by Pretrained Language Model. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol. 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_18