Abstract
In recent years, as a means of systematizing the enormous amount of information on the Internet, ontologies, which organize knowledge through a hierarchical structure of concepts, have received considerable attention in spatiotemporal information science. However, constructing an ontology manually requires a great deal of time and deep knowledge of the target field, so automating ontology generation from a raw text corpus is needed to meet this demand. As an initial attempt at ontology generation with a neural network, a recurrent neural network (RNN)-based method was proposed; advances in natural language processing (NLP) have since made it possible to update this architecture. In particular, transfer learning from language models pretrained on large unlabeled corpora, such as bidirectional encoder representations from transformers (BERT), has yielded a breakthrough in NLP. Inspired by these achievements, we propose a novel workflow for ontology generation that applies transfer learning of pretrained language models and consists of two-stage learning. This paper provides a quantitative comparison between the proposed method and existing methods; our best method improved accuracy by over 12.5%.
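The abstract leaves the two-stage workflow unspecified, but its core classification step, fine-tuning a pretrained BERT model to decide whether a taxonomic (is-a) relation holds between a pair of terms, can be sketched with the HuggingFace Transformers library. This is a minimal illustration under stated assumptions: the checkpoint name, the two-label scheme, and the toy term pairs are ours, not the authors' exact setup.

```python
# Minimal sketch: fine-tune a pretrained BERT model to classify the
# taxonomic relation between term pairs. The label set and training
# pairs below are illustrative assumptions, not the paper's data.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical label scheme: does term A subsume term B?
LABELS = {"no-relation": 0, "hypernym": 1}

pairs = [
    ("animal", "dog", "hypernym"),
    ("vehicle", "car", "hypernym"),
    ("dog", "vehicle", "no-relation"),
]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

# Encode each term pair as a BERT sentence pair: [CLS] A [SEP] B [SEP].
encodings = tokenizer(
    [a for a, _, _ in pairs],
    [b for _, b, _ in pairs],
    padding=True,
    return_tensors="pt",
)
labels = torch.tensor([LABELS[lbl] for _, _, lbl in pairs])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # a real run would iterate over a large labeled set
    optimizer.zero_grad()
    out = model(**encodings, labels=labels)
    out.loss.backward()
    optimizer.step()

# Inference: predict the relation for an unseen pair.
model.eval()
with torch.no_grad():
    enc = tokenizer("animal", "cat", return_tensors="pt")
    pred = model(**enc).logits.argmax(dim=-1).item()
print([name for name, idx in LABELS.items() if idx == pred][0])
```

In the paper's actual pipeline the candidate term pairs and labels would come from its first learning stage; the sketch only shows how a pretrained language model is transferred to the relation-classification task.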
Cite this paper
Oba, A., Paik, I., Kuwana, A. (2021). Automatic Classification for Ontology Generation by Pretrained Language Model. In: Fujita, H., Selamat, A., Lin, J.C.-W., Ali, M. (eds.) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol. 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_18