Abstract
Distributional semantic models such as Latent Dirichlet Allocation (LDA) (Guo et al., Concurr. Comput.: Pract. Exper. 29(3), 319–343, 2016) assign similar representations to words that occur in similar contexts. LDA was originally used to model documents and extract topics in Information Retrieval. In recent years, LDA has attracted growing attention in ontology learning because of the exponential increase in the number of documents and textual data, not only on the web but also in digital libraries. LDA-based approaches have proven to provide the best results. However, they suffer from several limitations related to concept and relation extraction, as well as to handling corpus evolution and maintenance. To cope with these problems, we propose in this paper LEOnto+, an extended version of LEOnto (Tissaoui et al. 2020; Tissaoui et al., SN Comput. Sci. J. 1: 336, 2020), which provides a new approach for automatic ontology enrichment from a textual corpus. In LEOnto+, LDA is used to provide dimension reduction and to identify semantic relationships between topics and documents and between words and topics using probability distributions. We report several experiments conducted using several evaluation techniques (criteria-based evaluation, gold-standard evaluation, expert evaluation, task-based evaluation and corpus-based evaluation). We also compare the results of LEOnto+ with two existing methods using their respective datasets. The evaluation results show that LEOnto+ outperforms these methods, particularly in terms of precision. We also evaluate our approach on two large corpora in order to demonstrate its scalability.
Notes
This difficulty in capturing the knowledge required by knowledge-based systems is called the “knowledge acquisition bottleneck”.
LEOnto is limited to extracting concepts and a few relations.
References
Al-Arfaj, A., Al-Salman, A.: Ontology construction from text: challenges and trends. Int. J. Artif. Intell. Expert. Syst. 6(2), 15–26 (2015)
Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: ACM SIGMOD’93 international conference on management of data. Washington, pp 207–216 (1993)
Albukhitan, S., Helmy, T., Alnazer, A.: Arabic ontology learning using deep learning. In: Proceedings of the international conference on Web intelligence. ACM, pp 1138–1142 (2017)
Asfari, O., Hannachi, L., Bentayeb, F., Boussaid, O.: Ontological topic modeling to extract Twitter users’ topics of interest. In: 8th International conference on information technology and applications (ICITA), pp 141–146 (2013)
Gómez-Pérez, A., Manzano-Macho, D.: A survey of ontology learning methods and techniques. Technical Report D1.5. Madrid (2003)
Aussenac-Gilles, N., Seguela, P.: Les relations sémantiques : de linguistique au formel. Cahiers de Grammaire 25, 175–198 (2000)
Bachimont, B.: Engagement sémantique et engagement ontologique : conception et réalisation d’ontologies en ingénierie des connaissances. In: Proceedings: ingénierie des connaissances: évolutions récentes et nouveaux défis (2000)
Benaissa, B., Bouchiha, D., Zouaoui, A., Doumi, N.: Building Arabic ontology from texts. Procedia Comput. Sci. 73, 7–15 (2015)
Benomrane, S., Sellami, Z., BenAyed, M.: An ontologist feedback driven ontology evolution with an adaptive multiagent system. Adv. Eng. Inform. 30(3), 337–353 (2016)
Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Sci. Amer. J 284(5), 34–43 (2001)
Biébow, B., Szulman, S.: TERMINAE: a linguistic-based tool for the building of a domain ontology. In: Proceedings of the 11th European workshop on knowledge acquisition, modelling and management. LNCS Springer, pp 49–66 (1999)
Bisson, G., Nédellec, C., Canamero, D.: Designing clustering methods for ontology building-the Mo’K workbench. In: Proceedings of the first international conference on ontology learning, vol V31, pp 13–28 (2000)
Blomqvist, E.: Fully automatic construction of enterprise ontologies using design patterns: initial method and first experiences. In: Proceedings of the 4th international conference on ontologies, databases and applications of semantics (ODBASE) (2005)
Casteleiro, M.A., George, D., Read, W., Maria, J.F.P., Nava, M., Diego, M.F., Nenadic, G., Klein, J., Keane, J., Stevens, R.: Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature. J. Biomed. Semantics. 9(1), 2–24 (2018)
Casteleiro, M., Prieto, M., Demetriou, G., Maroto, N., Read, W., Maseda, F.D., Diz, J., Nenadic, G., Keane, J., Stevens, J.: Ontology learning with deep learning: a case study on patient safety using PubMed. In: Proceedings of semantic Web applications and tools for healthcare and life sciences, p V1795 (2016)
Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Data mining and knowledge discovery handbook. Springer, pp 853–867 (2005)
Cimiano, P.: Ontology learning and population from text: algorithms, evaluation and applications. Springer-Verlag New York Inc., New York (2006)
Cimiano, P., Mädche, A., Staab, S., Völker, J.: Ontology learning. In: Handbook on ontologies. Springer, pp 245–267 (2009)
Cimiano, P., Steffen, S.: Learning concept hierarchies from text with a guided agglomerative clustering algorithm. In: Proceedings of the ICML workshop on learning and extending lexical ontologies with machine learning methods (2005)
Cimiano, P., Völker, J., Studer, R.: Ontologies on demand? - A description of the state-of-the-art, applications, challenges and trends for ontology learning from text. Information Wissenschaft und Praxis 57(6–7), 315–320 (2009)
Confort, V.T., Revoredo, K., Baiao, F.A., Santoro, F.M.: Ontology extraction from stories: an exploratory study in storytelling. In: International conference on knowledge management in organizations. Springer, pp 477–491 (2015)
Cunningham, H.: Information extraction, automatic. Encyclop. Lang. Linguist. 18(10), 1411–1428 (2005)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(1), 993–1022 (2003)
MacKay, D.J.C.: Information theory, inference and learning algorithms. Cambridge University Press, 1st edn (2003)
Davies, I., Green, P., Milton, S., Rosemann, M.: Using meta models for the comparison of ontologies. In: Proceedings of evaluation of modeling methods in systems analysis and design workshop-EMMSAD’03 (2003)
De Marneffe, M.C., Manning, C.D.: Stanford typed dependencies manual. Technical report. Stanford University, pp 338–345 (2008)
Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.: Indexing by latent semantic analysis. Journal of the Association for Information Science and Technology 41(6), 391–407 (1990)
Deng, L., Dong, Y.: Deep learning: methods and applications. Found. Trends. Signal. Process. 7(3–4), 197–387 (2014)
Diaconis, P.: Finite forms of de Finetti’s theorem on exchangeability. Synthese 36(2), 271–281 (1977)
Dietterich, T.G.: Ensemble methods in machine learning. Lect. Notes. Comput. Sci. 1857, 1–15 (2000)
Faatz, A., Steinmetz, R.: Ontology enrichment with texts from the WWW. In: Proceedings of the ECML/PKDD. Second Workshop on Semantic Web Mining. Helsinki (2002)
Fanizzi, N., d’Amato, C., Esposito, F.: DL-FOIL concept learning in description logics. In: Zelezny, F, Lavrač, N (eds.) Inductive logic programming. Lecture Notes in Computer Science. Springer, vol 5194, pp 107–121 (2008)
Faure, D., Poibeau, T.: First experiments of using semantic knowledge learned by ASIUM for information extraction task using INTEX. In: Proceedings of the first international conference on ontology learning, vol V31, pp 7–12 (2000)
Goadrich, M., Oliphant, L., Shavlik, J.: Gleaner: creating ensembles of first-order clauses to improve recall-precision curves. Mach. Learn. 64, 231–261 (2006)
Gruber, T.R.: A translation approach to portable ontology specifications. Knowl. Acquis. 5(2), 199–220 (1993)
Guo, W., Liang, L., Deng, T.: Topic mining for call centers based on A-LDA and distributed computing. Concurr. Comput.: Pract. Exper. 29(3), 319–343 (2016)
Guo, W., Liang, L., Deng, T.: Topic mining for call centers based on A-LDA and distributed computing. Concurr. Comput.: Pract. Exper. 29(3), e3776 (2017)
Gutiérrez-Batista, K., Campaña, J.R., Vila, M., Martín-Bautista, M.J.: An ontology-based framework for automatic topic detection in multilingual environments. Int. J. Intell. Syst. 33(7), 1459–1475 (2018)
Hofmann, T.: Probabilistic latent semantic analysis. In: Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, pp 289–296 (1999)
Hu, D., Wang, W., Liu, S., Xie, N., Yin, G.: Text segmentation model based LDA and ontology for question answering in agriculture. In: Proceedings of the World agricultural outlook conference, pp 307–319 (2014)
Hubert, N.K., Thomas, D.: Vers une approche sémantique de la détection de cyberattaques Conférence de Recherche en Informatique (2013)
Isaly, L.A.: Augmenting latent Dirichlet allocation and rank threshold detection with ontologies. Air Force Inst of Tech Wrightpatterson Afb of Dept of Graduate Computer Science (2010)
Ivanova, T.: Ontology learning technologies - brief survey, trends and problems. In: Proceedings of the international conference on information technologies, pp 245–255 (2012)
Jiang, J., Zhai, C.X.: A systematic exploration of the feature space for relation extraction. In: Proceedings of the annual conference of the North American chapter of the association for computational linguistics. Rochester, pp 113–120 (2007)
Jiang, X., Huang, Y., Nickel, M., Tresp, V.: Combining information extraction, deductive reasoning and machine learning for relation prediction. In: The semantic Web: Research and applications. ESWC 2012. Lecture Notes in Computer Science. Springer, Berlin, vol 7295, pp 164–178 (2012)
Khadir, A.C., Aliane, H., Guessoum, A.: Ontology learning: grand tour and challenges. Comput. Sci. Rev. 39(1–2), 100339 (2021)
Klarman, S., Britz, K.: Towards unsupervised ontology learning from data. In: Proceedings of the 2015 international conference on defeasible and ampliative reasoning, vol 1423, pp 29–35 (2015)
Kruijff, G-JM.: Formal and computational aspects of dependency grammar: history and development of DG. Technical report, ESSLLI (2002)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Lehmann, J., Fanizzi, N., Bühmann, L, d’Amato, C.: Concept learning, perspectives on ontology learning, vol 18 of Studies on the Semantic Web IOS Press, Amsterdam, pp 71–91 (2014)
Liu, W., Weichselbraun, A., Scharl, A., Chang, E.: Semi-automatic ontology extension using spreading activation. J. Univ. Knowl. Manag. 0(1), 50–58 (2005)
Liu, X.-Y., Wu, J., Zhou, Z.-H.: Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B 39(2), 539–550 (2009)
Maedche, A., Volz, R.: The Text-To-Onto ontology extraction and maintenance environment. In: Proceedings of the ICDM workshop on integrating data mining and knowledge management. San Jose (2001)
Maedche, A., Staab, S.: Measuring similarity between ontologies. In: EKAW. Springer, Heidelberg, pp 251–263 (2002)
Maedche, A., Zacharias, V.: Clustering ontology-based metadata in the semantic Web. In: European conference on principles of data mining and knowledge discovery, pp 348–360 (2002)
Fleischman, M., Hovy, E.: Fine grained classification of named entities. In: Proceedings of the 19th international conference on computational linguistics. Association for Computational Linguistics. Morristown, pp 1–7 (2002)
Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space, arXiv:1301.3781 (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119. Curran Associates, Inc., Red Hook (2013)
Missikoff, M., Navigli, R., Velardi, P.: Integrated approach to web ontology learning and engineering. Comput. J 35(11), 60–63 (2002)
Morin, M.: Acquisition de patrons lexico-syntaxiques caractéristiques d’une relation sémantique. Traitement automatique des langues 40(1), 143–16 (2020)
Muggleton, S.: Inverse entailment and Progol. New Gener. Comput. (Special issue on inductive logic programming) 13, 245–286 (1995)
Muggleton, S., Tamaddoni-Nezhad, A.: QG/GA: a stochastic search for Progol. Mach. Learn. 70(2–3), 123–133 (2007)
Muggleton, S., Feng, C.: Efficient induction of logic programs. In: Muggleton, S (ed.) Inductive logic programming, pp 281–298. Academic Press, London (1992)
Noy, N.F., McGuinness, D.L.: Ontology development 101: a guide to creating your first ontology (2001)
Ohgren, A., Sandkuhl, K.: Towards a methodology for ontology development in small and medium-sized enterprises. In: IADIS Conference on applied computing. Algarve (2005)
Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: Knowledge engineering and knowledge management: 20th international conference, EKAW 2016, Bologna, Italy, November 19–23, 2016, Proceedings 20. Springer, pp 480–495 (2016)
Petrucci, G., Rospocher, M., Ghidini, C.: Expressive ontology learning as neural machine translation. J. Web. Semantics. 52–53, 66–82 (2018)
Posch, P.: Enriching ontologies with encyclopedic background knowledge for document indexing. In: Mika, P, et al (eds.) The semantic Web. Lecture notes in computer science. Springer, vol 8797 (2014)
Pyysalo, S., Airola, A., Heimonen, J., Bjore, J., Ginter, F.: Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics 9(Suppl. 3), S6 (2008)
Qianqian, D.G., Liu, S., Tu, Q.: A latent-dirichlet-allocation based extension for domain ontology of enterprise’s technological innovation. Int. J. Comput. Commun. Control. 14, 107–123 (2019)
Robert, E.: CORPORUM-OntoExtract. Ontology extraction tool technical report deliverable 6 Ontoknowledge (2001)
Rohlf, F.J.: Algorithm 76: hierarchical clustering using the minimum spanning tree. Comput. J 16, 93–95 (1973)
Sabou, M., Wroe, C., Goble, C., Mishne, G.: Learning domain ontologies for web service descriptions: an experiment in bioinformatics. In: ACM WWW, pp 190–198 (2005)
Schmidhuber, J.: Deep learning in neural networks: An overview. Neural. Netw. 61, 85–117 (2015)
Muggleton, S., Santos, J., Tamaddoni-Nezhad, A.: ProGolem: a system based on relative minimal generalisation. In: De Raedt, L (ed.) Inductive logic programming. Lecture Notes in Computer Science, vol 5989, pp 131–148 (2010)
Jones, S., Paynter, G.W.: Automatic extraction of document key phrases for use in digital libraries: evaluation and applications. J. Am. Soc. Inf. Sci. Technol. 53(8), 653–677 (2002)
Steyvers, M., Griffiths, T.: Probabilistic topic models. In: Landauer, TK, McNamara, DS, Dennis, S, Kintsch, W (eds.) Handbook of latent semantic analysis, pp 427–448. Lawrence Erlbaum Associates Publishers (2007)
Teja Santosh, D., Sudheer Babu, K., Prasad, S.D.V., Vivekananda, A.: Opinion mining of online product reviews from traditional LDA topic clusters using feature ontology tree and sentiwordnet. Educ. Manag. Eng. 6(1), 34–44 (2016)
Tissaoui, A., Sassi, S., Chbeir, R.: LEOnto: new approach for ontology enrichment using LDA. In: Proceedings of the 12th international conference on management of digital EcoSystems (MEDES 2020). ACM, pp 132–139 (2020)
Tissaoui, A., Sassi, S., Chbeir, R.: Probabilistic topic models for enriching ontology from texts. SN. Comput. Sci. J 1, 336 (2020)
Velardi, P., Fabriani, P., Missikoff, M.: Using text processing techniques to automatically enrich a domain ontology. In: Proceedings of the international conference on formal ontology in information systems, pp 270–284 (2001)
Wei, X.L., Sun, Y., Zhang, S.K., Miao, Y.J.: Ontological concept extraction method based on maximum entropy model. Comput. Eng. 35(24), 114–116 (2009)
Wu, S., Hsu, W.: SOAT: a semi-automatic domain ontology acquisition tool from Chinese corpus. In: Proceedings of the 19th International conference on computational linguistics (2002)
Xu, F., Kurz, D., Piskorski, J., Schmeier, S.: A domain adaptive approach to automatic acquisition of domain relevant terms and their relations with bootstrapping. In: Proceedings of the 3rd international conference on language resources and evaluation (2002)
Yeh, J.-H., Yang, N.: Ontology construction based on latent topic extraction in a digital library. In: International conference on Asian digital libraries. Springer, pp 93–103 (2008)
Zavitsanos, E., Paliouras, G., Vouros, G.A., Petridis, S.: Discovering subsumption hierarchies of ontology concepts from text corpora. In: IEEE/WIC/ACM International conference on Web intelligence (WI’07). Fremont, pp 402–408 (2007)
Zhou, L.: Ontology learning: state of the art and open issues. Inf. Technol. Manag. 8(3), 241–252 (2007)
Funding
The authors declare they have no financial interests.
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special Issue on Computational Aspects of Network Science Guest Editors: Apostolos N. Papadopoulos and Richard Chbeir
Cite this article
Sassi, S., Tissaoui, A. & Chbeir, R. LEOnto+: a scalable ontology enrichment approach. World Wide Web 25, 2347–2378 (2022). https://doi.org/10.1007/s11280-021-00997-x
DOI: https://doi.org/10.1007/s11280-021-00997-x