LEOnto+: a scalable ontology enrichment approach

World Wide Web

Abstract

Distributional semantic models such as Latent Dirichlet Allocation (LDA) (Guo et al., Concurr. Comput.: Pract. Exper. 29(3), 319–343, 2016) assign similar representations to words that occur in similar contexts. LDA was originally used to model documents and extract topics in Information Retrieval. In recent years, LDA has attracted growing interest in ontology learning because of the exponential increase in the number of documents and textual data, not only on the web but also in digital libraries. LDA-based approaches have been reported to provide the best results. However, they suffer from several limitations related to concept and relation extraction, as well as to handling corpus evolution and maintenance. To cope with these problems, we propose in this paper LEOnto+, an extended version of LEOnto (Tissaoui et al. 2020; Tissaoui et al., SN Comput. Sci. J. 1: 336, 2020), which provides a new approach for automatic ontology enrichment from a textual corpus. In LEOnto+, LDA is used to provide dimensionality reduction and to identify semantic relationships between topics and documents, and between words and topics, using probability distributions. We report several experiments conducted with several evaluation techniques (criteria-based evaluation, gold standard evaluation, expert evaluation, task-based evaluation and corpus-based evaluation). We also compare the results of LEOnto+ with two existing methods using their respective datasets. The evaluation results show that LEOnto+ outperforms these methods, particularly in terms of precision. Finally, we evaluate our approach on two large corpora in order to demonstrate its scalability.
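To make the two probability distributions mentioned above concrete, the following is a minimal sketch of generic LDA fitted by collapsed Gibbs sampling. It is not the LEOnto+ implementation: the function names (`lda_gibbs`, `top_words`), the hyperparameter values (`alpha`, `beta`, `iterations`) and the toy corpus are all assumptions made for illustration. It returns a document-topic distribution `theta` and a topic-word distribution `phi`.

```python
import random


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Generic collapsed Gibbs sampler for LDA (illustrative, not LEOnto+).

    docs: list of tokenized documents (lists of strings).
    Returns (theta, phi, vocab) where theta[d][k] estimates P(topic k | doc d)
    and phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # count of topic k in document d
    topic_word = [[0] * V for _ in range(K)]   # count of word v in topic k
    topic_total = [0] * K                      # total words assigned to topic k
    assignments = []

    # Assign every token to a random topic to start.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the two distributions.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


def top_words(phi, vocab, topic, n=3):
    """Return the n most probable words of one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: phi[topic][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


# Toy corpus, invented for illustration and already tokenized.
docs = [
    "ontology concept relation concept ontology".split(),
    "concept relation ontology enrichment".split(),
    "protein gene disease gene protein".split(),
    "disease gene protein patient".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for k in range(2):
    print(k, top_words(phi, vocab, k))
```

In an enrichment setting, the high-probability words of each topic in `phi` would be natural candidate terms, and the rows of `theta` indicate which documents support each topic; how LEOnto+ turns these distributions into concepts and relations is described in the full article, not here.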

Notes

  1. This difficulty in capturing the knowledge required by knowledge-based systems is called the “knowledge acquisition bottleneck”.

  2. LEOnto is limited to extracting concepts and only a few relations.

  3. https://elmcip.net/platformsoftware/rita

  4. https://wordnet.princeton.edu/

  5. https://spider.sigappfr.org/research-projects/corpus-journal-www/

  6. https://git.gesis.org/open-data/solis-sofis/blob/18389fb2b6cecd7fe8d5b6b4e89d40eda33a7191/readme.md

  7. https://www.nlm.nih.gov/bsd/pmresources.html

  8. https://paperswithcode.com/datasets

  9. https://www.nlm.nih.gov/portals/researchers.html

  10. https://www.w3.org/TR/sparql11-query/

References

  1. Abeer Al-Arfaj, A., Al-Salman, A.: Ontology construction from text: challenges and trends. Int. J. Artif. Intell. Expert. Syst. 6(2), 15–26 (2015)

  2. Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: ACM SIGMOD’93 international conference on management of data. Washington, pp 207–216 (1993)

  3. Albukhitan, S., Helmy, T., Alnazer, A.: Arabic ontology learning using deep learning. In: Proceedings of the international conference on Web intelligence. ACM, pp 1138–1142 (2017)

  4. Asfari, O., Hannachi, L., Bentayeb, F., Boussaid, O.: Ontological topic modeling to extract Twitter users’ topics of interest. In: 8th International conference on information technology and applications (ICITA), pp 141–146 (2013)

  5. Asuncion, G.P., Manzano-Macho, D.: A survey of ontology learning methods and techniques. Technical Report D1.5. Madrid (2003)

  6. Aussenac-Gilles, N., Seguela, P.: Les relations sémantiques : de linguistique au formel. Cahiers de Grammaire 25, 175–198 (2000)

  7. Bachimont, B.: Engagement sémantique et engagement ontologique : conception et réalisation d’ontologies en ingénierie des connaissances. In: Proceedings: ingénierie des connaissances: évolutions récentes et nouveaux défis (2000)

  8. Benaissa, B., Bouchiha, D., Zouaoui, A., Doumi, N.: Building Arabic ontology from texts. Procedia Comput. Sci. 73, 7–15 (2015)

  9. Benomrane, S., Sellami, Z., BenAyed, M.: An ontologist feedback driven ontology evolution with an adaptive multiagent system. Adv. Eng. Inform. 30(3), 337–353 (2016)

  10. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Sci. Amer. J 284(5), 34–43 (2001)

  11. Biébow, B., Szulman, S.: TERMINAE: a linguistic-based tool for the building of a domain ontology. In: Proceedings of the 11th European workshop on knowledge acquisition, modelling and management. LNCS, Springer, pp 49–66 (1999)

  12. Bisson, G., Nédellec, C., Canamero, D.: Designing clustering methods for ontology building - the Mo’K workbench. Proceedings of the First International Conference on Ontology Learning V31, 13–28 (2000)

  13. Blomqvist, E.: Fully automatic construction of enterprise ontologies using design patterns: initial method and first experiences. In: Proceedings of the 4th international conference on ontologies, databases and applications of semantics (ODBASE) (2005)

  14. Casteleiro, M.A., George, D., Read, W., Maria, J.F.P., Nava, M., Diego, M.F., Nenadic, G., Klein, J., Keane, J., Stevens, R.: Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature. J. Biomed. Semantics. 9(1), 2–24 (2018)

  15. Casteleiro, M., Prieto, M., Demetriou, G., Maroto, N., Read, W., Maseda, F.D., Diz, J., Nenadic, G., Keane, J., Stevens, J.: Ontology learning with deep learning: a case study on patient safety using PubMed. In: Proceedings of semantic Web applications and tools for healthcare and life sciences, p V1795 (2016)

  16. Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Data mining and knowledge discovery handbook. Springer, pp 853–867 (2005)

  17. Cimiano, P.: Ontology learning and population from text: algorithms, evaluation and applications. Springer-Verlag New York Inc., New York (2006)

  18. Cimiano, P., Mädche, A., Staab, S., Völker, J.: Ontology learning. In: Handbook on ontologies. Springer, pp 245–267 (2009)

  19. Cimiano, P., Steffen, S.: Learning concept hierarchies from text with a guided agglomerative clustering algorithm. In: Proceedings of the ICML workshop on learning and extending lexical ontologies with machine learning methods (2005)

  20. Cimiano, P., Völker, J., Studer, R.: Ontologies on demand? - A description of the state-of-the-art, applications, challenges and trends for ontology learning from text. Information Wissenschaft und Praxis 57(6-7), 315–320 (2009)

  21. Confort, V.T., Revoredo, K., Baiao, F.A., Santoro, F.M.: Ontology extraction from stories: an exploratory study in storytelling. In: International conference on knowledge management in organizations. Springer, pp 477–491 (2015)

  22. Cunningham, H.: Information extraction, automatic. Encyclop. Lang. Linguist. 18(10), 1411–1428 (2005)

  23. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(1), 993–1022 (2003)

  24. MacKay, D.J.C.: Information theory, inference and learning algorithms. Cambridge University Press, 1st edn (2003)

  25. Davies, I., Green, P., Milton, S., Rosemann, M.: Using meta models for the comparison of ontologies. In: Proceedings of evaluation of modeling methods in systems analysis and design workshop-EMMSAD’03 (2003)

  26. De Marneffe, M.C., Manning, C.D.: Stanford typed dependencies manual. Technical report. Stanford University, pp 338–345 (2008)

  27. Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.: Indexing by latent semantic analysis. Journal of the Association for Information Science and Technology 41(6), 391–407 (1990)

  28. Deng, L., Dong, Y.: Deep learning: methods and applications. Found. Trends. Signal. Process. 7(3–4), 197–387 (2014)

  29. Diaconis, P.: Finite forms of de Finetti’s theorem on exchangeability. Synthese 36(2), 271–281 (1977)

  30. Dietterich, T.G.: Ensemble methods in machine learning. Lect. Notes. Comput. Sci. 1857, 1–15 (2000)

  31. Faatz, A., Steinmetz, R.: Ontology enrichment with texts from the WWW. In: Proceedings of the ECML/PKDD. Second Workshop on Semantic Web Mining. Helsinki (2002)

  32. Fanizzi, N., d’Amato, C., Esposito, F.: DL-FOIL concept learning in description logics. In: Zelezny, F, Lavrač, N (eds.) Inductive logic programming. Lecture Notes in Computer Science. Springer, vol 5194, pp 107–121 (2008)

  33. Faure, D., Poibeau, T.: First experiments of using semantic knowledge learned by ASIUM for information extraction task using INTEX. In: Proceedings of the first international conference on ontology learning, vol V31, pp 7–12 (2000)

  34. Goadrich, M., Oliphant, L., Shavlik, J.: Gleaner: creating ensembles of first-order clauses to improve recall-precision curves. Mach. Learn. 64, 231–261 (2006)

  35. Gruber, T.R.: A translation approach to portable ontology specifications. Knowl. Acquis. 5(2), 199–220 (1993)

  36. Guo, W., Liang, L., Deng, T.: Topic mining for call centers based on A-LDA and distributed computing. Concurr. Comput.: Pract. Exper. 29(3), 319–343 (2016)

  37. Guo, W., Liang, L., Deng, T.: Topic mining for call centers based on A-LDA and distributed computing. Concurr. Comput.: Pract. Exper. 29(3), e3776 (2017)

  38. Gutiérrez-Batista, K., Campaña, J.R., Vila, M., Martín-Bautista, M.J.: An ontology-based framework for automatic topic detection in multilingual environments. Int. J. Intell. Syst. 33(7), 1459–1475 (2018)

  39. Hofmann, T.: Probabilistic latent semantic analysis. In: Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, pp 289–296 (1999)

  40. Hu, D., Wang, W., Liu, S., Xie, N., Yin, G.: Text segmentation model based LDA and ontology for question answering in agriculture. In: Proceedings of the World agricultural outlook conference, pp 307–319 (2014)

  41. Hubert, N.K., Thomas, D.: Vers une approche sémantique de la détection de cyberattaques Conférence de Recherche en Informatique (2013)

  42. Isaly, L.A.: Augmenting latent Dirichlet allocation and rank threshold detection with ontologies. Air Force Inst of Tech Wrightpatterson Afb of Dept of Graduate Computer Science (2010)

  43. Ivanova, T.: Ontology learning technologies - brief survey, trends and problems. In: Proceedings of the international conference on information technologies, pp 245–255 (2012)

  44. Jiang, J., Zhai, C.X.: A systematic exploration of the feature space for relation extraction. In: Proceedings of the annual conference of the North American chapter of the association for computational linguistics. Rochester, pp 113–120 (2007)

  45. Jiang, X., Huang, Y., Nickel, M., Tresp, V.: Combining information extraction, deductive reasoning and machine learning for relation prediction. In: The semantic Web: Research and applications. ESWC 2012. Lecture Notes in Computer Science. Springer, Berlin, vol 7295, pp 164–178 (2012)

  46. Khadir, A.C., Aliane, H., Guessoum, A.: Ontology learning: grand tour and challenges. Comput. Sci. Rev. 39(1–2), 100339 (2021)

  47. Klarman, S., Britz, K.: Towards unsupervised ontology learning from data. In: Proceedings of the 2015 international conference on defeasible and ampliative reasoning, vol 1423, pp 29–35 (2015)

  48. Kruijff, G-JM.: Formal and computational aspects of dependency grammar: history and development of DG. Technical report, ESSLLI (2002)

  49. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

  50. Lehmann, J., Fanizzi, N., Bühmann, L., d’Amato, C.: Concept learning. In: Perspectives on ontology learning, vol 18 of Studies on the Semantic Web. IOS Press, Amsterdam, pp 71–91 (2014)

  51. Liu, W., Weichselbraun, A., Scharl, A., Chang, E.: Semi-automatic ontology extension using spreading activation. J. Univ. Knowl. Manag. 0(1), 50–58 (2005)

  52. Liu, X-Y, Wu, J., Zhou, Z-H: Exploratory undersampling for class-imbalance learning. Trans. Sys. Man. Cyber. Part. B 39(2), 539–550 (2009)

  53. Maedche, A., Volz, R.: The text-to-onto ontology extraction and maintenance environment. In: Proceedings of the ICDM workshop on integrating data mining and knowledge management. San Jose (2001)

  54. Maedche, A., Staab, S.: Measuring similarity between ontologies. In: EKAW. Springer, Heidelberg, pp 251–263 (2002)

  55. Maedche, A., Zacharias, V.: Clustering ontology-based metadata in the semantic Web. In: European conference on principles of data mining and knowledge discovery, pp 348–360 (2002)

  56. Fleischman, M., Hovy, E.: Fine grained classification of named entities. In: Proceedings of the 19th international conference on computational linguistics. Association for Computational Linguistics. Morristown, pp 1–7 (2002)

  57. Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space, arXiv:1301.3781 (2013)

  58. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119. Curran Associates, Inc., Red Hook (2013)

  59. Missikoff, M., Navigli, R., Velardi, P.: Integrated approach to web ontology learning and engineering. Comput. J 35(11), 60–63 (2002)

  60. Morin, M.: Acquisition de patrons lexico-syntaxiques caractéristiques d’une relation sémantique. Traitement automatique des langues 40(1), 143–16 (2020)

  61. Muggleton, S.: Inverse entailment and Progol. New Generation Computing, Special issue on Inductive Logic Programming 13, 245–286 (1995)

  62. Muggleton, S., Tamaddoni-Nezhad, A.: QG/GA: a stochastic search for Progol. Mach. Learn. 70(2–3), 123–133 (2007)

  63. Muggleton, S., Feng, C.: Efficient induction of logic programs. In: Muggleton, S (ed.) Inductive logic programming, pp 281–298. Academic Press, London (1992)

  64. Noy, N.F., McGuinness, D.L.: Ontology development 101: a guide to creating your first ontology (2001)

  65. Ohgren, A., Sandkuhl, K.: Towards a methodology for ontology development in small and medium-sized enterprises. In: IADIS Conference on applied computing. Algarve (2005)

  66. Petrucci, G., Ghidini, C., Rospocher, M.: Ontology learning in the deep. In: Knowledge engineering and knowledge management: 20th international conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings 20. Springer, pp 480–495 (2016)

  67. Petrucci, G., Rospocher, M., Ghidini, C.: Expressive ontology learning as neural machine translation. J. Web. Semantics. 52–53, 66–82 (2018)

  68. Posch, P.: Enriching ontologies with encyclopedic background knowledge for document indexing. In: Mika, P, et al (eds.) The semantic Web. Lecture notes in computer science. Springer, vol 8797 (2014)

  69. Pyysalo, S., Airola, A., Heimonen, J., Björne, J., Ginter, F.: Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics 9(Suppl. 3), S6 (2008)

  70. Qianqian, D.G., Liu, S., Tu, Q.: A latent-dirichlet-allocation based extension for domain ontology of enterprise’s technological innovation. Int. J. Comput. Commun. Control. 14, 107–123 (2019)

  71. Robert, E.: CORPORUM-OntoExtract. Ontology extraction tool technical report deliverable 6 Ontoknowledge (2001)

  72. Rohlf, F.J.: Algorithm 76: hierarchical clustering using the minimum spanning tree. Comput. J 16, 93–95 (1973)

  73. Sabou, M., Wroe, C., Goble, C., Mishne, G.: Learning domain ontologies for web service descriptions: an experiment in bioinformatics. In: ACM WWW, pp 190–198 (2005)

  74. Schmidhuber, J.: Deep learning in neural networks: An overview. Neural. Netw. 61, 85–117 (2015)

  75. Muggleton, S., Santos, J., Tamaddoni-Nezhad, A.: ProGolem: a system based on relative minimal generalisation. In: De Raedt, L (ed.) Inductive logic programming. Lecture Notes in Computer Science, vol 5989, pp 131–148 (2010)

  76. Steve, J., Paynter, G.P.: Automatic extraction of document key phrases for use in digital libraries: evaluation and applications. J. Am. Soc. Inf. Sci. Technol. 53(8), 653–677 (2002)

  77. Steyvers, M., Griffiths, T.: Probabilistic topic models. In: Landauer, TK, McNamara, DS, Dennis, S, Kintsch, W (eds.) Handbook of latent semantic analysis, pp 427–448. Lawrence Erlbaum Associates Publishers (2007)

  78. Teja Santosha, D., Sudheer Babua, K., Prasada, S.D.V., Vivekananda, A.: Opinion mining of online product reviews from traditional LDA topic clusters using feature ontology tree and sentiwordnet. Educ. Manag. Eng. 6(1), 34–44 (2016)

  79. Tissaoui, A., Sassi, S., Chbeir, R.: LEOnto: new approach for ontology enrichment using LDA. In: Proceedings of the 12th international conference on management of digital EcoSystems (MEDES 2020). ACM, pp 132–139 (2020)

  80. Tissaoui, A., Sassi, S., Chbeir, R.: Probabilistic topic models for enriching ontology from texts. SN. Comput. Sci. J 1, 336 (2020)

  81. Velardi, P., Fabriani, P., Missikoff, M.: Using text processing techniques to automatically enrich a domain ontology. In: Proceedings of the international conference on formal ontology in information systems, pp 270–284 (2001)

  82. Wei, X.L., Sun, Y., Zhang, S.K., Miao, Y.J.: Ontological concept extraction method based on maximum entropy model. Comput. Eng. 35(24), 114–116 (2009)

  83. Wu, S., Hsu, W.: SOAT: a semi-automatic domain ontology acquisition tool from Chinese corpus. In: Proceedings of the 19th International conference on computational linguistics (2002)

  84. Xu, F., Kurz, D., Piskorski, J., Schmeier, S.: A domain adaptive approach to automatic acquisition of domain relevant terms and their relations with bootstrapping. In: Proceedings of the 3rd international conference on language resources and evaluation (2002)

  85. Yeh, J.-H., Yang, N.: Ontology construction based on latent topic extraction in a digital library. In: International conference on Asian digital libraries. Springer, pp 93–103 (2008)

  86. Zavitsanos, E., Paliouras, G., Vouros, G.A., Petridis, S.: Discovering subsumption hierarchies of ontology concepts from text corpora. In: IEEE/WIC/ACM International conference on Web intelligence (WI’07). Fremont, pp 402–408 (2007)

  87. Zhou, L.: Ontology learning: state of the art and open issues. Inf. Technol. Manag. 8(3), 241–252 (2007)

Funding

The authors declare they have no financial interests.

Author information

Corresponding author

Correspondence to Anis Tissaoui.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Computational Aspects of Network Science

Guest Editors: Apostolos N. Papadopoulos and Richard Chbeir

About this article

Cite this article

Sassi, S., Tissaoui, A. & Chbeir, R. LEOnto+: a scalable ontology enrichment approach. World Wide Web 25, 2347–2378 (2022). https://doi.org/10.1007/s11280-021-00997-x

  • DOI: https://doi.org/10.1007/s11280-021-00997-x
