Ontology learning: state of the art and open issues

Information Technology and Management

Abstract

Ontologies are one of the cornerstones of the Semantic Web. Their pervasive use in information sharing and knowledge management calls for efficient and effective approaches to ontology development. Ontology learning, which seeks to discover ontological knowledge from various forms of data automatically or semi-automatically, can overcome the ontology acquisition bottleneck in ontology development. Despite significant progress in ontology learning research over the past decade, a number of problems in the field remain open. This paper provides a comprehensive review and discussion of the major issues, challenges, and opportunities in ontology learning. We propose a new learning-oriented model for ontology development and a framework for ontology learning. Moreover, we identify and discuss important dimensions for classifying ontology learning approaches and techniques. Because the domain affects the choice of ontology learning approach, we summarize domain characteristics that can facilitate future ontology learning efforts. The paper offers a road map and a variety of insights into this fast-growing field.

Author information

Correspondence to Lina Zhou.

About this article

Cite this article

Zhou, L. Ontology learning: state of the art and open issues. Inf Technol Manage 8, 241–252 (2007). https://doi.org/10.1007/s10799-007-0019-5

  • DOI: https://doi.org/10.1007/s10799-007-0019-5
