Abstract
Ontologies are a cornerstone of the Semantic Web. Their pervasive use in information sharing and knowledge management calls for efficient and effective approaches to ontology development. Ontology learning, which seeks to discover ontological knowledge from various forms of data automatically or semi-automatically, can overcome the ontology acquisition bottleneck in ontology development. Despite significant progress in ontology learning research over the past decade, a number of problems in the field remain open. This paper provides a comprehensive review and discussion of the major issues, challenges, and opportunities in ontology learning. We propose a new learning-oriented model for ontology development and a framework for ontology learning. We also identify and discuss important dimensions for classifying ontology learning approaches and techniques. Because the domain affects the choice of ontology learning approach, we summarize domain characteristics that can facilitate future ontology learning efforts. The paper offers a road map and a variety of insights into this fast-growing field.
Cite this article
Zhou, L. Ontology learning: state of the art and open issues. Inf Technol Manage 8, 241–252 (2007). https://doi.org/10.1007/s10799-007-0019-5
DOI: https://doi.org/10.1007/s10799-007-0019-5