Ontology Creation: Extraction of Domain Knowledge from Web Documents

Storey, Veda C.; Chiang, Roger; Chen, G. Lily

doi:10.1007/11568322_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3716)

Included in the following conference series:

International Conference on Conceptual Modeling

Abstract

Considerable research has gone into developing ontologies and applying them to a variety of applications. However, the domain knowledge needed to build these ontologies is often extracted manually. The World Wide Web contains a wealth of knowledge about application domains, but that knowledge is embedded within web pages. This research presents a methodology for semi-automatically extracting knowledge from the World Wide Web and organizing it into domain ontologies. A set of keywords provides the initial semantics of the target domain. Using these keywords, search engines identify web pages that contain information relevant to the subject domain. Web data extraction techniques are then applied to extract information from these pages and to infer how the pieces of information are related. Finally, the extracted knowledge is organized into a domain ontology. Testing the methodology on various application domains illustrates the feasibility of the approach.
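The abstract describes a pipeline: seed keywords, page retrieval, information extraction, relationship inference, and organization into an ontology. The Python sketch below illustrates that flow under clearly stated assumptions. It is not the authors' method. A small in-memory `CORPUS` stands in for search-engine results, and a single "X is a Y" lexico-syntactic pattern stands in for the web data extraction techniques. All names (`CORPUS`, `find_relevant_pages`, `extract_is_a`, `build_ontology`) and URLs are hypothetical.

```python
import re
from collections import defaultdict

# HYPOTHETICAL stand-in for search-engine results: URL -> page text.
# The described methodology retrieves pages from real search engines instead.
CORPUS = {
    "http://example.org/a": "A sedan is a car. A car is a vehicle. Dealers sell cars.",
    "http://example.org/b": "A truck is a vehicle. Trucks carry cargo.",
    "http://example.org/c": "A tulip is a flower. Gardens grow tulips.",
}

# ASSUMPTION: a simple "a/an/the X is a/an Y" pattern used as the extraction
# rule; the paper's actual extraction techniques are not reproduced here.
IS_A = re.compile(r"\b(?:an?|the)\s+(\w+)\s+is\s+an?\s+(\w+)", re.IGNORECASE)


def find_relevant_pages(keywords, corpus):
    """Return URLs whose text mentions at least one seed keyword."""
    kws = {k.lower() for k in keywords}
    hits = []
    for url, text in corpus.items():
        tokens = set(re.findall(r"\w+", text.lower()))
        if kws & tokens:  # page shares at least one token with the seed keywords
            hits.append(url)
    return hits


def extract_is_a(text):
    """Extract (child, parent) pairs from phrases matching the is-a pattern."""
    return [(child.lower(), parent.lower()) for child, parent in IS_A.findall(text)]


def build_ontology(keywords, corpus):
    """Organize extracted is-a pairs into a parent -> sorted children taxonomy."""
    taxonomy = defaultdict(set)
    for url in find_relevant_pages(keywords, corpus):
        for child, parent in extract_is_a(corpus[url]):
            taxonomy[parent].add(child)
    return {parent: sorted(children) for parent, children in taxonomy.items()}


if __name__ == "__main__":
    print(build_ontology({"vehicle"}, CORPUS))
```

Running the script prints `{'car': ['sedan'], 'vehicle': ['car', 'truck']}`; the tulip page is skipped because it does not mention the seed keyword. A realistic version would replace `CORPUS` with an actual search engine and use richer extraction and relationship-inference techniques than one regular expression.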

Author information

Authors and Affiliations

Department of Computer Information Systems, J. Mack Robinson College of Business, Georgia State University, Box 4015, Atlanta, GA, 30302
Veda C. Storey & G. Lily Chen
Information Systems Department, College of Business, University of Cincinnati, Cincinnati, Ohio, 45221-0211
Roger Chiang

Editor information

Editors and Affiliations

Portland State University, 97207, Portland, OR
Lois Delcambre
Institute for Applied Informatics, Alpen-Adria-Universität Klagenfurt, Austria
Christian Kop & Heinrich C. Mayr
Dept. of Information Engineering and Computer Science,
John Mylopoulos
Department of Information Systems and Computation, Technical University of Valencia, Camino de Vera s/n, 46022, Valencia, Spain
Oscar Pastor

About this paper

Cite this paper

Storey, V.C., Chiang, R., Chen, G.L. (2005). Ontology Creation: Extraction of Domain Knowledge from Web Documents. In: Delcambre, L., Kop, C., Mayr, H.C., Mylopoulos, J., Pastor, O. (eds) Conceptual Modeling – ER 2005. ER 2005. Lecture Notes in Computer Science, vol 3716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11568322_17

DOI: https://doi.org/10.1007/11568322_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29389-7
Online ISBN: 978-3-540-32068-5
eBook Packages: Computer Science, Computer Science (R0)
