Abstract
This paper presents an infrastructure and methodology for extracting conceptual structure from Web pages, which consist mainly of HTML tags and incomplete text. Human readers can easily read a Web page and grasp the conceptual structure of the underlying data, but they lack the time and patience to handle large volumes of data. Machines, in contrast, find it extremely difficult to determine the content of Web pages accurately, because they lack an understanding of context and semantics. Our work processes Web data and extracts its underlying conceptual structure, in particular the relationships between ontological concepts, using Inductive Logic Programming. Capturing these conceptual structures helps automate the processing of the large amount of data on the Web.
Cite this article
Han, H., Elmasri, R. Learning Rules for Conceptual Structure on the Web. Journal of Intelligent Information Systems 22, 237–256 (2004). https://doi.org/10.1023/B:JIIS.0000019278.84222.b7