Learning Rules for Conceptual Structure on the Web

Journal of Intelligent Information Systems

Abstract

This paper presents an infrastructure and methodology for extracting conceptual structure from Web pages, which consist mainly of HTML tags and incomplete text. Human readers can easily read a Web page and grasp the conceptual structure of its underlying data, but they lack the time and patience to handle excessive amounts of data. Machines, by contrast, find it extremely difficult to determine the content of Web pages accurately because they lack an understanding of context and semantics. Our work processes Web data and extracts its underlying conceptual structure, in particular the relationships between ontological concepts, using Inductive Logic Programming. Capturing these conceptual structures helps automate the processing of the excessive amount of data on the Web.

Cite this article

Han, H., Elmasri, R. Learning Rules for Conceptual Structure on the Web. Journal of Intelligent Information Systems 22, 237–256 (2004). https://doi.org/10.1023/B:JIIS.0000019278.84222.b7

  • DOI: https://doi.org/10.1023/B:JIIS.0000019278.84222.b7
