Abstract
We present a general framework for information extraction from web pages based on a special wrapper language called token-templates. By using token-templates in conjunction with logic programs, we are able to reason about web page contents, search for and collect facts, and derive new facts from various web pages. We give a formal definition of the semantics of logic programs extended by token-templates and define a general answer-complete calculus for these extended programs. These methods and techniques are used to build intelligent mediators and web information systems.
Cite this article
Thomas, B. Token-Templates and Logic Programs for Intelligent Web Search. Journal of Intelligent Information Systems 14, 241–261 (2000). https://doi.org/10.1023/A:1008792020665