Abstract
In this paper, we propose a novel class of wrappers, logic wrappers (L-wrappers), inspired by the logic programming paradigm. L-wrappers have declarative semantics, and therefore (i) their specification is decoupled from their implementation and (ii) they can be generated using inductive logic programming. We also define a convenient mapping from L-wrappers to XSLT, so that they can be processed efficiently by available XSLT processing engines.
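To make the idea of a declaratively specified wrapper more concrete, the following is a minimal sketch, not the paper's formal definition. It assumes a rule written as a conjunction of conditions over document tree nodes and evaluated by a separate, generic interpreter. The predicate names (`tag`, `child`, `next`), the rule encoding, the sample document, and the `extract` function are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical sample document (assumption, not taken from the paper).
DOC = """<table>
<tr><td>Printer</td><td>120</td></tr>
<tr><td>Scanner</td><td>80</td></tr>
</table>"""

# Hypothetical declarative rule: a conjunction of conditions over node variables.
# It says nothing about how matching is performed.
RULE = [
    ("tag", "Row", "tr"),
    ("child", "Row", "Name"),    # Name is a child of Row
    ("child", "Row", "Price"),   # Price is a child of Row
    ("tag", "Name", "td"),
    ("tag", "Price", "td"),
    ("next", "Name", "Price"),   # Price is the sibling right after Name
]
OUTPUT = ("Name", "Price")


def holds(cond, env, parent, nxt):
    """Check one condition against a variable-to-node assignment."""
    kind, a, b = cond
    if kind == "tag":
        return env[a].tag == b
    if kind == "child":
        return parent.get(env[b]) is env[a]
    if kind == "next":
        return nxt.get(env[a]) is env[b]
    raise ValueError(f"unknown predicate: {kind}")


def extract(root, rule, output):
    """Naive interpreter: try every assignment of nodes to variables."""
    nodes = list(root.iter())
    parent = {c: p for p in nodes for c in p}
    nxt = {}
    for p in nodes:
        kids = list(p)
        for left, right in zip(kids, kids[1:]):
            nxt[left] = right
    variables = sorted(
        {v for kind, a, b in rule for v in ((a,) if kind == "tag" else (a, b))}
    )
    results = []
    for combo in product(nodes, repeat=len(variables)):
        env = dict(zip(variables, combo))
        if all(holds(c, env, parent, nxt) for c in rule):
            results.append(tuple(env[v].text for v in output))
    return results


if __name__ == "__main__":
    print(extract(ET.fromstring(DOC), RULE, OUTPUT))
```

The brute-force search is exponential in the number of variables and is used here only to keep the interpreter short. Because the rule is plain data, the same specification could in principle be handed to a different evaluator without being rewritten, which is the kind of decoupling the abstract refers to.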
Cite this article
Bădică, C., Bădică, A., Popescu, E. et al. L-wrappers: concepts, properties and construction. Soft Comput 11, 753–772 (2007). https://doi.org/10.1007/s00500-006-0118-y
DOI: https://doi.org/10.1007/s00500-006-0118-y