L-wrappers: concepts, properties and construction

Bădică, Costin; Bădică, Amelia; Popescu, Elvira; Abraham, Ajith

doi:10.1007/s00500-006-0118-y

A declarative approach to data extraction from web sources

Focus
Published: 06 July 2006

Volume 11, pages 753–772, (2007)

Soft Computing

Costin Bădică¹,
Amelia Bădică²,
Elvira Popescu³ &
…
Ajith Abraham⁴

64 Accesses
8 Citations

Abstract

In this paper, we propose a novel class of wrappers (logic wrappers) inspired by the logic programming paradigm. The resulting logic wrappers (L-wrappers) have declarative semantics, and therefore (i) their specification is decoupled from their implementation and (ii) they can be generated using inductive logic programming. We also define a convenient mapping of L-wrappers to XSLT, so that they can be processed efficiently with available XSLT processing engines.
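
To make the idea of a declarative wrapper concrete, the following is a minimal sketch of how a single extraction rule might be evaluated over a document tree encoded as facts. It is an illustration only, not the formalism defined in the article: the relation names (`TAG`, `CHILD`, `NEXT`), the sample tree, and the rule itself are assumptions chosen for this example. The rule states what a matching tuple looks like, and the nested loops are just one possible way to evaluate it.

```python
# Illustrative sketch only: the relation names and the sample rule are
# assumptions made for this example, not definitions taken from the article.

# A small document tree encoded as facts over integer node ids.
TAG = {1: "table", 2: "tr", 3: "td", 4: "td", 5: "text", 6: "text",
       7: "tr", 8: "td", 9: "td", 10: "text", 11: "text"}
CHILD = {(1, 2), (1, 7), (2, 3), (2, 4), (3, 5), (4, 6),
         (7, 8), (7, 9), (8, 10), (9, 11)}   # (parent, child)
NEXT = {(2, 7), (3, 4), (8, 9)}               # (left sibling, right sibling)
TEXT = {5: "Printer", 6: "120", 10: "Scanner", 11: "85"}


def children(parent):
    """Return the children of a node in ascending node-id order."""
    return sorted(c for (p, c) in CHILD if p == parent)


def extract():
    """Evaluate one hypothetical rule, written here in Prolog-like form:

    extract(A, B) :- tag(R, tr), child(R, C1), child(R, C2), next(C1, C2),
                     tag(C1, td), tag(C2, td),
                     child(C1, A), tag(A, text),
                     child(C2, B), tag(B, text).
    """
    results = []
    for row in sorted(n for n, t in TAG.items() if t == "tr"):
        for c1 in children(row):
            for c2 in children(row):
                # c2 must be the sibling immediately to the right of c1.
                if (c1, c2) not in NEXT:
                    continue
                if TAG[c1] != "td" or TAG[c2] != "td":
                    continue
                for a in children(c1):
                    for b in children(c2):
                        if TAG[a] == "text" and TAG[b] == "text":
                            results.append((TEXT[a], TEXT[b]))
    return results


print(extract())
```

Because the rule is only a conjunction of conditions on tree nodes, the same specification could in principle be evaluated by a different engine, which is the sense in which specification and implementation are decoupled.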

Author information

Authors and Affiliations

Software Engineering Department, University of Craiova, Bvd. Decebal 107, Craiova, 200440, Romania
Costin Bădică
Business Information Systems Department, University of Craiova, A.I. Cuza 13, Craiova, 200585, Romania
Amelia Bădică
Software Engineering Department, University of Craiova, Bvd. Decebal, Craiova, 200440, Romania
Elvira Popescu
IITA Professorship Program, School of Computer Science and Engineering, Chung-Ang University, 221, Heukseok-dong, Dongjak-gu, Seoul, 156-756, Republic of Korea
Ajith Abraham

Corresponding author

Correspondence to Costin Bădică.

About this article

Cite this article

Bădică, C., Bădică, A., Popescu, E. et al. L-wrappers: concepts, properties and construction. Soft Comput 11, 753–772 (2007). https://doi.org/10.1007/s00500-006-0118-y

Published: 06 July 2006
Issue Date: June 2007
DOI: https://doi.org/10.1007/s00500-006-0118-y
