WetDL: A Web Information Extraction Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3261)

Abstract

Many online information sources are available on the Web. Giving machines access to such sources enables many interesting applications, such as using web data in mediators or software agents. Up to now, most work on information extraction from the web has concentrated on building wrappers, i.e. programs that reformat presentational HTML data into a more machine-comprehensible format. While wrappers are an important part of a web information extraction application, they are not sufficient to fully access a source. Indeed, it is also necessary to set up an infrastructure for building queries, fetching pages, extracting specific links, etc. In this paper we propose a language called WetDL that describes an information extraction task as a network of operators whose execution performs the desired extraction.
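To make the idea of an operator network concrete, here is a minimal Python sketch. It is not WetDL syntax, which the abstract does not show; it only illustrates, under stated assumptions, how query building, page fetching, link extraction, and wrapping might be chained. All names (`build_query`, `fetch`, `extract_links`, `wrap`, `run_network`), the `example.org` URLs, and the in-memory `PAGES` table are hypothetical, and the network is simplified to a linear chain.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical in-memory "web" (URL -> HTML) standing in for real HTTP access.
PAGES: Dict[str, str] = {
    "http://example.org/search?q=books": (
        '<a href="http://example.org/item/1">first</a>'
        '<a href="http://example.org/item/2">second</a>'
    ),
    "http://example.org/item/1": "<h1>Item A</h1><span>10</span>",
    "http://example.org/item/2": "<h1>Item B</h1><span>25</span>",
}

# An operator consumes a stream of values and produces a new stream.
Operator = Callable[[Iterable], Iterator]


def build_query(keywords: Iterable[str]) -> Iterator[str]:
    """Turn each keyword into a search URL (assumed URL scheme)."""
    for keyword in keywords:
        yield f"http://example.org/search?q={keyword}"


def fetch(urls: Iterable[str]) -> Iterator[str]:
    """Return the page body for each URL, looked up in the simulated web."""
    for url in urls:
        yield PAGES[url]


def extract_links(pages: Iterable[str]) -> Iterator[str]:
    """Yield every href target found in each page."""
    for html in pages:
        yield from re.findall(r'href="([^"]+)"', html)


def wrap(pages: Iterable[str]) -> Iterator[dict]:
    """A toy wrapper: map presentational HTML to a structured record."""
    for html in pages:
        match = re.search(r"<h1>(.*?)</h1><span>(\d+)</span>", html)
        if match:
            yield {"title": match.group(1), "price": int(match.group(2))}


def run_network(operators: List[Operator], inputs: Iterable) -> list:
    """Feed the inputs through each operator in turn and collect the results."""
    stream = inputs
    for operator in operators:
        stream = operator(stream)
    return list(stream)


records = run_network([build_query, fetch, extract_links, fetch, wrap], ["books"])
print(records)
```

The wrapper is only the last stage here; the earlier operators supply the infrastructure that the abstract argues wrappers alone lack. A real network could also branch or merge streams, which this linear sketch does not attempt.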




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Habegger, B., Quafafou, M. (2004). WetDL: A Web Information Extraction Language. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_14


  • DOI: https://doi.org/10.1007/978-3-540-30198-1_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23478-4

  • Online ISBN: 978-3-540-30198-1

  • eBook Packages: Computer Science (R0)
