Abstract
Many online information sources are available on the Web. Giving machines access to these sources enables many interesting applications, such as using web data in mediators or software agents. Until now, most work on information extraction from the web has concentrated on building wrappers, i.e. programs that reformat presentational HTML data into a more machine-comprehensible format. While such wrappers are an important part of a web information extraction application, they are not sufficient to fully access a source. Indeed, it is also necessary to set up an infrastructure for building queries, fetching pages, extracting specific links, etc. In this paper we propose a language called WetDL, which describes an information extraction task as a network of operators whose execution performs the desired extraction.
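The idea of an extraction task as a network of operators can be illustrated with a minimal Python sketch. This is not WetDL syntax, which the abstract does not show; it only assumes that each operator consumes a stream of items and produces another. All names here (`build_query`, `fetch`, `extract_links`, `extract_title`, `run`, the `PAGES` table and the `example.org` URLs) are hypothetical, and page fetching is replaced by an in-memory lookup so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical in-memory "web" standing in for real HTTP requests.
PAGES = {
    "http://example.org/search?q=wrappers":
        '<a href="/item/1">One</a> <a href="/item/2">Two</a>',
    "http://example.org/item/1": "<h1>Wrapper induction</h1>",
    "http://example.org/item/2": "<h1>Pattern discovery</h1>",
}


class _LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class _TitleParser(HTMLParser):
    """Collects the text content of <h1> elements."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def build_query(terms, base="http://example.org/search"):
    """Operator: search terms -> query URLs."""
    for term in terms:
        yield f"{base}?{urlencode({'q': term})}"


def fetch(urls, pages=PAGES):
    """Operator: URLs -> (url, html) pairs; unknown URLs are skipped."""
    for url in urls:
        if url in pages:
            yield url, pages[url]


def extract_links(docs):
    """Operator: (url, html) pairs -> absolute URLs of the links found."""
    for url, html in docs:
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            yield urljoin(url, href)


def extract_title(docs):
    """Operator (a very small wrapper): (url, html) pairs -> records."""
    for url, html in docs:
        parser = _TitleParser()
        parser.feed(html)
        yield {"url": url, "title": parser.title}


def run(terms):
    """Connect the operators into a linear network and execute it."""
    result_pages = fetch(build_query(terms))
    item_pages = fetch(extract_links(result_pages))
    return list(extract_title(item_pages))


if __name__ == "__main__":
    for record in run(["wrappers"]):
        print(record)
```

In this sketch the wrapper (`extract_title`) is only one node; the query-building, fetching, and link-following operators supply the surrounding infrastructure that the abstract argues is needed in addition to wrappers.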
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Habegger, B., Quafafou, M. (2004). WetDL: A Web Information Extraction Language. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_14
DOI: https://doi.org/10.1007/978-3-540-30198-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23478-4
Online ISBN: 978-3-540-30198-1
eBook Packages: Computer Science, Computer Science (R0)