Abstract
With the rapid growth of information on the Web, a means to combat information overload is critical. In this paper, we present ViDE (Visual Data Extraction), an interactive web data extraction environment that supports efficient hierarchical data wrapping of multiple web pages. ViDE has two unique features that differentiate it from other extraction mechanisms. First, data extraction rules can be easily specified in a graphical user interface that is seamlessly integrated with a web browser. Second, ViDE introduces the concept of grouping, which unites the extraction rules for a set of documents with the navigational patterns that exist among them. This paper describes our initial development of the system.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Keong Ng, W., Lim, EP. (2001). ViDE: A Visual Data Extraction Environment for the Web. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_57
DOI: https://doi.org/10.1007/3-540-44759-8_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42527-4
Online ISBN: 978-3-540-44759-7
eBook Packages: Springer Book Archive