ViDE: A Visual Data Extraction Environment for the Web

Li, Yi; Keong Ng, Wee; Lim, Ee-Peng

doi:10.1007/3-540-44759-8_57

Yi Li, Wee Keong Ng & Ee-Peng Lim

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2113)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

With the rapid growth of information on the Web, a means to combat information overload is critical. In this paper, we present ViDE (Visual Data Extraction), an interactive web data extraction environment that supports efficient hierarchical data wrapping of multiple web pages. ViDE has two unique features that differentiate it from other extraction mechanisms. First, data extraction rules can be easily specified in a graphical user interface that is seamlessly integrated with a web browser. Second, ViDE introduces the concept of grouping, which unites the extraction rules for a set of documents with the navigational patterns that exist among them. This paper describes our initial development of the system.
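
The abstract does not specify ViDE's rule language or internal data model, so the following is only an illustrative sketch of the two ideas it names: hierarchical extraction rules, and a group that ties per-page rules to the links between pages. The names `Rule`, `Group`, `PAGES`, and the regular expressions are all hypothetical and are not taken from the paper; a real system would more plausibly operate on a parsed document tree than on regular expressions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical extraction rule: a named regex with optional child rules."""
    name: str
    pattern: str  # regex with exactly one capture group
    children: list[Rule] = field(default_factory=list)

    def apply(self, text: str) -> list:
        results = []
        for match in re.finditer(self.pattern, text, re.S):
            fragment = match.group(1)
            if self.children:
                # Hierarchical wrapping: child rules run inside the parent's match.
                results.append({c.name: c.apply(fragment) for c in self.children})
            else:
                results.append(fragment.strip())
        return results


@dataclass
class Group:
    """Hypothetical group: one root rule per page type, plus link patterns."""
    rules: dict[str, Rule]                  # page type -> root rule
    navigation: dict[str, tuple[str, str]]  # page type -> (link regex, target type)

    def extract(self, pages: dict[str, str], url: str, page_type: str) -> dict:
        text = pages[url]
        root = self.rules[page_type]
        result = {"url": url, root.name: root.apply(text)}
        # Follow the navigational pattern, if this page type has one.
        # Assumption: the page types form no cycle, so recursion terminates.
        if page_type in self.navigation:
            link_pattern, target_type = self.navigation[page_type]
            result["linked"] = [
                self.extract(pages, m.group(1), target_type)
                for m in re.finditer(link_pattern, text)
                if m.group(1) in pages
            ]
        return result


# In-memory stand-ins for fetched pages (made-up content).
PAGES = {
    "index.html": '<a href="story1.html">One</a><a href="story2.html">Two</a>',
    "story1.html": "<h1>First headline</h1><p>Body A</p>",
    "story2.html": "<h1>Second headline</h1><p>Body B</p>",
}

group = Group(
    rules={
        "index": Rule("titles", r"<a [^>]*>(.*?)</a>"),
        "story": Rule(
            "story",
            r"(<h1>.*)",
            [Rule("headline", r"<h1>(.*?)</h1>"), Rule("body", r"<p>(.*?)</p>")],
        ),
    },
    navigation={"index": (r'href="([^"]+)"', "story")},
)

result = group.extract(PAGES, "index.html", "index")
```

Here `result` holds the link titles from the index page together with one nested record per linked story page, which is one way to read "uniting" extraction rules with navigational patterns. Pages are supplied as an in-memory dictionary so the sketch runs without network access.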



Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University, Singapore, 639798, SINGAPORE
Yi Li, Wee Keong Ng & Ee-Peng Lim


Editor information

Editors and Affiliations

University of Klagenfurt, IFI -IWAS Universitaetsstr. 65, 9020, Klagenfurt, Austria
Heinrich C. Mayr
Faculty of Electrical Engineering, Czech Technical University, Technicka 2, 166 27, Prague 6, Czech Republic
Jiri Lazansky
School of Computer and Information Science, University of South Australia, Mawson Lakes Campus, Mawson Lakes, SA, 5095
Gerald Quirchmayr
Department of Information Systems, Technical University of Munich, Orleanstr. 34, 81667, Munich, Germany
Pavel Vogel


About this paper

Cite this paper

Li, Y., Keong Ng, W., Lim, EP. (2001). ViDE: A Visual Data Extraction Environment for the Web. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_57


DOI: https://doi.org/10.1007/3-540-44759-8_57
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42527-4
Online ISBN: 978-3-540-44759-7
eBook Packages: Springer Book Archive
