ViDE: A Visual Data Extraction Environment for the Web

  • Conference paper
Database and Expert Systems Applications (DEXA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2113)

Abstract

With the rapid growth of information on the Web, a means to combat information overload is critical. In this paper, we present ViDE (Visual Data Extraction), an interactive web data extraction environment that supports efficient hierarchical data wrapping of multiple web pages. ViDE has two unique features that differentiate it from other extraction mechanisms. First, data extraction rules can be easily specified in a graphical user interface that is seamlessly integrated with a web browser. Second, ViDE introduces the concept of grouping which unites the extraction rules for a set of documents with the navigational patterns that exist among them. This paper describes our initial development of the system.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Keong Ng, W., Lim, EP. (2001). ViDE: A Visual Data Extraction Environment for the Web. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_57

  • DOI: https://doi.org/10.1007/3-540-44759-8_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42527-4

  • Online ISBN: 978-3-540-44759-7

  • eBook Packages: Springer Book Archive
