An Example-Based Environment for Wrapper Generation

  • Conference paper
Conceptual Modeling for E-Business and the Web (ER 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1921)

Abstract

In Web information systems, the task of extracting data of interest from Web sites falls to software components generically known as wrappers. Flexible tools for designing, developing, and maintaining wrappers are therefore crucial. In this paper, we present WByE (Wrapping By Example), a user-oriented set of tools that helps the user build wrappers. WByE relies on information implicitly provided by the user through suitable and intuitive interfaces. It includes two components: the ASByE tool, which generates specifications on how to fetch the desired pages (whether static or dynamic), and the DEByE tool, which extracts the data implicitly present in the fetched pages.

This work is supported by Project SIAM (grant MCT/FINEP/PRONEX 76.97.1016.00) and by individual research grants from CNPq and CAPES.

On leave from the University of Amazonas, Brazil.



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Golgher, P.B., Laender, A.H.F., da Silva, A.S., Ribeiro-Neto, B. (2000). An Example-Based Environment for Wrapper Generation. In: Liddle, S.W., Mayr, H.C., Thalheim, B. (eds) Conceptual Modeling for E-Business and the Web. ER 2000. Lecture Notes in Computer Science, vol 1921. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45394-6_14

  • DOI: https://doi.org/10.1007/3-540-45394-6_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41073-7

  • Online ISBN: 978-3-540-45394-9

  • eBook Packages: Springer Book Archive
