WrapIt: Automated Integration of Web Databases with Extensional Overlaps

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2593)

Abstract

The World Wide Web no longer consists solely of static web pages. Instead, more and more pages are generated dynamically from user requests and database content. Conventional search engines do not consider these dynamic pages because the required user input cannot be simulated, so their results are often insufficient.

This paper presents a new approach to the online integration of web databases. Given only one sample HTML result page for a source, result pages for new requests are found by structural recognition. Once structural recognition is established for one source, other web databases of the same universe (e.g. movie databases) can be integrated on the fly by content-based recognition. The user thus receives results from several sources.
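The abstract does not say how structural recognition works internally. As a rough illustration only, the sketch below assumes it can be approximated by remembering the tag path of a known value in the sample page and reading the text found at the same path in a new result page. The names `PathCollector`, `learn_path`, and `extract` are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag and so must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # list of (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def learn_path(sample_html, known_value):
    """Return the tag path of the first text node equal to known_value."""
    for path, text in text_nodes(sample_html):
        if text == known_value:
            return path
    return None


def extract(new_html, path):
    """Return all text nodes of new_html that sit at the learned tag path."""
    return [text for p, text in text_nodes(new_html) if p == path]


sample = ("<html><body><table><tr><td><b>Title</b></td>"
          "<td>Metropolis</td></tr></table></body></html>")
new_page = ("<html><body><table><tr><td><b>Title</b></td>"
            "<td>Nosferatu</td></tr></table></body></html>")

path = learn_path(sample, "Metropolis")
print(path, extract(new_page, path))
```

A plain tag path is a deliberately simple stand-in for a structural signature. It breaks as soon as a source changes its layout, so a practical system would need a more tolerant notion of structure.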

No global schema is produced. Instead, the heterogeneity of the individual sources is preserved. The only requirement is an extensional overlap between the databases, that is, some of the same real-world objects must be recorded in more than one of them.
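To make the role of the extensional overlap concrete, the following sketch assumes that values already known from one integrated source are searched for among the text fragments of a result page from a new source. The token-based Jaccard similarity and the threshold of 0.5 are illustrative choices, not the measure used in the paper, and the function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity in [0, 1]; 0.0 if either side is empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_overlap(known_values, candidate_texts, threshold=0.5):
    """Pair each known value with its most similar candidate text.

    Returns (known, candidate, score) triples whose score reaches the
    threshold; known values without a good enough candidate are skipped.
    """
    matches = []
    for known in known_values:
        best = max(candidate_texts,
                   key=lambda c: jaccard(known, c),
                   default=None)
        if best is None:
            continue
        score = jaccard(known, best)
        if score >= threshold:
            matches.append((known, best, score))
    return matches


known = ["Metropolis (1927)"]
candidates = ["Search results", "Metropolis, 1927", "Directed by Fritz Lang"]
print(find_overlap(known, candidates))
```

The fragments that match known values indicate where the new source presents the same kind of data, which is why no global schema has to be defined in advance.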

Part of this work was supported by the Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316).

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neiling, M., Schaal, M., Schumann, M. (2003). WrapIt: Automated Integration of Web Databases with Extensional Overlaps. In: Chaudhri, A.B., Jeckle, M., Rahm, E., Unland, R. (eds) Web, Web-Services, and Database Systems. NODe 2002. Lecture Notes in Computer Science, vol 2593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36560-5_14

  • DOI: https://doi.org/10.1007/3-540-36560-5_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00745-6

  • Online ISBN: 978-3-540-36560-0

  • eBook Packages: Springer Book Archive
