Storage Management of a Historical Web Warehousing System

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1873)

Abstract

In this paper, we present the storage management of the WHOWEDA web warehousing system, which warehouses historical web information. To facilitate inter-table and intra-table sharing of web pages, we propose a three-layer storage architecture that consists of tuple, table, and pool layers of storage modules, each storing a different part of the warehoused web information. To improve retrieval efficiency, we replicate some node attributes across web tables in the table layer while keeping only unique copies of web pages in the pool layer. Separating table-layer and pool-layer storage also allows multiple web tables to maintain different valid times for the same web pages, because global coupling runs on different schedules across web tables. As this sharing of web pages may lead to valid time inconsistency between web tables, we propose an update synchronization scheme that resolves the valid time differences on user request.
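
The layering described in the abstract can be illustrated with a minimal Python sketch. It is not the WHOWEDA implementation: the class names (`PagePool`, `Node`, `WebTable`), the replicated `title` attribute, the integer timestamps, and in particular the synchronization rule (extend every table's valid time to the latest one recorded for the shared page) are all assumptions made for illustration, since the abstract does not specify them. The tuple layer is reduced here to plain lists of nodes.

```python
from dataclasses import dataclass


class PagePool:
    """Pool layer (hypothetical): one copy of each page version, shared by all tables."""

    def __init__(self):
        self._pages = {}  # (url, version) -> page content

    def put(self, url, version, content):
        key = (url, version)
        self._pages.setdefault(key, content)  # never store a second copy
        return key

    def get(self, key):
        return self._pages[key]

    def __len__(self):
        return len(self._pages)


@dataclass
class Node:
    """Table-layer record: a pool reference plus replicated node attributes."""
    page_key: tuple
    title: str        # replicated attribute (assumed example)
    valid_from: int   # valid time local to the owning web table
    valid_to: int


class WebTable:
    """Table layer (hypothetical): holds tuples, each a list of nodes."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = pool
        self.tuples = []

    def add_tuple(self, nodes):
        self.tuples.append(list(nodes))

    def nodes_for(self, page_key):
        return [n for t in self.tuples for n in t if n.page_key == page_key]


def synchronize(tables, page_key):
    """Assumed rule: extend each table's valid_to to the latest one recorded."""
    nodes = [n for table in tables for n in table.nodes_for(page_key)]
    if not nodes:
        return None
    latest = max(n.valid_to for n in nodes)
    for n in nodes:
        n.valid_to = latest
    return latest


def demo():
    pool = PagePool()
    key = pool.put("http://example.org/", 1, "<html>...</html>")
    same = pool.put("http://example.org/", 1, "<html>...</html>")  # shared copy
    news, sports = WebTable("news", pool), WebTable("sports", pool)
    news.add_tuple([Node(key, "Example", 0, 5)])      # coupled up to time 5
    sports.add_tuple([Node(same, "Example", 0, 10)])  # coupled up to time 10
    before = [t.nodes_for(key)[0].valid_to for t in (news, sports)]
    synchronize([news, sports], key)                  # on user request
    after = [t.nodes_for(key)[0].valid_to for t in (news, sports)]
    return len(pool), before, after
```

In this sketch, the two tables reference a single pooled page yet record different valid times until `synchronize` is called, which mirrors the inconsistency the abstract attributes to differing global coupling schedules.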

This work was supported in part by the Nanyang Technological University, Ministry of Education (Singapore) under Academic Research Fund #4-12034-5060, #4-12034-3012, #4-12034-6022. Any opinions, findings, and recommendations in this paper are those of the authors and do not reflect the views of the funding agencies.


Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cao, Y., Lim, EP., Ng, WK. (2000). Storage Management of a Historical Web Warehousing System. In: Ibrahim, M., Küng, J., Revell, N. (eds) Database and Expert Systems Applications. DEXA 2000. Lecture Notes in Computer Science, vol 1873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44469-6_43

  • DOI: https://doi.org/10.1007/3-540-44469-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67978-3

  • Online ISBN: 978-3-540-44469-5

  • eBook Packages: Springer Book Archive
