Abstract
In this paper, we present the storage management of the WHOWEDA web warehousing system, which warehouses historical web information. To facilitate inter-table and intra-table sharing of web pages, we propose a three-layer storage architecture that consists of tuple, table, and pool layers, whose storage modules hold different parts of the warehoused web information. To improve retrieval efficiency, we replicate some node attributes across web tables in the table layer while keeping only a single copy of each web page in the pool layer. Separating table-layer and pool-layer storage also allows multiple web tables to maintain different valid times for the same web pages, since global coupling follows a different schedule for each web table. Because sharing web pages may therefore lead to valid time inconsistency between web tables, we propose an update synchronization scheme that resolves the valid time differences on user request.
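As a rough illustration of the layering described above, the following minimal sketch models a pool that keeps one copy of each page, tables that hold replicated node attributes with their own valid times, and a synchronization step run on request. All names here (`Pool`, `NodeRecord`, `WebTable`, `synchronize`, the `(url, version)` key) are hypothetical, and the reconciliation rule (widen every table's interval to the earliest start and latest end) is an assumption made for the example, not the scheme defined in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical page identifier: (url, version number).
PageKey = Tuple[str, int]


@dataclass
class Pool:
    """Pool layer: a single shared copy of each page version."""
    pages: Dict[PageKey, str] = field(default_factory=dict)

    def add(self, key: PageKey, content: str) -> None:
        # setdefault keeps the first stored copy, so a page is never duplicated.
        self.pages.setdefault(key, content)


@dataclass
class NodeRecord:
    """Table-layer entry: replicated attributes plus a table-local valid time."""
    title: str       # example of a replicated node attribute
    valid_from: int  # start of the valid time as seen by this table
    valid_to: int    # end of the valid time as seen by this table


@dataclass
class WebTable:
    name: str
    # Table layer: per-table node attributes and valid times.
    nodes: Dict[PageKey, NodeRecord] = field(default_factory=dict)
    # Tuple layer: each web tuple is a list of references to nodes.
    tuples: List[List[PageKey]] = field(default_factory=list)


def synchronize(key: PageKey, tables: List[WebTable]) -> None:
    """Align the valid time of one shared page across tables (assumed rule)."""
    holders = [t.nodes[key] for t in tables if key in t.nodes]
    if not holders:
        return
    start = min(r.valid_from for r in holders)
    end = max(r.valid_to for r in holders)
    for record in holders:
        record.valid_from, record.valid_to = start, end


# Usage: two tables share one page but observed it on different schedules.
pool = Pool()
key = ("http://example.org/", 1)
pool.add(key, "<html>v1</html>")
pool.add(key, "<html>duplicate</html>")  # ignored, a copy already exists

table_a = WebTable("A", nodes={key: NodeRecord("Home", 1, 5)}, tuples=[[key]])
table_b = WebTable("B", nodes={key: NodeRecord("Home", 3, 8)}, tuples=[[key]])
synchronize(key, [table_a, table_b])
```

Keeping valid times in the table layer rather than in the pool is what lets the two tables disagree until synchronization is requested, while the page content itself is stored only once.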
This work was supported in part by the Nanyang Technological University, Ministry of Education (Singapore) under Academic Research Fund #4-12034-5060, #4-12034-3012, #4-12034-6022. Any opinions, findings, and recommendations in this paper are those of the authors and do not reflect the views of the funding agencies.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cao, Y., Lim, EP., Ng, WK. (2000). Storage Management of a Historical Web Warehousing System. In: Ibrahim, M., Küng, J., Revell, N. (eds) Database and Expert Systems Applications. DEXA 2000. Lecture Notes in Computer Science, vol 1873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44469-6_43
DOI: https://doi.org/10.1007/3-540-44469-6_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67978-3
Online ISBN: 978-3-540-44469-5
eBook Packages: Springer Book Archive