A Data Cube Model for Prediction-Based Web Prefetching

Yang, Qiang; Huang, Joshua Zhexue; Ng, Michael

doi:10.1023/A:1020990805004

Published: January 2003

Journal of Intelligent Information Systems, Volume 20, pages 11–30 (2003)

Abstract

Reducing web latency is one of the primary concerns of Internet research. Web caching and web prefetching are two effective techniques for reducing latency. A primary method for intelligent prefetching is to rank potential web documents using prediction models trained on past web server and proxy server log data, and to prefetch the highly ranked objects. For this method to work well, the prediction model must be updated constantly, and different queries must be answered efficiently. In this paper we present a data cube model that represents Web access sessions for data mining, in support of prediction model construction. The cube model organizes session data into three dimensions. With the data cube in place, we apply efficient data mining algorithms for clustering and correlation analysis. The resulting web page clusters can then be used to guide the prefetching system. We also propose an integrated web-caching and web-prefetching model, in which the issues of prefetching aggressiveness, replacement policy, and increased network traffic are addressed together in one framework. The core of our integrated solution is a prediction model based on statistical correlation between web objects. This model can be updated frequently by querying the data cube of web server logs. To our knowledge, this integrated data cube and prediction-based prefetching framework is the first effort of its kind.
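
As a rough illustration of the idea in the abstract, the sketch below stores session data in a small three-dimensional cube and ranks prefetch candidates by a simple co-occurrence statistic. It is not the authors' model. The abstract does not name the three dimensions, so the choice of (session, page, position in session) is an assumption, as are the function names `build_cube`, `page_sessions`, and `rank_prefetch_candidates`, the sample sessions, and the use of conditional co-occurrence frequency as the correlation measure.

```python
from collections import defaultdict


def build_cube(sessions):
    """Build a sparse cube keyed by (session_id, page, position).

    The three dimensions are an assumption made for this sketch;
    the abstract only states that the cube has three dimensions.
    """
    cube = defaultdict(int)
    for session_id, pages in sessions.items():
        for position, page in enumerate(pages):
            cube[(session_id, page, position)] += 1
    return cube


def page_sessions(cube):
    """Roll up the position dimension: page -> set of sessions visiting it."""
    visited = defaultdict(set)
    for (session_id, page, _position), count in cube.items():
        if count > 0:
            visited[page].add(session_id)
    return visited


def rank_prefetch_candidates(cube, current_page, top_k=2):
    """Rank other pages by the fraction of sessions containing
    current_page that also contain the candidate page.

    This co-occurrence ratio stands in for the paper's statistical
    correlation measure, which the abstract does not specify.
    """
    visited = page_sessions(cube)
    base = visited.get(current_page, set())
    if not base:
        return []
    scores = []
    for page, session_ids in visited.items():
        if page == current_page:
            continue
        score = len(session_ids & base) / len(base)
        if score > 0:
            scores.append((page, score))
    # Highest score first; ties broken alphabetically for determinism.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


# Hypothetical sessions, invented for illustration only.
sessions = {
    "s1": ["/index", "/news", "/sports"],
    "s2": ["/index", "/news"],
    "s3": ["/index", "/about"],
    "s4": ["/news", "/sports"],
}
cube = build_cube(sessions)
ranking = rank_prefetch_candidates(cube, "/index")
print(ranking)
```

Because the ranking is computed by querying the cube instead of a separately stored model, rebuilding the cube from newer logs refreshes the predictions, which is the kind of frequent update the abstract describes. The `top_k` parameter is one simple way to limit prefetching aggressiveness in this sketch.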

Author information

Authors and Affiliations

Qiang Yang: Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong
Joshua Zhexue Huang: E-Business Technology Institute, The University of Hong Kong, Hong Kong
Michael Ng: Department of Mathematics, The University of Hong Kong, Hong Kong

Corresponding author

Correspondence to Michael Ng.

About this article

Cite this article

Yang, Q., Huang, J.Z. & Ng, M. A Data Cube Model for Prediction-Based Web Prefetching. Journal of Intelligent Information Systems 20, 11–30 (2003). https://doi.org/10.1023/A:1020990805004
