Abstract
As the population of web users grows, so does the variety of ways in which users access information, and this has a great impact on network utilization. Recently, many efforts have been made to analyze user behavior on the WWW. In this paper, we represent user behavior as sequences of consecutive web page accesses, derived from the access log of a proxy server. We then discover the frequent sequences and organize them as an index. Based on this index, we propose a scheme for predicting user requests and a proxy-based framework for prefetching web pages. We perform experiments on real data. The results show that our approach makes predictions with a high degree of accuracy and little overhead. In the experiments, the best prediction hit ratio reaches 75.69%, while the longest time needed to make a prediction is only 2.3 ms.
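The idea of indexing frequent access sequences and using them for prediction can be illustrated with a minimal sketch. This is not the authors' algorithm or index structure, which the abstract does not specify. It assumes that sessions are already extracted from the proxy log as lists of page paths, that the index is a plain dictionary from a prefix of pages to counts of the page that followed it, and that prediction uses the longest matching suffix of the recent accesses. The names `build_index`, `predict_next`, `max_len`, and `min_support` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(sessions, max_len=3, min_support=2):
    """Index frequent consecutive page sequences (illustrative assumption).

    Counts every run of 2..max_len consecutive pages, keeps runs seen at
    least min_support times, and maps each run's prefix to a Counter of
    the pages that followed it.
    """
    counts = Counter()
    for pages in sessions:
        for n in range(2, max_len + 1):
            for i in range(len(pages) - n + 1):
                counts[tuple(pages[i:i + n])] += 1

    index = defaultdict(Counter)
    for seq, support in counts.items():
        if support >= min_support:
            index[seq[:-1]][seq[-1]] = support
    return dict(index)


def predict_next(index, recent, max_len=3):
    """Return the most frequent follower of the longest indexed suffix."""
    for k in range(min(max_len - 1, len(recent)), 0, -1):
        suffix = tuple(recent[-k:])
        if suffix in index:
            return index[suffix].most_common(1)[0][0]
    return None  # no frequent sequence matches the recent accesses


# Hypothetical sessions; real input would come from a proxy access log.
sessions = [
    ["/", "/news", "/sports"],
    ["/", "/news", "/sports"],
    ["/", "/news", "/weather"],
    ["/about", "/news", "/weather"],
    ["/about", "/news", "/weather"],
]
index = build_index(sessions)
prediction = predict_next(index, ["/", "/news"])
```

In this toy data, a user who arrived at `/news` from `/` is predicted to request `/sports`, whereas `/news` on its own is more often followed by `/weather`. Longer matched contexts can therefore change the prediction, and a proxy could prefetch the predicted page before it is requested.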
Cite this article
Wu, YH., Chen, A.L.P. Prediction of Web Page Accesses by Proxy Server Log. World Wide Web 5, 67–88 (2002). https://doi.org/10.1023/A:1015750423727
DOI: https://doi.org/10.1023/A:1015750423727