
Prediction of Web Page Accesses by Proxy Server Log

Published in World Wide Web 5, 67–88 (2002)

Abstract

As the population of web users grows, so does the variety of ways in which users access information, and this diversity greatly affects network utilization. Many recent efforts have analyzed user behavior on the WWW. In this paper, we represent user behavior as sequences of consecutive web page accesses derived from the access log of a proxy server. We then discover the frequent sequences and organize them into an index. Based on this index, we propose a scheme for predicting user requests and a proxy-based framework for prefetching web pages. Experiments on real data show that our approach predicts requests with high accuracy and little overhead: the best prediction hit ratio reaches 75.69%, and the longest time needed to make a single prediction is only 2.3 ms.
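The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the general idea, not the authors' algorithm. It assumes the proxy log has already been split into per-client sessions of URLs, treats any run of 2 to `max_len` consecutive accesses that occurs at least `min_support` times as a "frequent sequence", indexes those sequences by their prefix, and predicts the next request from the longest suffix of the recent history that appears in the index. The function names (`mine_frequent_sequences`, `build_index`, `predict_next`, `hit_ratio`) and the parameters `max_len` and `min_support` are hypothetical. Hit ratio is taken here to mean correct predictions divided by predictions made, which may differ from the paper's definition.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: one session = the ordered URLs a client requested.
Session = Sequence[str]


def mine_frequent_sequences(
    sessions: List[Session], max_len: int = 3, min_support: int = 2
) -> Dict[Tuple[str, ...], int]:
    """Count every run of 2..max_len consecutive accesses; keep those seen >= min_support times."""
    counts: Counter = Counter()
    for session in sessions:
        for length in range(2, max_len + 1):
            for start in range(len(session) - length + 1):
                counts[tuple(session[start:start + length])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_support}


def build_index(
    frequent: Dict[Tuple[str, ...], int]
) -> Dict[Tuple[str, ...], Counter]:
    """Index each frequent sequence by its prefix, weighting the following page by support."""
    index: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for seq, support in frequent.items():
        prefix, next_page = seq[:-1], seq[-1]
        index[prefix][next_page] += support
    return dict(index)


def predict_next(
    index: Dict[Tuple[str, ...], Counter], history: Session, max_len: int = 3
) -> Optional[str]:
    """Try the longest suffix of the history first; return the most supported next page."""
    for k in range(min(max_len - 1, len(history)), 0, -1):
        followers = index.get(tuple(history[-k:]))
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no indexed prefix matches, so make no prediction (and no prefetch)


def hit_ratio(
    index: Dict[Tuple[str, ...], Counter], sessions: List[Session], max_len: int = 3
) -> float:
    """Fraction of predictions made that match the request the client actually issued next."""
    hits = predictions = 0
    for session in sessions:
        for i in range(1, len(session)):
            guess = predict_next(index, session[:i], max_len)
            if guess is not None:
                predictions += 1
                hits += int(guess == session[i])
    return hits / predictions if predictions else 0.0


# Hypothetical toy log, already split into sessions.
training = [
    ["/index", "/news", "/sports"],
    ["/index", "/news", "/sports"],
    ["/index", "/news", "/weather"],
    ["/index", "/mail"],
]
freq = mine_frequent_sequences(training, max_len=3, min_support=2)
idx = build_index(freq)
print(predict_next(idx, ["/index", "/news"]))  # /sports
print(hit_ratio(idx, [["/index", "/news", "/sports"], ["/index", "/mail"]]))
```

In a proxy-based prefetching setting, the page returned by `predict_next` would be the candidate to fetch ahead of the client's request. Trying the longest matching suffix first lets longer, more specific access patterns take precedence over single-page ones, at the cost of one dictionary lookup per suffix length.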




About this article

Cite this article

Wu, YH., Chen, A.L.P. Prediction of Web Page Accesses by Proxy Server Log. World Wide Web 5, 67–88 (2002). https://doi.org/10.1023/A:1015750423727
