Incremental click-stream tree model: Learning from new users for web page prediction

Distributed and Parallel Databases

Abstract

As Web-based activity increases, predicting a user's next request has gained importance as a means of guiding Web users during their visits to Web sites. Previously proposed recommendation methods extract usage patterns from data collected over time. However, these patterns may change, because new log entries are added to the database each day and old entries are deleted. It is therefore highly desirable to update the recommendation model incrementally. In this paper, we propose a new model for representing and predicting Web user sessions that attempts to reduce online recommendation time while retaining predictive accuracy. Because the model is easy to modify, it is updated during the recommendation process. The incremental algorithm yields better prediction accuracy as well as shorter online recommendation time. A performance evaluation of the Incremental Click-Stream Tree model on two different Web server access logs indicates that the proposed incremental model yields a significant speed-up in recommendation time and an improvement in prediction accuracy.

Author information

Corresponding author

Correspondence to Ş. G. Öğüdücü.

About this article

Cite this article

Öğüdücü, Ş.G., Özsu, M.T. Incremental click-stream tree model: Learning from new users for web page prediction. Distrib Parallel Databases 19, 5–27 (2006). https://doi.org/10.1007/s10619-006-6284-1

  • DOI: https://doi.org/10.1007/s10619-006-6284-1
