Incremental click-stream tree model: Learning from new users for web page prediction

Öğüdücü, Ş. G.; Özsu, M. Tamer

doi:10.1007/s10619-006-6284-1

Published: January 2006

Volume 19, pages 5–27, (2006)

Distributed and Parallel Databases

Ş. G. Öğüdücü¹ & M. Tamer Özsu²

Abstract

Predicting a user's next request has gained importance as Web-based activity increases, because such predictions can guide users during their visits to Web sites. Previously proposed recommendation methods extract usage patterns from data collected over time. However, these patterns may change, because new log entries are added to the database each day and old entries are deleted. It is therefore highly desirable to update the recommendation model incrementally. In this paper, we propose a new model for modeling and predicting Web user sessions that attempts to reduce online recommendation time while retaining predictive accuracy. Because the model is very easy to modify, it is updated during the recommendation process. The incremental algorithm yields better prediction accuracy as well as a shorter online recommendation time. A performance evaluation of the Incremental Click-Stream Tree model over two different Web server access logs indicates that the proposed incremental model yields a significant speed-up in recommendation time and an improvement in prediction accuracy.
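The abstract does not spell out the tree structure, so the following is only a minimal sketch of the general idea of a session tree that is updated while it serves recommendations. It is a generic prefix tree with visit counts, not the authors' Incremental Click-Stream Tree; the names `SessionTree`, `insert`, and `predict_next`, as well as the page labels, are hypothetical and chosen for illustration.

```python
from typing import Dict, Optional, Sequence


class Node:
    """One page on a session path; count = sessions that passed through it."""

    def __init__(self) -> None:
        self.count = 0
        self.children: Dict[str, "Node"] = {}


class SessionTree:
    """Generic prefix tree over page sequences (an assumption, not the paper's model)."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, session: Sequence[str]) -> None:
        # Incremental update: walk the path, creating nodes as needed.
        # Cost is proportional to the session length, with no full rebuild.
        node = self.root
        for page in session:
            node = node.children.setdefault(page, Node())
            node.count += 1

    def predict_next(self, prefix: Sequence[str]) -> Optional[str]:
        # Follow the pages seen so far, then pick the most visited child.
        node = self.root
        for page in prefix:
            child = node.children.get(page)
            if child is None:
                return None  # this prefix was never observed
            node = child
        if not node.children:
            return None
        return max(node.children.items(), key=lambda kv: kv[1].count)[0]


tree = SessionTree()
tree.insert(["home", "products", "cart"])
tree.insert(["home", "products", "reviews"])
tree.insert(["home", "products", "cart"])
before = tree.predict_next(["home", "products"])

# New sessions arrive during recommendation and shift the prediction.
tree.insert(["home", "products", "reviews"])
tree.insert(["home", "products", "reviews"])
after = tree.predict_next(["home", "products"])
```

In this sketch the prediction for the prefix `home, products` changes from `cart` to `reviews` once the two new sessions are inserted, which illustrates why a model that can absorb new sessions cheaply may track changing usage patterns without being retrained from scratch.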

Author information

Authors and Affiliations

Computer Engineering Department, Istanbul Technical University, Istanbul, Turkey, 34390
Ş. G. Öğüdücü
School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1
M. Tamer Özsu

Corresponding author

Correspondence to Ş. G. Öğüdücü.

About this article

Cite this article

Öğüdücü, Ş.G., Özsu, M.T. Incremental click-stream tree model: Learning from new users for web page prediction. Distrib Parallel Databases 19, 5–27 (2006). https://doi.org/10.1007/s10619-006-6284-1

Issue Date: January 2006
DOI: https://doi.org/10.1007/s10619-006-6284-1
