Abstract
We present two methods for testing the predictive power of a variable-length Markov chain induced from a collection of user web navigation sessions. The collection of sessions is split into a training set and a test set. The first method uses a χ² statistical test to measure the significance of the distance between the distributions of the probabilities assigned to the test trails by a Markov model built from the full collection of sessions and by a model built from the training set alone. This test measures the ability of the model to generalise its predictions to the unseen sessions in the test set. The second method evaluates the model's ability to predict the last page of a navigation session from the preceding pages viewed, by recording the mean absolute error of the rank of the last page among the predictions provided by the model. We report experimental results on both real and random data sets; they show that in most cases a second-order model captures sufficient history to predict the next link choice with high accuracy.
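The second method can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-order model with back-off to shorter contexts stands in for the variable-length Markov chain, the error is assumed to be the distance of the true last page's rank from rank 1, and all function names and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict

def train(sessions, order=2):
    """Count next-page frequencies for every context of 1..order pages."""
    counts = defaultdict(Counter)
    for s in sessions:
        for i in range(1, len(s)):
            for k in range(1, order + 1):
                if i - k < 0:
                    break
                counts[tuple(s[i - k:i])][s[i]] += 1
    return counts

def rank_of_last_page(counts, session, order=2):
    """Rank (1 = top prediction) of the session's last page, or None if unseen."""
    prefix, target = session[:-1], session[-1]
    # Assumption: try the longest context first, then back off to shorter ones.
    for k in range(min(order, len(prefix)), 0, -1):
        ctx = tuple(prefix[-k:])
        if ctx in counts:
            # Most frequent first; ties broken by page name for determinism.
            ranked = [p for p, _ in sorted(counts[ctx].items(),
                                           key=lambda kv: (-kv[1], kv[0]))]
            if target in ranked:
                return ranked.index(target) + 1
    return None

def mean_absolute_rank_error(counts, test_sessions, order=2):
    """Mean of |rank - 1| over test sessions whose last page was predicted."""
    errors = []
    for s in test_sessions:
        if len(s) < 2:
            continue
        r = rank_of_last_page(counts, s, order)
        if r is not None:
            errors.append(abs(r - 1))
    return sum(errors) / len(errors) if errors else None

# Toy data (hypothetical): pages are single letters.
training = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["B", "C", "E"]]
test = [["A", "B", "D"], ["A", "B", "C"]]
model = train(training, order=2)
mae = mean_absolute_rank_error(model, test, order=2)
```

In this toy run the context (A, B) is followed by C twice and by D once, so a test session ending in D has rank 2 and one ending in C has rank 1. Sessions whose last page never followed any matching context are skipped here; a real evaluation would need an explicit policy for them.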
Cite this article
Borges, J., Levene, M. Testing the Predictive Power of Variable History Web Usage. Soft Comput 11, 717–727 (2007). https://doi.org/10.1007/s00500-006-0115-1
DOI: https://doi.org/10.1007/s00500-006-0115-1