
Testing the Predictive Power of Variable History Web Usage

  • Focus
Soft Computing

Abstract

We present two methods for testing the predictive power of a variable length Markov chain induced from a collection of user web navigation sessions. The collection of sessions is split into a training set and a test set. The first method uses a χ2 statistical test to measure the significance of the distance between two distributions: the probabilities assigned to the test trails by a Markov model built from the full collection of sessions, and those assigned by a model built from the training set alone. The test thus measures the model's ability to generalise its predictions to the unseen sessions in the test set. The second method evaluates the model's ability to predict the last page of a navigation session from the pages viewed before it, by recording the mean absolute error of the rank of the page that actually occurs last among the predictions provided by the model. We report experiments on both real and random data sets; the results show that in most cases a second-order model captures sufficient history to predict the next link choice with high accuracy.
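The abstract describes both evaluation methods only in outline. The Python below is a minimal sketch of one way to realise them, assuming a simple back-off model over contexts of length 0 to `order` in place of the authors' variable length Markov chain. It covers the two methods in turn. First, it scores each test trail under a model built from the full collection and under one built from the training set, bins those probabilities, and compares the two histograms with a χ2 homogeneity test. Second, it computes the mean absolute error of the rank of each session's last page. The helper names (`build_model`, `predict`, `trail_probability`, `last_page_rank_error`, `decade_histogram`, `chi_square_homogeneity`), the order-of-magnitude binning, the start-page model, the tie-breaking rule and the Wilson-Hilferty p-value approximation are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict
from statistics import NormalDist


def build_model(sessions, order):
    """Next-page counts for every context of length 0..order.

    Hypothetical back-off model, not the authors' variable length chain.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for i, page in enumerate(session):
            for k in range(min(order, i) + 1):
                counts[tuple(session[i - k:i])][page] += 1
    return counts


def predict(counts, history, order):
    """Next-page probabilities from the longest context seen in training."""
    for k in range(min(order, len(history)), -1, -1):
        nexts = counts.get(tuple(history[len(history) - k:]))
        if nexts:
            total = sum(nexts.values())
            return {page: c / total for page, c in nexts.items()}
    return {}


def trail_probability(counts, trail, order):
    """Product of next-page probabilities (first page: overall frequency)."""
    prob = 1.0
    for i, page in enumerate(trail):
        prob *= predict(counts, trail[:i], order).get(page, 0.0)
    return prob


def last_page_rank_error(counts, test_sessions, order):
    """Mean absolute error of the rank of each session's last page.

    Assumed: error = |rank - 1|, ties broken by page name, and an
    unpredicted page ranks just after every predicted page.
    """
    errors = []
    for *history, last in (s for s in test_sessions if len(s) >= 2):
        probs = predict(counts, history, order)
        ranking = sorted(probs, key=lambda page: (-probs[page], page))
        rank = ranking.index(last) + 1 if last in probs else len(ranking) + 1
        errors.append(abs(rank - 1))
    return sum(errors) / len(errors) if errors else 0.0


def decade_histogram(probs, n_bins):
    """Bin probabilities by order of magnitude (assumed binning scheme).

    Zero probabilities, and those below the last decade, fall in the last bin.
    """
    bins = [0] * n_bins
    for p in probs:
        index = n_bins - 1 if p <= 0 else min(int(-math.log10(p)), n_bins - 1)
        bins[index] += 1
    return bins


def chi_square_homogeneity(observed_a, observed_b):
    """(statistic, df, approximate p-value) for 'same distribution'.

    The p-value uses the Wilson-Hilferty normal approximation to the
    chi-square distribution, which is rough for very small df.
    """
    n_a, n_b = sum(observed_a), sum(observed_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both histograms need at least one observation")
    stat, df = 0.0, -1  # a 2 x c table has c - 1 degrees of freedom
    for a, b in zip(observed_a, observed_b):
        if a + b == 0:
            continue  # skip bins that are empty in both histograms
        df += 1
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * (a + b) / (n_a + n_b)
            stat += (observed - expected) ** 2 / expected
    if df <= 0:
        return stat, df, 1.0
    mean, var = 1 - 2 / (9 * df), 2 / (9 * df)
    z = ((stat / df) ** (1 / 3) - mean) / math.sqrt(var)
    return stat, df, NormalDist().cdf(-z)  # upper-tail probability


if __name__ == "__main__":
    sessions = [list("ABC"), list("ABD"), list("ABC"), list("BC"), list("ACD")]
    train, test = sessions[:3], sessions[3:]
    full, trained = build_model(sessions, 2), build_model(train, 2)
    hist_full = decade_histogram([trail_probability(full, t, 2) for t in test], 4)
    hist_train = decade_histogram([trail_probability(trained, t, 2) for t in test], 4)
    print(chi_square_homogeneity(hist_full, hist_train))
    print(last_page_rank_error(trained, test, 2))
```

Under these assumptions, a lower rank error means the true last page appears nearer the top of the model's predictions. A small χ2 statistic (large p-value) suggests that the training-set model assigns test trails probabilities distributed much like those of the full model, that is, that it generalises. Re-running with `order=1` and `order=2` gives a comparison in the spirit of the abstract's second-order finding, though not the paper's actual experimental setup.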




Author information

Correspondence to José Borges.

About this article

Cite this article

Borges, J., Levene, M. Testing the Predictive Power of Variable History Web Usage. Soft Comput 11, 717–727 (2007). https://doi.org/10.1007/s00500-006-0115-1

Keywords

Navigation