Predicting WWW surfing using multiple evidence combination

  • Regular Paper
The VLDB Journal

Abstract

Many applications, such as web search, latency reduction, and personalization/recommendation systems, depend on surfing prediction for their improvement. Predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy. In this paper, we combine two classification techniques, namely, the Markov model and Support Vector Machines (SVM), and fuse their predictions using Dempster’s rule. This fusion overcomes the inability of the Markov model to predict unseen data and also eases the multiclass classification problem of SVM, especially when dealing with a large number of classes. We apply feature extraction to increase the discriminative power of SVM. In addition, during prediction we employ domain knowledge to reduce the number of classifiers, which improves accuracy and reduces prediction time. We demonstrate the effectiveness of our hybrid approach by comparing our results with widely used techniques, namely, SVM, the Markov model, and association rule mining.
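To make the fusion step concrete, the following is a minimal sketch of Dempster's rule of combination applied to two sources of evidence about the next page. It is not the authors' implementation: the page labels (`p1`, `p2`, `p3`), the mass values, and the choice to place each model's residual uncertainty on the whole frame of discernment are all assumptions made for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses
    to a mass in [0, 1]; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contradict each other; track the conflicting mass.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the combined masses sum to 1 again.
    return {h: w / norm for h, w in combined.items()}


# Hypothetical frame of discernment: candidate next pages.
frame = frozenset({"p1", "p2", "p3"})

# Assumed masses from a Markov model; mass on `frame` expresses uncertainty.
markov = {frozenset({"p1"}): 0.6, frozenset({"p2"}): 0.1, frame: 0.3}

# Assumed masses from an SVM with probabilistic outputs.
svm = {frozenset({"p1"}): 0.5, frozenset({"p3"}): 0.3, frame: 0.2}

fused = dempster_combine(markov, svm)

# Predict the single page with the largest combined mass.
best = max((h for h in fused if len(h) == 1), key=fused.get)
print(sorted(best), round(fused[best], 3))
```

In this toy setting, a page that only one source supports (here `p3`, absent from the Markov evidence) still receives combined mass, because the other source leaves part of its belief uncommitted on the full frame.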

Author information

Corresponding author

Correspondence to Mamoun Awad.

About this article

Cite this article

Awad, M., Khan, L. & Thuraisingham, B. Predicting WWW surfing using multiple evidence combination. The VLDB Journal 17, 401–417 (2008). https://doi.org/10.1007/s00778-006-0014-1

  • DOI: https://doi.org/10.1007/s00778-006-0014-1
