Model-Based Clustering and Visualization of Navigation Patterns on a Web Site

Cadez, Igor; Heckerman, David; Meek, Christopher; Smyth, Padhraic; White, Steven

doi:10.1023/A:1024992613384

Published: October 2003

Volume 7, pages 399–424, (2003)

Data Mining and Knowledge Discovery

Igor Cadez¹,
David Heckerman²,
Christopher Meek²,
Padhraic Smyth³ &
Steven White²

Abstract

We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data; and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
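
The clustering step named in the abstract, a mixture of first-order Markov models fitted with Expectation-Maximization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothing constant `alpha`, the random initialization, and the toy sessions are all assumptions made for illustration, and sessions are assumed to be already encoded as lists of integer URL-category indices.

```python
import math
import random


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def m_step(sequences, resp, n_states, n_clusters, alpha):
    """Re-estimate mixture weights, initial-state and transition probabilities.

    alpha is a small pseudo-count (an assumption of this sketch) that keeps
    every probability strictly positive.
    """
    weights = normalize([sum(r[k] for r in resp) + alpha for k in range(n_clusters)])
    inits, trans = [], []
    for k in range(n_clusters):
        init_counts = [alpha] * n_states
        trans_counts = [[alpha] * n_states for _ in range(n_states)]
        for seq, r in zip(sequences, resp):
            init_counts[seq[0]] += r[k]
            for a, b in zip(seq, seq[1:]):
                trans_counts[a][b] += r[k]
        inits.append(normalize(init_counts))
        trans.append([normalize(row) for row in trans_counts])
    return weights, inits, trans


def log_joint(seq, k, weights, inits, trans):
    """log( weight_k * P(seq | Markov chain k) )."""
    lp = math.log(weights[k]) + math.log(inits[k][seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(trans[k][a][b])
    return lp


def e_step(sequences, weights, inits, trans):
    """Posterior cluster memberships for each session, computed in log space."""
    resp = []
    for seq in sequences:
        logs = [log_joint(seq, k, weights, inits, trans) for k in range(len(weights))]
        top = max(logs)
        norm = top + math.log(sum(math.exp(v - top) for v in logs))
        resp.append([math.exp(v - norm) for v in logs])
    return resp


def fit_markov_mixture(sequences, n_states, n_clusters, n_iter=30, alpha=0.01, seed=0):
    rng = random.Random(seed)
    # Start from random soft assignments of sessions to clusters.
    resp = [normalize([rng.random() + 1e-3 for _ in range(n_clusters)])
            for _ in sequences]
    for _ in range(n_iter):
        weights, inits, trans = m_step(sequences, resp, n_states, n_clusters, alpha)
        resp = e_step(sequences, weights, inits, trans)
    return weights, inits, trans, resp


# Hypothetical toy sessions over two page categories (0 and 1):
# the first three linger in a category, the last three alternate.
sessions = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
weights, inits, trans, resp = fit_markov_mixture(sessions, n_states=2, n_clusters=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
print(labels)
```

In this sketch both groups visit the same categories and differ only in the order of requests, which is the property the abstract says the model-based approach clusters on. Each EM iteration visits every transition once per cluster, so the per-iteration cost of the sketch grows linearly with the number of clusters and with the total length of the sessions.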

Author information

Authors and Affiliations

Sparta Inc., 23382 Mill Creek Drive, #100 Laguna Hills, CA, 92653, USA
Igor Cadez
Microsoft Research, One Microsoft Way, Redmond, WA, 98052-6399, USA
David Heckerman, Christopher Meek & Steven White
School of Information and Computer Science, University of California, Irvine, CA, 92697-3425, USA
Padhraic Smyth

Corresponding author

Correspondence to David Heckerman.

About this article

Cite this article

Cadez, I., Heckerman, D., Meek, C. et al. Model-Based Clustering and Visualization of Navigation Patterns on a Web Site. Data Mining and Knowledge Discovery 7, 399–424 (2003). https://doi.org/10.1023/A:1024992613384

Issue Date: October 2003
DOI: https://doi.org/10.1023/A:1024992613384
