A Divergence-Oriented Approach for Web Users Clustering

  • Conference paper
Computational Science and Its Applications - ICCSA 2006 (ICCSA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3981)

Abstract

Clustering web users by their access patterns is an important task in Web Usage Mining. Beyond the clustering itself, the resulting clusters must be evaluated in order to choose the best clustering for a particular framework. This paper examines the use of Kullback-Leibler (KL) divergence, an information-theoretic distance, in conjunction with the k-means clustering algorithm. It compares KL divergence with other well-known distance measures (Euclidean, Standardized Euclidean and Manhattan) and evaluates the clustering results using both the value of the objective function and the Davies-Bouldin index. Because it is imperative to assess whether the results of a clustering process are susceptible to noise, especially in noisy environments such as the Web, our approach takes the impact of noise into account. The clusters obtained with the KL approach appear superior to those obtained with the other distance measures when the data have been corrupted by noise.
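
As an illustration only, the following Python sketch shows one way KL divergence can stand in for the distance measure inside a k-means loop. It is not the authors' implementation: the function names (`kl_divergence`, `normalize`, `kmeans_kl`), the epsilon smoothing, the representation of each user as a normalized page-visit distribution, and the toy data are all assumptions made here for the example.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    eps is a hypothetical smoothing term that avoids division by zero
    when a centroid has a zero entry.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def normalize(counts):
    """Turn raw page-visit counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kmeans_kl(users, initial_centroids, iterations=20):
    """Lloyd-style k-means with KL divergence as the dissimilarity.

    Each user is assigned to the centroid with the smallest D(user || centroid);
    each centroid is then recomputed as the arithmetic mean of its members,
    which minimizes the summed divergence for this argument order.
    """
    centroids = [list(c) for c in initial_centroids]  # do not mutate the input
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)),
                      key=lambda j: kl_divergence(u, centroids[j]))
                  for u in users]
        for j in range(len(centroids)):
            members = [u for u, lab in zip(users, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (assumed): visit counts of four users over three pages.
users = [normalize(c) for c in ([8, 1, 1], [7, 2, 1], [1, 1, 8], [1, 2, 7])]
labels, centroids = kmeans_kl(users, [users[0], users[2]])
print(labels)  # [0, 0, 1, 1]
```

KL divergence is asymmetric, so the argument order is a design choice. Here the user is the first argument and the centroid the second, which is what makes the arithmetic mean the right centroid update.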

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Petridou, S.G., Koutsonikola, V.A., Vakali, A.I., Papadimitriou, G.I. (2006). A Divergence-Oriented Approach for Web Users Clustering. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751588_130

  • DOI: https://doi.org/10.1007/11751588_130

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34072-0

  • Online ISBN: 978-3-540-34074-4

  • eBook Packages: Computer Science, Computer Science (R0)
