A Divergence-Oriented Approach for Web Users Clustering

Petridou, Sophia G.; Koutsonikola, Vassiliki A.; Vakali, Athena I.; Papadimitriou, Georgios I.

doi:10.1007/11751588_130

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3981)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Clustering web users based on their access patterns is an important task in Web Usage Mining. Beyond the clustering itself, it is important to evaluate the resulting clusters in order to choose the best clustering for a particular framework. This paper examines the use of Kullback-Leibler (KL) divergence, an information-theoretic distance, in conjunction with the k-means clustering algorithm. It compares KL divergence with other well-known distance measures (Euclidean, Standardized Euclidean and Manhattan) and evaluates the clustering results using both the objective function's value and the Davies-Bouldin index. Because clustering results may be susceptible to noise, especially in noisy settings such as the Web, the approach also takes the impact of noise into account. The clusters obtained with the KL approach appear to be superior to those obtained with the other distance measures when the data have been corrupted by noise.
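The combination named in the abstract, k-means driven by KL divergence and scored with the Davies-Bouldin index, can be illustrated with a minimal Python sketch. This is not the authors' implementation, since the full text is not available here. All function names are hypothetical. The sketch assumes each user is represented by a vector of page-access counts, which is smoothed by adding a constant (`alpha`) and normalised into a probability distribution so that KL divergence is finite. It also assumes the Davies-Bouldin index is computed with Euclidean distances. Squared Euclidean distance is included for comparison; Standardized Euclidean and Manhattan are omitted for brevity.

```python
import numpy as np

def to_distributions(counts, alpha=1.0):
    """Turn per-user access counts into strictly positive distributions.
    Additive smoothing (alpha) is an assumption, not taken from the paper."""
    counts = np.asarray(counts, dtype=float) + alpha
    return counts / counts.sum(axis=1, keepdims=True)

def kl_to_centroids(P, C):
    """D[i, j] = KL(P[i] || C[j]) for strictly positive row distributions."""
    entropy_term = np.sum(P * np.log(P), axis=1, keepdims=True)  # sum_k p log p
    cross_term = P @ np.log(C).T                                 # sum_k p log c
    return entropy_term - cross_term

def sq_euclidean_to_centroids(P, C):
    """D[i, j] = squared Euclidean distance between P[i] and C[j]."""
    diff = P[:, None, :] - C[None, :, :]
    return np.sum(diff ** 2, axis=2)

def kmeans(P, k, dist, n_iter=100, seed=0):
    """Lloyd-style k-means with a pluggable point-to-centroid distance.
    The arithmetic mean is used as the centroid; it minimises both the
    summed squared Euclidean distance and the summed KL(p || c)."""
    rng = np.random.default_rng(seed)
    C = P[rng.choice(len(P), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = dist(P, C).argmin(axis=1)
        # An empty cluster keeps its previous centroid.
        new_C = np.array([P[labels == j].mean(axis=0) if np.any(labels == j)
                          else C[j] for j in range(k)])
        converged = np.allclose(new_C, C)
        C = new_C
        if converged:
            break
    D = dist(P, C)
    labels = D.argmin(axis=1)
    objective = float(D.min(axis=1).sum())  # summed distance to assigned centroid
    return labels, C, objective

def davies_bouldin(P, labels):
    """Davies-Bouldin index with Euclidean distances (lower is better)."""
    ids = np.unique(labels)
    cents = np.array([P[labels == j].mean(axis=0) for j in ids])
    scatter = np.array([np.linalg.norm(P[labels == j] - c, axis=1).mean()
                        for j, c in zip(ids, cents)])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # makes the i == j ratio zero
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return float(ratios.max(axis=1).mean())

if __name__ == "__main__":
    # Synthetic access counts for two groups of users over six pages.
    rng = np.random.default_rng(42)
    a = rng.multinomial(100, [0.3, 0.3, 0.3, 0.03, 0.03, 0.04], size=20)
    b = rng.multinomial(100, [0.03, 0.03, 0.04, 0.3, 0.3, 0.3], size=20)
    P = to_distributions(np.vstack([a, b]))
    for name, dist in [("KL", kl_to_centroids),
                       ("sq. Euclidean", sq_euclidean_to_centroids)]:
        labels, _, objective = kmeans(P, 2, dist)
        print(name, "objective:", objective, "DB:", davies_bouldin(P, labels))
```

Objective values computed with different distance measures are on different scales and are not directly comparable. The Davies-Bouldin index, computed here with a single fixed distance, gives a common basis for comparing the resulting partitions.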

Author information

Authors and Affiliations

Dept. of Informatics, Aristotle University, 54124 Thessaloniki, Greece
Sophia G. Petridou, Vassiliki A. Koutsonikola, Athena I. Vakali & Georgios I. Papadimitriou

Editor information

Editors and Affiliations

Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
OptimaNumerics Ltd., Cathedral House, 23-31 Waring Street, BT1 2DX, Belfast, UK
C. J. Kenneth Tan
Clayton School of IT, Monash University, 3800, Clayton, Australia
David Taniar
Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganá
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
School of Information and Communication Engineering, Sungkyunkwan University, Korea
Hyunseung Choo

About this paper

Cite this paper

Petridou, S.G., Koutsonikola, V.A., Vakali, A.I., Papadimitriou, G.I. (2006). A Divergence-Oriented Approach for Web Users Clustering. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751588_130

DOI: https://doi.org/10.1007/11751588_130
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34072-0
Online ISBN: 978-3-540-34074-4
eBook Packages: Computer Science, Computer Science (R0)
