Interval Set Clustering of Web Users with Rough K-Means

Lingras, Pawan; West, Chad

doi:10.1023/B:JIIS.0000029668.88665.1a

Published: July 2004

Volume 23, pages 5–16 (2004)

Journal of Intelligent Information Systems

Abstract

Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications, and the analytical techniques used in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis, and there is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amounts of data typical of a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval sets or rough sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
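
The idea of representing each cluster as an interval set can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything specific below is an assumption made for illustration: the function name `rough_kmeans`, the weight `w_lower`, the distance-difference `threshold`, and the rule that a point nearly equidistant from several centroids goes only into their upper approximations. It should be read as one plausible form of a rough K-means variant, not as the authors' exact method.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def rough_kmeans(points, centroids, w_lower=0.7, threshold=1.0, iterations=20):
    """Return (lower, upper, centroids); lower/upper hold point indices per cluster.

    w_lower and threshold are hypothetical parameters chosen for this sketch.
    """
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for _ in range(iterations):
        lower = [[] for _ in range(k)]
        upper = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            nearest = min(range(k), key=dists.__getitem__)
            # Other clusters that are almost as close as the nearest one.
            close = [j for j in range(k)
                     if j != nearest and dists[j] - dists[nearest] <= threshold]
            upper[nearest].append(idx)
            if close:
                # Ambiguous point: upper approximations only, no lower one.
                for j in close:
                    upper[j].append(idx)
            else:
                # Unambiguous point: also in the lower approximation.
                lower[nearest].append(idx)
        for j in range(k):
            core = [points[i] for i in lower[j]]
            boundary = [points[i] for i in upper[j] if i not in lower[j]]
            if core and boundary:
                c, b = mean(core), mean(boundary)
                centroids[j] = [w_lower * x + (1 - w_lower) * y
                                for x, y in zip(c, b)]
            elif core or boundary:
                centroids[j] = mean(core or boundary)
            # An empty cluster keeps its previous centroid.
    return lower, upper, centroids


points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
lower, upper, centroids = rough_kmeans(points, [(0, 0), (10, 0)])
print(lower)  # [[0, 1], [2, 3]]
print(upper)  # [[0, 1, 4], [2, 3, 4]]
```

In this toy run the midpoint (index 4) is equally far from both centroids, so it belongs to the upper approximation of both clusters and to the lower approximation of neither. That is the sense in which cluster boundaries are not crisp.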

Author information

Authors and Affiliations

Saint Mary's University, Halifax, Nova Scotia, B3H 3C3, Canada
Pawan Lingras & Chad West

Corresponding author

Correspondence to Pawan Lingras.

About this article

Cite this article

Lingras, P., West, C. Interval Set Clustering of Web Users with Rough K-Means. Journal of Intelligent Information Systems 23, 5–16 (2004). https://doi.org/10.1023/B:JIIS.0000029668.88665.1a

Issue Date: July 2004
DOI: https://doi.org/10.1023/B:JIIS.0000029668.88665.1a
