Interval Set Clustering of Web Users with Rough K-Means

Journal of Intelligent Information Systems

Abstract

Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amount of data typical of a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
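
The abstract does not spell out the assignment or update rules, so the following Python sketch is only one plausible reading of a K-means variant whose clusters are interval sets, that is, pairs of a lower and an upper approximation. Everything specific in it is an assumption made for illustration: the function name `rough_kmeans`, the ratio test controlled by `threshold`, the weight `w_lower`, and the toy data. An object that is clearly nearest to one centroid goes into that cluster's lower approximation; an object that is almost equally close to several centroids goes only into the upper approximations of those clusters.

```python
import math


def _mean(points, indices):
    """Component-wise mean of the points selected by `indices`."""
    idx = sorted(indices)
    dim = len(points[idx[0]])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]


def rough_kmeans(points, centroids, w_lower=0.7, threshold=1.5, iterations=50):
    """Illustrative rough K-means sketch (not the paper's exact algorithm).

    Returns (centroids, lower, upper), where lower[j] and upper[j] are sets
    of point indices and lower[j] is always a subset of upper[j].
    w_lower, threshold and iterations are assumed, hypothetical parameters.
    """
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for _ in range(iterations):
        lower = [set() for _ in range(k)]
        upper = [set() for _ in range(k)]
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            nearest = min(range(k), key=dists.__getitem__)
            # Other clusters whose centroid is nearly as close as the nearest.
            close = [j for j in range(k)
                     if j != nearest and dists[j] <= threshold * dists[nearest]]
            if close:
                # Ambiguous object: upper approximations only.
                for j in [nearest] + close:
                    upper[j].add(i)
            else:
                # Unambiguous object: lower (and therefore upper) approximation.
                lower[nearest].add(i)
                upper[nearest].add(i)
        updated = []
        for j in range(k):
            boundary = upper[j] - lower[j]
            if lower[j] and boundary:
                lo, bd = _mean(points, lower[j]), _mean(points, boundary)
                updated.append([w_lower * a + (1 - w_lower) * b
                                for a, b in zip(lo, bd)])
            elif lower[j]:
                updated.append(_mean(points, lower[j]))
            elif boundary:
                updated.append(_mean(points, boundary))
            else:
                updated.append(centroids[j])  # empty cluster: keep old centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, lower, upper


# Toy data: two tight groups plus one point halfway between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
centroids, lower, upper = rough_kmeans(points, [(0, 0), (10, 10)])
print(lower, upper)
```

In this toy run the halfway point (index 6) ends up in the upper approximation of both clusters and in the lower approximation of neither, which is the kind of non-crisp boundary the abstract describes for web users.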

Author information

Corresponding author

Correspondence to Pawan Lingras.

About this article

Cite this article

Lingras, P., West, C. Interval Set Clustering of Web Users with Rough K-Means. Journal of Intelligent Information Systems 23, 5–16 (2004). https://doi.org/10.1023/B:JIIS.0000029668.88665.1a
