Rough subspace-based clustering ensemble for categorical data

  • Methodologies and Application
Soft Computing

Abstract

Clustering categorical data, an important problem in data mining, has recently attracted much attention. In this paper, the problem of unsupervised dimensionality reduction for categorical data is studied first. Based on rough set theory, the attributes of categorical data are decomposed into a number of rough subspaces. A novel clustering ensemble algorithm based on these rough subspaces is then proposed for categorical data. The algorithm clusters the data in a selection of high-quality rough subspaces and combines the resulting partitions to yield a robust and stable solution. We also introduce a cluster index to evaluate the solutions of clustering algorithms for categorical data. Experimental results on selected UCI data sets show that the proposed method produces better results than other methods when evaluated in terms of cluster validity indexes.
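The general idea of clustering in several attribute subspaces and then combining the partitions can be illustrated with a minimal sketch. It is not the algorithm of the paper: the abstract does not specify how rough subspaces are built, how their quality is measured, or which consensus function is used. As assumptions for illustration only, the sketch partitions objects by indiscernibility (identical values on a chosen attribute subset, the basic rough-set equivalence relation), uses hand-picked subspaces, and merges the partitions through a co-association matrix with a majority threshold. The toy data and all function names are hypothetical.

```python
def indiscernibility_partition(rows, attrs):
    """Label each object by its equivalence class on the chosen attributes."""
    labels, seen = [], {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        # Objects with identical values on `attrs` share one label.
        labels.append(seen.setdefault(key, len(seen)))
    return labels


def co_association(partitions, n):
    """Fraction of partitions in which each pair of objects shares a block."""
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(partitions)
    return co


def consensus(co, threshold=0.5):
    """Connected components of the graph linking pairs above the threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical toy data: 6 objects described by 3 categorical attributes.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "small", "square"),
]
subspaces = [(0,), (2,), (0, 1)]  # hand-picked attribute subsets (assumption)
partitions = [indiscernibility_partition(rows, s) for s in subspaces]
co = co_association(partitions, len(rows))
print(consensus(co))  # -> [0, 0, 0, 1, 1, 1]
```

In this toy case the third subspace splits the data into four blocks, but the pairs it separates still agree in two of the three partitions, so the consensus recovers two clusters. This is the kind of stability an ensemble is meant to provide.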

Acknowledgments

The authors would like to thank the Editors for their kind help and the anonymous referees for their valuable comments and helpful suggestions. Special thanks go to Ms. Ting Zhu for her assistance in revising the paper. The work is partially supported by the National Natural Science Foundation of China (Serial No. 60970061, 61075056, 61103067, 61202170, 61203247, 61273304), the China Postdoctoral Science Foundation (Serial No. 2011M500626, 2011M500815) and the Fundamental Research Funds for the Central Universities.

Corresponding author

Correspondence to Can Gao.

Additional information

Communicated by A. Di Nola.

About this article

Cite this article

Gao, C., Pedrycz, W. & Miao, D. Rough subspace-based clustering ensemble for categorical data. Soft Comput 17, 1643–1658 (2013). https://doi.org/10.1007/s00500-012-0972-8

  • DOI: https://doi.org/10.1007/s00500-012-0972-8
