Rough subspace-based clustering ensemble for categorical data

Gao, Can; Pedrycz, Witold; Miao, Duoqian

doi:10.1007/s00500-012-0972-8

Methodologies and Application
Published: 10 January 2013

Soft Computing, Volume 17, pages 1643–1658 (2013)

Can Gao (1, 2), Witold Pedrycz (2, 3) & Duoqian Miao (1)


Abstract

Clustering categorical data, an important problem in data mining, has recently attracted much attention. In this paper, we first study the problem of unsupervised dimensionality reduction for categorical data. Based on rough set theory, the attributes of categorical data are decomposed into a number of rough subspaces. We then propose a novel clustering ensemble algorithm for categorical data built on these rough subspaces. The algorithm clusters the data in several high-quality rough subspaces and combines the resulting partitions into a robust and stable solution. We also introduce a cluster index for evaluating the solutions produced by clustering algorithms on categorical data. Experimental results on selected UCI data sets show that the proposed method outperforms the compared methods in terms of cluster validity indexes.
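The abstract outlines three steps: decompose the attributes into rough subspaces, cluster the data in several of them, and merge the resulting partitions. The Python sketch below illustrates that pipeline on a toy table, but it rests on assumptions of its own because the abstract does not give the paper's definitions. A rough subspace is assumed to be an unsupervised reduct (a minimal attribute subset whose indiscernibility partition equals that of the full attribute set); each subspace is clustered with plain k-modes (simple-matching dissimilarity, farthest-first initialisation); and the partitions are merged by co-association voting. The function names `partition`, `rough_subspaces`, `k_modes`, `consensus` and `rough_subspace_ensemble` are hypothetical, and neither the selection of high-quality subspaces nor the proposed cluster index is modelled.

```python
from itertools import combinations


def partition(rows, attrs):
    """Indiscernibility partition: group row indices that agree on every attribute in attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return {frozenset(b) for b in blocks.values()}


def rough_subspaces(rows):
    """Minimal attribute subsets inducing the same partition as all attributes.

    Assumption: these unsupervised reducts stand in for the paper's rough subspaces.
    Exhaustive search, so only practical for a handful of attributes.
    """
    n_attrs = len(rows[0])
    target = partition(rows, range(n_attrs))
    found = []
    for size in range(1, n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if partition(rows, subset) == target:
                found.append(subset)
    return found


def k_modes(rows, attrs, k, iters=10):
    """Plain k-modes on the projection of rows onto attrs."""
    points = [tuple(row[a] for a in attrs) for row in rows]

    def dist(p, q):
        return sum(x != y for x, y in zip(p, q))  # simple-matching dissimilarity

    # Farthest-first initialisation: start at the first point, then keep adding
    # the point whose nearest chosen mode is farthest away.
    modes = [points[0]]
    while len(modes) < k:
        modes.append(max(points, key=lambda p: min(dist(p, m) for m in modes)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, modes[c])) for p in points]
        new_modes = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                new_modes.append(modes[c])  # keep the old mode for an empty cluster
                continue
            # Most frequent value per attribute; sorting makes ties deterministic.
            new_modes.append(tuple(max(sorted(set(col)), key=col.count)
                                   for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return labels


def consensus(partitions, threshold=0.5):
    """Co-association voting: link two objects when more than `threshold` of the
    partitions put them together, then return the connected components."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        votes = sum(p[i] == p[j] for p in partitions)
        if votes / len(partitions) > threshold:
            parent[find(i)] = find(j)
    relabel = {}  # number components 0, 1, ... in order of first appearance
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def rough_subspace_ensemble(rows, k):
    partitions = [k_modes(rows, s, k) for s in rough_subspaces(rows)]
    return consensus(partitions)


rows = [
    ("red", "square", "small", "smooth"),
    ("red", "square", "small", "smooth"),
    ("red", "triangle", "small", "rough"),
    ("green", "circle", "large", "furry"),
    ("green", "circle", "large", "furry"),
    ("green", "diamond", "large", "rough"),
]
print(rough_subspaces(rows))               # [(1,), (0, 3), (2, 3)]
print(rough_subspace_ensemble(rows, k=2))  # [0, 0, 0, 1, 1, 1]
```

On this toy table the single-attribute subspace `(1,)` on its own yields the split `[0, 0, 1, 0, 0, 0]`, while the subspaces `(0, 3)` and `(2, 3)` both yield `[0, 0, 0, 1, 1, 1]` and outvote it in the consensus. This illustrates the kind of robustness an ensemble over several subspaces is meant to provide. The exhaustive reduct search is exponential in the number of attributes, so a practical version would need a heuristic reduct search and some ranking of subspace quality.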




Acknowledgments

The authors would like to thank the Editors for their kind help and the anonymous referees for their valuable comments and helpful suggestions. Special thanks go to Ms. Ting Zhu for her assistance in revising the paper. The work is partially supported by the National Natural Science Foundation of China (Serial No. 60970061, 61075056, 61103067, 61202170, 61203247, 61273304), China Postdoctoral Science Foundation (Serial No. 2011M500626, 2011M500815) and Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

(1) Department of Computer Science and Technology, Tongji University, Shanghai, 201804, People’s Republic of China
Can Gao & Duoqian Miao
(2) Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2G7, Canada
Can Gao & Witold Pedrycz
(3) System Research Institute, Polish Academy of Sciences, Warsaw, Poland
Witold Pedrycz


Corresponding author

Correspondence to Can Gao.

Additional information

Communicated by A. Di Nola.

About this article

Cite this article

Gao, C., Pedrycz, W. & Miao, D. Rough subspace-based clustering ensemble for categorical data. Soft Comput 17, 1643–1658 (2013). https://doi.org/10.1007/s00500-012-0972-8

Published: 10 January 2013
Issue Date: September 2013
DOI: https://doi.org/10.1007/s00500-012-0972-8
