Efficient approaches for summarizing subspace clusters into k representatives

Chen, Guanhua; Ma, Xiuli; Yang, Dongqing; Tang, Shiwei; Shuai, Meng; Xie, Kunqing

doi:10.1007/s00500-010-0552-8

Focus
Published: 09 March 2010

Volume 15, pages 845–853 (2011)
Soft Computing

Abstract

A major challenge in subspace clustering is that it can generate an explosive number of clusters with high computational complexity, which severely restricts its practical use. The problem worsens as the dimensionality of the data increases. In this paper, we propose to alleviate the problem by summarizing the set of subspace clusters into k representative clusters. Typically, the subspace clusters can be further clustered into k groups, and the representative clusters can then be selected from these groups. In this way, only the most representative subspace clusters are returned to the user. Unfortunately, when the size of the set of representative clusters is specified, finding the optimal set is NP-hard. To solve this problem efficiently, we present two approximate methods: PCoC and HCoC. The main advantage of our methods is that they require only a subset of the subspace clusters as input, instead of the complete set. Specifically, only the clusters in low-dimensional subspaces are computed, and these are assembled into representative clusters in high-dimensional subspaces. The approximate results can be found in polynomial time. Our performance study demonstrates both the effectiveness and the efficiency of these methods.
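The abstract frames summarization as clustering the subspace clusters themselves into k groups and keeping one representative per group. The sketch below illustrates only that general idea; it is not PCoC or HCoC, whose details are not given here, and unlike those methods it takes the complete set of clusters as input. It assumes a subspace cluster is a pair (object ids, dimension ids), measures dissimilarity as 1 minus the Jaccard similarity of the covered (object, dimension) cells, and groups clusters with a plain k-medoids loop. The names `cells`, `distance`, and `summarize` are hypothetical.

```python
import random

# Hypothetical representation (an assumption, not from the paper):
# a subspace cluster is a pair (set of object ids, set of dimension ids).
Cluster = tuple[frozenset, frozenset]


def cells(c: Cluster) -> set:
    """All (object, dimension) cells covered by a subspace cluster."""
    objs, dims = c
    return {(o, d) for o in objs for d in dims}


def distance(a: Cluster, b: Cluster) -> float:
    """Assumed dissimilarity: 1 - Jaccard similarity of the covered cells."""
    ca, cb = cells(a), cells(b)
    union = ca | cb
    if not union:
        return 0.0
    return 1.0 - len(ca & cb) / len(union)


def summarize(clusters: list, k: int, max_iter: int = 100, seed: int = 0):
    """Group subspace clusters into k groups with a basic k-medoids loop.

    Returns (representatives, groups): the sorted medoid indices and a dict
    mapping each medoid index to the indices of the clusters in its group.
    """
    n = len(clusters)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of clusters")
    dist = [[distance(clusters[i], clusters[j]) for j in range(n)] for i in range(n)]

    def assign(medoids):
        groups = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always joins its own group, so no group is ever empty.
            nearest = i if i in groups else min(medoids, key=lambda m: dist[i][m])
            groups[nearest].append(i)
        return groups

    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        groups = assign(medoids)
        # New medoid per group: the member with the smallest total distance to the rest.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in groups.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids), assign(medoids)


if __name__ == "__main__":
    data = [
        (frozenset({1, 2, 3}), frozenset({0, 1})),
        (frozenset({1, 2, 3, 4}), frozenset({0, 1})),
        (frozenset({10, 11}), frozenset({5, 6})),
        (frozenset({10, 11, 12}), frozenset({5, 6})),
    ]
    reps, groups = summarize(data, k=2)
    print(reps, groups)
```

On this toy input, the two overlapping clusters in dimensions {0, 1} and the two in dimensions {5, 6} end up in separate groups, and one medoid from each group is kept as its representative. A medoid (rather than a mean) is used so that every representative is one of the original subspace clusters.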

Acknowledgments

This work was supported by the National High-Tech Research and Development Plan of China (863) under Grant No. 2007AA120502; NSFC 60874082.

Author information

Authors and Affiliations

School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Guanhua Chen, Xiuli Ma, Dongqing Yang, Shiwei Tang, Meng Shuai & Kunqing Xie
Key Laboratory of Machine Perception (Peking University), Ministry of Education, Beijing, China
Guanhua Chen, Xiuli Ma, Shiwei Tang, Meng Shuai & Kunqing Xie
Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing, China
Dongqing Yang

Corresponding author

Correspondence to Xiuli Ma.

About this article

Cite this article

Chen, G., Ma, X., Yang, D. et al. Efficient approaches for summarizing subspace clusters into k representatives. Soft Comput 15, 845–853 (2011). https://doi.org/10.1007/s00500-010-0552-8

Published: 09 March 2010
Issue Date: May 2011
DOI: https://doi.org/10.1007/s00500-010-0552-8