An extended EM algorithm for subspace clustering

Chen, Lifei; Jiang, Qingshan

doi:10.1007/s11704-008-0007-x

Research Article
Published: 28 March 2008

Volume 2, pages 81–86, (2008)

Frontiers of Computer Science in China

Lifei Chen¹ & Qingshan Jiang²


Abstract

Clustering high-dimensional data has become a challenge in data mining because of the curse of dimensionality. To address this problem, subspace clustering has been defined as an extension of traditional clustering that seeks clusters in subspaces spanned by different combinations of dimensions within a dataset. This paper presents a new subspace clustering algorithm that calculates local feature weights automatically in an EM-based clustering process. In the algorithm, the features are locally weighted using a new unsupervised weighting method, as a means of minimizing a proposed clustering criterion that takes into account both the average intra-cluster compactness and the average inter-cluster separation for subspace clustering. To capture accurate subspace information, an additional outlier detection process is presented to identify possible local outliers of subspace clusters; it is embedded between the E-step and M-step of the algorithm. The method has been evaluated on clustering real-world gene expression data and high-dimensional artificial data with outliers, and the experimental results show its effectiveness.
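
The overall control flow described in the abstract (an E-step, an outlier detection pass, then an M-step that also refreshes per-cluster feature weights) can be illustrated with a minimal sketch. The weighting formula and the clustering criterion of the paper are not given in this preview, so everything specific below is an assumption made for illustration only: hard assignments stand in for the E-step, weights are taken as normalized inverse per-dimension variances, a point is flagged as a local outlier when its weighted distance exceeds `outlier_factor` times its cluster's mean distance, and the inter-cluster separation term is omitted. The names `subspace_em_sketch`, `weighted_sq_dist`, and `outlier_factor` are hypothetical.

```python
import random


def weighted_sq_dist(x, center, weights):
    """Squared Euclidean distance with one weight per dimension."""
    return sum(w * (a - c) ** 2 for a, c, w in zip(x, center, weights))


def subspace_em_sketch(data, k, n_iter=20, outlier_factor=3.0, seed=0):
    """Illustrative loop: E-step -> outlier detection -> M-step.

    ASSUMPTIONS (not taken from the paper): hard assignments,
    inverse-variance feature weights, mean-distance outlier threshold.
    """
    rng = random.Random(seed)
    n, dim, eps = len(data), len(data[0]), 1e-9
    centers = [list(p) for p in rng.sample(data, k)]
    weights = [[1.0 / dim] * dim for _ in range(k)]  # uniform start
    labels = [0] * n
    outliers = [False] * n

    for _ in range(n_iter):
        # E-step (hard version): assign each point to the nearest center
        # under that cluster's own feature weights.
        dists = []
        for i, x in enumerate(data):
            d = [weighted_sq_dist(x, centers[j], weights[j]) for j in range(k)]
            labels[i] = min(range(k), key=d.__getitem__)
            dists.append(d[labels[i]])

        # Outlier detection, placed between the E-step and the M-step:
        # flag points that lie unusually far from their own cluster.
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                continue
            mean_d = sum(dists[i] for i in members) / len(members)
            for i in members:
                outliers[i] = dists[i] > outlier_factor * mean_d

        # M-step: update centers and local feature weights from inliers only.
        for j in range(k):
            inliers = [data[i] for i in range(n)
                       if labels[i] == j and not outliers[i]]
            if not inliers:
                continue
            cols = list(zip(*inliers))
            centers[j] = [sum(col) / len(col) for col in cols]
            var = [sum((v - c) ** 2 for v in col) / len(col) + eps
                   for col, c in zip(cols, centers[j])]
            inv = [1.0 / v for v in var]
            total = sum(inv)
            weights[j] = [v / total for v in inv]  # each row sums to 1

    return labels, centers, weights, outliers
```

With these assumed weights, a dimension along which a cluster is tightly concentrated receives a large weight for that cluster, which is one simple way to express that different clusters live in different subspaces. Excluding flagged points from the M-step keeps them from distorting the centers and the weights.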

Author information

Authors and Affiliations

Department of Computer Science, Xiamen University, Xiamen, 361005, China
Lifei Chen
Software School, Xiamen University, Xiamen, 361005, China
Qingshan Jiang

Corresponding author

Correspondence to Qingshan Jiang.

About this article

Cite this article

Chen, L., Jiang, Q. An extended EM algorithm for subspace clustering. Front. Comput. Sci. China 2, 81–86 (2008). https://doi.org/10.1007/s11704-008-0007-x

Received: 02 September 2007
Accepted: 23 December 2007
Published: 28 March 2008
Issue Date: March 2008
DOI: https://doi.org/10.1007/s11704-008-0007-x
