Connectedness-based subspace clustering

Jain, Namita; Murthy, C. A.

doi:10.1007/s10115-018-1181-2

Regular Paper
Published: 20 March 2018

Volume 58, pages 9–34 (2019)

Knowledge and Information Systems

Namita Jain¹ &
C. A. Murthy¹

355 Accesses
3 Citations

Abstract

An algorithm for density-based subspace clustering of given data is proposed. Unlike existing density-based subspace clustering algorithms, which find clusters using spatial proximity, the proposed method groups features on the condition that they share common high-density regions. It is capable of finding subspace clusters based on both linear and nonlinear relationships between features. Also unlike existing density-based subspace clustering algorithms, the values of the parameters for density estimation need not be provided by the user. Instead, they are calculated for each pair of features from the data distribution in the space corresponding to that pair. This allows the proposed approach to find subspace clusters in which the relationships between different features exist at different scales. The performance of the proposed algorithm is compared with that of other subspace clustering methods on artificial and real-life datasets. The proposed method finds the subspace clusters embedded in 5 artificial datasets with a greater G score than the other methods, and it finds subspace clusters corresponding to known classes in 4 real-life datasets with greater accuracy.
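The abstract does not spell out the algorithm, so the following is only a minimal, hypothetical sketch of the general idea it describes: score each pair of features with density estimates whose bandwidths are computed from that pair's own data, then group features that are linked through dependent pairs. It is not the authors' method. Two techniques are swapped in as assumptions: the pairwise score is a kernel-density plug-in estimate of mutual information (Gaussian kernels, bandwidths from Scott's factor n^(-1/(d+4)) on standardized columns) rather than the paper's common-high-density-region criterion, and groups are taken as connected components of a graph whose edges are pairs scoring above an arbitrary threshold of 0.1 nats. The names `pair_dependence` and `group_features` are illustrative only.

```python
import math
import random


def _standardize(col):
    """Rescale a column to zero mean and unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
    return [(v - mean) / sd for v in col]


def _kde_1d(xs, h):
    """Gaussian KDE of xs, evaluated at every sample point."""
    c = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((a - b) / h) ** 2) for b in xs) for a in xs]


def _kde_2d(xs, ys, h):
    """Product-Gaussian KDE of the pair (xs, ys), evaluated at every sample point."""
    c = 1.0 / (len(xs) * 2 * math.pi * h * h)
    return [
        c * sum(math.exp(-0.5 * (((a - b) / h) ** 2 + ((p - q) / h) ** 2))
                for b, q in zip(xs, ys))
        for a, p in zip(xs, ys)
    ]


def pair_dependence(x, y):
    """Hypothetical pairwise score: plug-in mutual information in nats.

    Bandwidths are derived from the pair's own data (Scott's factor
    n**(-1/(d+4)) on standardized columns), so the user supplies none.
    """
    x, y = _standardize(x), _standardize(y)
    n = len(x)
    h1, h2 = n ** (-1 / 5), n ** (-1 / 6)  # Scott's factor for d = 1 and d = 2
    fx, fy, fxy = _kde_1d(x, h1), _kde_1d(y, h1), _kde_2d(x, y, h2)
    return sum(math.log(j / (a * b)) for j, a, b in zip(fxy, fx, fy)) / n


def group_features(data, threshold=0.1):
    """Hypothetical grouping: connected components of the dependence graph.

    data is a list of rows; an edge joins features i and j when
    pair_dependence exceeds threshold (an arbitrary illustrative value).
    """
    d = len(data[0])
    cols = [[row[k] for row in data] for k in range(d)]
    parent = list(range(d))  # union-find forest over feature indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(d):
        for j in range(i + 1, d):
            if pair_dependence(cols[i], cols[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for k in range(d):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


if __name__ == "__main__":
    rng = random.Random(0)
    rows = []
    for _ in range(300):
        a = rng.gauss(0, 1)
        # feature 1 depends linearly on feature 0; features 2 and 3 are independent
        rows.append([a, 2 * a + rng.gauss(0, 0.1), rng.gauss(0, 1), rng.random()])
    print(group_features(rows))
```

In the usage example, feature 1 is a noisy linear function of feature 0 while features 2 and 3 are independent noise, so `group_features` is expected to place features 0 and 1 together and leave 2 and 3 as singletons. Because each bandwidth is computed per pair, pairs whose structure lives at different scales are each smoothed on their own terms, which mirrors the motivation given in the abstract. The pure-Python KDE costs O(n²) per pair, so this sketch is suited only to small illustrative inputs.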


Author information

Authors and Affiliations

Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata, 700108, India
Namita Jain & C. A. Murthy

Corresponding author

Correspondence to Namita Jain.

About this article

Cite this article

Jain, N., Murthy, C.A. Connectedness-based subspace clustering. Knowl Inf Syst 58, 9–34 (2019). https://doi.org/10.1007/s10115-018-1181-2

Received: 28 October 2016
Revised: 10 December 2017
Accepted: 14 March 2018
Published: 20 March 2018
Issue Date: 08 January 2019
DOI: https://doi.org/10.1007/s10115-018-1181-2
