GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Mansoori, Eghbal G.

doi:10.1007/s00500-013-1105-8

GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Methodologies and Application
Published: 11 August 2013

Volume 18, pages 905–922, (2014)
Cite this article

Soft Computing Aims and scope Submit manuscript

Eghbal G. Mansoori¹

554 Accesses
12 Citations
Explore all metrics

Abstract

This paper proposes a grid-based hierarchical clustering algorithm (GACH) as an efficient and robust method to explore clusters in high-dimensional data with no prior knowledge. It discovers the initial positions of the potential clusters automatically and then combines them hierarchically to obtain the final clusters. In this regard, GACH first projects the data patterns on a two-dimensional space (i.e., on a plane established by two features) to overcome the curse of dimensionality problem in high-dimensional data. To choose these two well-informed features, a simple and fast feature selection algorithm is proposed. Then, through meshing the plane with grid lines, GACH detects the crowded grid points. The nearest data patterns around these grid points are considered as initial members of some potential clusters. By returning the patterns back to their true dimensions, GACH refines these clusters. In the merging phase, GACH combines the closely adjacent clusters in a hierarchical bottom-up manner to construct the final clusters’ members. The main features of GACH are: (1) it automatically discovers the clusters, (2) the obtained clusters are stable, (3) it is efficient for data sets with high dimensions, and (4) its merging process involves a threshold which can be obtained in advance for well-clustered data. To assess our proposed algorithm, it is applied on some benchmark data sets and the validity of obtained clusters is compared with the results of some other clustering algorithms. This comparison shows that GACH is accurate, efficient and feasible to discover clusters in high-dimensional data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic sub-space clustering of high dimensional data for data mining applications. In: Proceedings of ACM SIGMOD International Conference on MOD, pp 94–105
Asuncion A, Newman DJ (2007) UCI machine learning repository. Department of Information and Computer Science, University of California, Irvine
Benson SYL, Hong Y (2007) Assessment of microarray data clustering results based on a new geometrical index for cluster validity. Soft Comput 11(4):341–348
Google Scholar
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
Chandra B, Gupta M (2013) A novel approach for distance-based semi-supervised clustering using functional link neural network. Soft Comput 17(3):369–379
Google Scholar
Chang CI, Lin NP, Jan NY (2009) An axis shifted clustering algorithm. Tamkang J Sci Eng 12(2):183–192
Google Scholar
Everitt B, Landau S, Leese M (2001) Cluster analysis. Arnold, London
Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
Google Scholar
Hinneburg A, Keim D (1999) Optimal grid-clustering: toward breaking the curse of dimensionality in high-dimensional clustering. In: Proceedings of the 25th VLDB Conference, pp 506–517
Ilango M, Mohan V (2010) A survey of grid based clustering algorithms. Int J Eng Sci Tech 2(8):3441–3446
Google Scholar
Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Article Google Scholar
Jarman IH, Etchells TA, Bacciu D, Garibaldi JM, Ellis IO, Lisboa PJG (2011) Clustering of protein expression data: a benchmark of statistical and neural approaches. Soft Comput 15(8):1459–1469
Article Google Scholar
Kohavi R, Provost F (1998) Glossary of terms. Editorial for the Special Issue on Applications of Machine Learning and the Knowledge Discovery Process 30(2/3)
Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110
Article Google Scholar
Mansoori EG (2011) FRBC: a fuzzy rule-based clustering algorithm. IEEE Trans Fuzzy Syst 19(5):960–971
Article Google Scholar
Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal Mach Intell 24(12):1650–1654
Article Google Scholar
McQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of Fifth Berkeley Symposium, Math Statistics and Probability, pp 281–297
Monmarché N, Slimane M, Venturini G (1999) AntClass: discovery of clusters in numeric data by an hybridization of an ant colony with the Kmeans algorithm. Internal Report No 213, E3i
Ng MK, Li MJ, Huang JZ, He Z (2007) On the impact of dissimilarity measure in \(k\)-modes clustering algorithm. IEEE Trans Pattern Anal Mach Intell 29(3):503–507
Article Google Scholar
Ordonez C, Omiecinski E (2004) Efficient disk-based K-means clustering for relational databases. IEEE Trans Knowl Data Eng 16(8):909–921
Article Google Scholar
Schikuta E (1993) Grid-clustering: a hierarchical clustering method for very large data sets. In: Technical Report TR-CRPC No. 93358, Center for Research on Parallel Computation, Rice University, Houston
Sheikholeslami G, Chatterjee S, Zhang A (2000) WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. VLDB J Very Large Data Bases 8:289–304
Article Google Scholar
Schockaert S, De-Cock M, Cornelis C, Kerr EE (2007) Clustering web search results using fuzzy ants. Int J Intell Syst 22(5):455–474
Article MATH Google Scholar
Sledge IJ, Havens TC, Huband JM, Bezdek JC, Keller JM (2009) Finding the number of clusters in ordered dissimilarities. Soft Comput 13(12):1125–1142
Article Google Scholar
Vicente D, Vellido A (2004) A review of hierarchical models for data clustering and visualization. In: Gir’aldez R, Riquelme JC, Aguilar-Ruiz JS (eds) Tendencias de la Minería de Datos en España. Red Española de Minería de Datos
Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
Article Google Scholar
Yager R, Filev D (1994) Approximate clustering via the mountain method. IEEE Trans Syst Man Cybern 24(8):1279–1284
Article Google Scholar
Yang W, Muntz R, Wang W, Yang J (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of 23rd International Conference on VLDB, pp 186–195
Yue S, Wei M, Wang J, Wang H (2008) A general grid-clustering approach. Pattern Recognit Lett 29:1372–1384
Article Google Scholar

Download references

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
Eghbal G. Mansoori

Authors

Eghbal G. Mansoori
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Eghbal G. Mansoori.

Additional information

Communicated by W. Pedrycz.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mansoori, E.G. GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data. Soft Comput 18, 905–922 (2014). https://doi.org/10.1007/s00500-013-1105-8

Download citation

Published: 11 August 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s00500-013-1105-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Abstract

Access this article

Similar content being viewed by others

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

Data clustering: application and trends

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

Abstract

Access this article

Similar content being viewed by others

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

Data clustering: application and trends

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation