A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Mishra, Gaurav; Mohanty, Sraban Kumar

doi:10.1007/s10844-020-00602-z

A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Published: 22 April 2020

Volume 55, pages 587–606, (2020)
Cite this article

Journal of Intelligent Information Systems Aims and scope Submit manuscript

Gaurav Mishra¹ &
Sraban Kumar Mohanty¹

536 Accesses
4 Citations
Explore all metrics

Abstract

Clustering being an unsupervised learning technique, has been used extensively for knowledge discovery due to its less dependency on domain knowledge. Many clustering techniques were proposed in the literature to recognize the cluster of different characteristics. Most of them become inadequate either due to their dependency on user-defined parameters or when they are applied on multi-scale datasets. Hybrid clustering techniques have been proposed to take the advantage of both Partitional and Hierarchical techniques by first partitioning the dataset into several dense sub-clusters and merging them into actual clusters. However, the universality of the partition and merging criteria are not sufficient to capture many characteristics of the clusters. Minimum spanning tree (MST) has been used extensively for clustering because it preserves the intrinsic nature of the dataset even after the sparsification of the graph. In this paper, we propose a parameter-free, minimum spanning tree based efficient hybrid clustering algorithm to cluster the multi-scale datasets. In the first phase, we construct a MST of the dataset to capture the neighborhood information of data points and employ box-plot, an outlier detection technique on MST edge weights for effectively selecting the inconsistent edges to partition the data points into several dense sub-clusters. In the second phase, we propose a novel merging criterion to find the genuine clusters by iteratively merging only the pairs of adjacent sub-clusters. The merging technique involves both dis-connectivity and intra-similarity using the topology of two adjacent pairs which helps to identify the arbitrary shape and varying density clusters. Experiment results on various synthetic and real world datasets demonstrate the superior performance of the proposed technique over other popular clustering algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Data clustering: application and trends

Article 27 November 2022

A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets

Article 09 February 2021

Comprehensive survey on hierarchical clustering algorithms and the recent developments

Article 26 December 2022

References

Bezdek, J.C., & Pal, N.R. (1998). Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 28(3), 301–315.
Article Google Scholar
Blake, C., & Merz, C. (1998). Uci repository of machine learning databases [http://www.ics.uci.edu/mlearn/mlrepository.html], department of information and computer science, University of California, Irvine, CA, Vol. 55.
Chen, X. (2013). Clustering based on a near neighbor graph and a grid cell graph. Journal of Intelligent Information Systems, 40(3), 529–554.
Article Google Scholar
Cheng, Q., Liu, Z., Huang, J., & Cheng, G. (2016a). Community detection in hypernetwork via density-ordered tree partition. Applied Mathematics and Computation, 276, 384–393.
Article MathSciNet Google Scholar
Cheng, Q., Lu, X., Liu, Z., Huang, J., & Cheng, G. (2016b). Spatial clustering with density-ordered tree. Physica A:, Statistical Mechanics and its Applications, 460, 188–200.
Article Google Scholar
Chung, C.H., & Dai, B.R. (2014). A fragment-based iterative consensus clustering algorithm with a robust similarity. Knowledge and information systems, 41(3), 591–609.
Article Google Scholar
Das, A.K., & Sil, J. (2007). Cluster validation using splitting and merging technique, International conference on computational intelligence and multimedia applications (ICCIMA 2007), vol. 2, pp. 56–60. IEEE.
Du, M., Ding, S., Xue, Y., & Shi, Z. (2019). A novel density peaks clustering with sensitivity of local density and density-adaptive metric. Knowledge and Information Systems, 59(2), 285–309.
Article Google Scholar
Ester, M., Kriegel, H.P., Sander, J., Xu, X., & et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, vol. 96, pp. 226–231.
Grygorash, O., Zhou, Y., & Jorgensen, Z. (2006). Minimum spanning tree based clustering algorithms. In 18Th IEEE international conference on tools with artificial intelligence (ICTAI’06), pp. 73–81. IEEE.
Guha, S., Rastogi, R., & Shim, K. (1998). Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record, 27(2), 73–84.
Article Google Scholar
Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of intelligent information systems, 17(2-3), 107–145.
Article Google Scholar
Hartigan, J.A., & Wong, M.A. (1979). Algorithm as 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100–108.
MATH Google Scholar
Hu, W., & he Pan, Q. (2015). Data clustering and analyzing techniques using hierarchical clustering method. Multimedia Tools and Applications, 74(19), 8495–8504.
Article Google Scholar
Hyde, R., & et al. (2015). Lancaster university clustering datasets. http://www.lancaster.ac.uk/pg/hyder/Downloads/downloads.html.
Jain, A.K., & Dubes, R.C. (1988). Algorithms for clustering data, Prentice-Hall, Inc.
Jiau, H.C., Su, Y.J., Lin, Y.M., & Tsai, S.R. (2006). Mpm: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data. Journal of Intelligent Information Systems, 26(2), 185–207.
Article Google Scholar
Jothi, R., Mohanty, S.K., & Ojha, A. (2016). Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph. Computers in biology and medicine, 71, 135–148.
Article Google Scholar
Jothi, R., Mohanty, S.K., & Ojha, A. (2016). On careful selection of initial centers for k-means algorithm. In Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, pp. 435–445. Springer.
Jothi, R., Mohanty, S.K., & Ojha, A. (2018). Fast approximate minimum spanning tree based clustering algorithm. Neurocomputing, 272, 542–557.
Article Google Scholar
Karypis, G., Han, E.H., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8), 68–75.
Article Google Scholar
Kavitha, E., & Tamilarasan, R. (2019). Agglo-hi clustering algorithm for gene expression micro array data using proximity measures. Multimedia Tools and Applications, 79, 9003–9017.
Article Google Scholar
Koga, H., Ishibashi, T., & Watanabe, T. (2007). Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing. Knowledge and Information Systems, 12(1), 25–53.
Article Google Scholar
Kriegel, H.P., Kröger, P., Sander, J., & Zimek, A. (2011). Density-based clustering. Wiley Interdisciplinary Reviews:, Data Mining and Knowledge Discovery, 1 (3), 231–240.
Google Scholar
Kumar, K.M., & Reddy, A.R.M. (2016). A fast dbscan clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognition, 58, 39–48.
Article Google Scholar
Li, J., Wang, X., & Wang, X. (2019). A scaled-mst-based clustering algorithm and application on image segmentation, Journal of Intelligent Information Systems, pp 1–25. https://doi.org/10.1007/s10844-019-00572-x.
Li, X., Kao, B., Luo, S., & Ester, M. (2018). Rosc: Robust spectral clustering on multi-scale data. In Proceedings of the 2018 World Wide Web Conference, pp. 157–166.
Limwattanapibool, O., & Arch-int, S. (2017). Determination of the appropriate parameters for k-means clustering using selection of region clusters based on density dbscan (srcd-dbscan). Expert Systems, 34(3), 12204.
Article Google Scholar
Lin, C.R., & Chen, M.S. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. IEEE Transactions on Knowledge and Data Engineering, 17(2), 145–159.
Article Google Scholar
Mishra, G., & Mohanty, S. (2020). Rdmn: a relative density measure based on mst neighborhood for clustering multi-scale datasets, IEEE Transactions on Knowledge and Data Engineering, pp 1–1, https://doi.org/10.1109/TKDE.2020.2982400.
Mishra, G., & Mohanty, S.K. (2019). A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree. Expert Systems with Applications, 132, 28–43.
Article Google Scholar
Otoo, E.J., Shoshani, A., & Hwang, S.w. (2001). Clustering high dimensional massive scientific datasets. Journal of Intelligent Information Systems, 17(2-3), 147–168.
Article Google Scholar
Pasi, F., & et al. (2015). Clustering datasets. http://cs.uef.fi/sipu/datasets/.
Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846–850.
Article Google Scholar
Schlitter, N., Falkowski, T., & Lässig, J. (2014). Dengraph-ho: a density-based hierarchical graph clustering algorithm. Expert Systems, 31(5), 469–479.
Article Google Scholar
Tong, T., Zhu, X., & Du, T. (2019). Connected graph decomposition for spectral clustering. Multimedia Tools and Applications, 78(23), 33247–33259.
Article Google Scholar
Wagner, S., & Wagner, D. (2007). Comparing clusterings: an overview. Universität Karlsruhe: Fakultät für Informatik Karlsruhe.
Google Scholar
Walker, M., & Chakraborti, S. (2013). An asymmetrically modified boxplot for exploratory data analysis. The University of Alabama: Department of Information Systems Statistics, and Management Science.
Google Scholar
Wang, X., Wang, X.L., Chen, C., & Wilkes, D.M. (2013). Enhancing minimum spanning tree-based clustering by removing density-based outliers. Digital Signal Processing, 23(5), 1523–1538.
Article MathSciNet Google Scholar
Wickham, H., & Stryjewski, L. (2011). 40 years of boxplots. Am Statistician.
Zahn, C.T. (1971). Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on computers, 100(1), 68–86.
Article Google Scholar
Zhong, C., Miao, D., & Fränti, P. (2011). Minimum spanning tree based split-and-merge: a hierarchical clustering method. Information Sciences, 181(16), 3397–3410.
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, India
Gaurav Mishra & Sraban Kumar Mohanty

Authors

Gaurav Mishra
View author publications
You can also search for this author in PubMed Google Scholar
Sraban Kumar Mohanty
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gaurav Mishra.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mishra, G., Mohanty, S.K. A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets. J Intell Inf Syst 55, 587–606 (2020). https://doi.org/10.1007/s10844-020-00602-z

Download citation

Received: 06 October 2019
Revised: 27 March 2020
Accepted: 29 March 2020
Published: 22 April 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10844-020-00602-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Abstract

Access this article

Similar content being viewed by others

Data clustering: application and trends

A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets

Comprehensive survey on hierarchical clustering algorithms and the recent developments

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Abstract

Access this article

Similar content being viewed by others

Data clustering: application and trends

A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets

Comprehensive survey on hierarchical clustering algorithms and the recent developments

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation