A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets

Journal of Intelligent Information Systems

Abstract

Clustering, an unsupervised learning technique, has been used extensively for knowledge discovery because it depends little on domain knowledge. Many clustering techniques have been proposed in the literature to recognize clusters with different characteristics. Most of them become inadequate either because they depend on user-defined parameters or when they are applied to multi-scale datasets. Hybrid clustering techniques have been proposed to take advantage of both partitional and hierarchical techniques by first partitioning the dataset into several dense sub-clusters and then merging them into actual clusters. However, the partitioning and merging criteria are not universal enough to capture many characteristics of the clusters. The minimum spanning tree (MST) has been used extensively for clustering because it preserves the intrinsic nature of the dataset even after the graph is sparsified. In this paper, we propose a parameter-free, efficient hybrid clustering algorithm based on the minimum spanning tree to cluster multi-scale datasets. In the first phase, we construct an MST of the dataset to capture the neighborhood information of the data points and apply the box-plot, an outlier detection technique, to the MST edge weights to select the inconsistent edges, whose removal partitions the data points into several dense sub-clusters. In the second phase, we propose a novel merging criterion to find the genuine clusters by iteratively merging only pairs of adjacent sub-clusters. The merging technique involves both dis-connectivity and intra-similarity, computed from the topology of each pair of adjacent sub-clusters, which helps to identify clusters of arbitrary shape and varying density. Experimental results on various synthetic and real-world datasets demonstrate the superior performance of the proposed technique over other popular clustering algorithms.
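The first phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean distances, Prim's algorithm on the complete graph, and the standard Tukey upper fence (Q3 + 1.5 × IQR) as the box-plot rule, whereas the paper may use a different box-plot variant. The function names `mst_edges`, `upper_fence`, and `partition` are hypothetical, and the second-phase merging criterion is not sketched because the abstract does not define it.

```python
import math
import statistics


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def upper_fence(weights, k=1.5):
    """Tukey box-plot upper fence Q3 + k * IQR (assumed variant, not from the paper)."""
    q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def partition(points, k=1.5):
    """Drop MST edges above the fence; return one sub-cluster label per point."""
    edges = mst_edges(points)
    fence = upper_fence([w for w, _, _ in edges], k)
    root = list(range(len(points)))  # union-find forest over the points

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for w, u, v in edges:
        if w <= fence:  # keep only edges that are not box-plot outliers
            root[find(u)] = find(v)
    return [find(i) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(partition(pts))
```

On two well-separated groups of points, the single long MST edge that bridges them exceeds the fence and is removed, so each group ends up as its own sub-cluster. The constant `k=1.5` is the conventional box-plot default rather than a value taken from the paper.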

Author information

Corresponding author

Correspondence to Gaurav Mishra.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mishra, G., Mohanty, S.K. A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets. J Intell Inf Syst 55, 587–606 (2020). https://doi.org/10.1007/s10844-020-00602-z

  • DOI: https://doi.org/10.1007/s10844-020-00602-z
