Abstract
Hierarchical clustering (HC for short) outputs a dendrogram that offers more topological information than flat clustering (e.g., k-means). However, existing HC algorithms focus on either the quality of the dendrogram or the ability to mine arbitrarily shaped clusters, but not both. To address these two aspects simultaneously, we present HICMEN, which adopts (1) the classic agglomerative clustering framework, which can generate a complete dendrogram, and (2) a novel similarity measure based on mutual k-nearest neighbors, which captures the connectivity of data points and helps merge each arbitrarily shaped cluster properly, piece by piece. More importantly, we prove that the similarity measure has a property called weak monotonicity, which guarantees the quality of the dendrogram generated by HICMEN. Extensive experimental results show that HICMEN mines arbitrarily shaped clusters effectively and simultaneously outputs a high-quality dendrogram.
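To make the two ingredients above concrete, the following is a minimal sketch of agglomerative merging driven by mutual k-nearest neighbors. The abstract does not define HICMEN's similarity measure, so the one used here (the number of mutual k-NN pairs linking two clusters, divided by the size of the smaller cluster) is a hypothetical stand-in. The function names `mutual_knn_pairs` and `agglomerate` are likewise illustrative, and the brute-force search is meant for small inputs only.

```python
from itertools import combinations
from math import dist


def mutual_knn_pairs(points, k):
    """Return index pairs (i, j), i < j, that are mutual k-nearest neighbors."""
    n = len(points)
    knn = []
    for i in range(n):
        # Rank all other points by Euclidean distance to point i, keep the k closest.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        knn.append(set(others[:k]))
    return {(i, j) for i, j in combinations(range(n), 2)
            if j in knn[i] and i in knn[j]}


def agglomerate(points, k):
    """Greedy bottom-up merging with a placeholder similarity.

    ASSUMPTION: similarity(A, B) = (mutual k-NN pairs linking A and B)
    / min(|A|, |B|). This is an illustrative stand-in, not the measure
    defined in the paper.
    Returns (merges, remaining_clusters); each merge is (a, b, similarity).
    """
    pairs = mutual_knn_pairs(points, k)
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            links = sum(1 for i, j in pairs
                        if (i in a and j in b) or (i in b and j in a))
            sim = links / min(len(a), len(b))
            if best is None or sim > best[0]:
                best = (sim, a, b)
        sim, a, b = best
        if sim == 0:
            break  # the remaining clusters share no mutual k-NN link
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), sim))
    return merges, [sorted(c) for c in clusters]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1), (10, 2)]
    merges, remaining = agglomerate(pts, k=2)
    for a, b, sim in merges:
        print(a, "+", b, "similarity", sim)
    print("remaining clusters:", remaining)
```

In this sketch, merging stops once no mutual k-NN link connects the remaining clusters, so well-separated groups stay apart. A complete dendrogram, as described in the abstract, would additionally require a rule for joining such disconnected groups, which the abstract does not specify.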
Acknowledgements
This work was supported in part by NSFC Grants (61502347, 61502504, 61522208, 61572376, 61472359, 61379033, 61373038, and 61364025), the Fundamental Research Funds for the Central Universities (2015XZZX005-07, 2015XZZX004-18, and 2042015kf0038), and the Research Funds for Introduced Talents of WHU.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, H. et al. (2016). Mining Arbitrary Shaped Clusters and Outputting a High Quality Dendrogram. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_10
DOI: https://doi.org/10.1007/978-3-319-44403-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44402-4
Online ISBN: 978-3-319-44403-1
eBook Packages: Computer Science, Computer Science (R0)