Abstract
Extracting a flat solution from a clustering hierarchy, rather than deriving it directly from the data with a partitional clustering algorithm, is advantageous because it allows the hierarchical relationships between clusters and sub-clusters, as well as their stability across different hierarchical levels, to be revealed before any decision is made on which clusters are more relevant. Traditionally, flat solutions are obtained by performing a global, horizontal cut through a clustering hierarchy (e.g. a dendrogram). This problem has gained special importance in the context of density-based hierarchical algorithms, because only sophisticated cutting strategies, in particular non-horizontal local cuts, are able to select clusters at different density levels. In this paper, we propose an adaptation of a variant of the Modularity Q measure, widely used in the realm of community detection in complex networks, so that it can be applied as an optimization criterion to the problem of optimal local cuts through clustering hierarchies. Our results suggest that the proposed measure is a competitive alternative, especially for high-dimensional data.
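For orientation, the standard Newman-Girvan form of Modularity Q for a partition of an undirected weighted graph can be sketched as below. This is only an illustration of the general measure: the paper adapts a variant of Q, and that adapted measure is not reproduced here. The function name `modularity`, the edge-list input format, and the toy graph are assumptions made for this example.

```python
from collections import defaultdict

def modularity(edges, labels):
    """Standard Newman-Girvan modularity Q (illustrative, not the paper's variant).

    edges  : iterable of (u, v, weight), each undirected edge listed once
    labels : dict mapping node -> cluster id
    """
    total = 0.0                    # total edge weight W
    inside = defaultdict(float)    # weight of edges with both ends in a cluster
    degree = defaultdict(float)    # summed weighted degree of a cluster's nodes
    for u, v, w in edges:
        total += w
        degree[labels[u]] += w
        degree[labels[v]] += w
        if labels[u] == labels[v]:
            inside[labels[u]] += w
    if total == 0:
        return 0.0
    # Q = sum over clusters of (internal weight fraction) - (degree fraction)^2
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

On this toy graph, splitting along the bridge scores higher than keeping all nodes in one cluster, which is the kind of preference an optimization criterion for selecting clusters needs to express.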
Notes
- 1.
- 2. HDBSCAN* has an optional parameter \(m_{\text{ClSize}}\) that has not been used (\(m_{\text{ClSize}} = 1\)).
- 3. An exception is \(m_{pts} = 4\), where Mod-Knn has outperformed Stability in most cases.
Acknowledgements
CNPq and CAPES (Brazil), NSERC (Canada).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
dos Anjos, F.d.A.R., Gertrudes, J.C., Sander, J., Campello, R.J.G.B. (2019). A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies. In: Islam, R., et al. Data Mining. AusDM 2018. Communications in Computer and Information Science, vol 996. Springer, Singapore. https://doi.org/10.1007/978-981-13-6661-1_20
DOI: https://doi.org/10.1007/978-981-13-6661-1_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6660-4
Online ISBN: 978-981-13-6661-1
eBook Packages: Computer Science, Computer Science (R0)