A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies

dos Anjos, Francisco de Assis Rodrigues; Gertrudes, Jadson Castro; Sander, Jörg; Campello, Ricardo J. G. B.

doi:10.1007/978-981-13-6661-1_20

Conference paper
First Online: 16 February 2019

Part of the book series: Communications in Computer and Information Science (CCIS, volume 996)

Abstract

Extracting a flat solution from a clustering hierarchy, as opposed to deriving it directly from data using a partitional clustering algorithm, is advantageous because it allows the hierarchical relationships between clusters and sub-clusters, as well as their stability across different hierarchical levels, to be revealed before any decision is made on which clusters are more relevant. Traditionally, flat solutions are obtained by performing a global, horizontal cut through a clustering hierarchy (e.g. a dendrogram). This problem has gained special importance in the context of density-based hierarchical algorithms, because only sophisticated cutting strategies, in particular non-horizontal local cuts, are able to select clusters at different density levels. In this paper, we propose an adaptation of a variant of the Modularity Q measure, widely used in the realm of community detection in complex networks, so that it can be applied as an optimization criterion to the problem of optimal local cuts through clustering hierarchies. Our results suggest that the proposed measure is a competitive alternative, especially for high-dimensional data.
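
For orientation, the following is a minimal sketch of the standard Newman-Girvan Modularity Q on an undirected weighted graph, not the adapted variant that the paper proposes. The function name `modularity`, the toy graph, and the comparison between keeping a parent cluster whole and splitting it into two children are illustrative assumptions and do not come from the paper.

```python
def modularity(adj, labels):
    """Standard Newman-Girvan modularity Q of a partition.

    adj    -- symmetric adjacency matrix as a list of lists (adj[i][j] = edge weight)
    labels -- labels[i] is the cluster id assigned to node i
    """
    n = len(adj)
    total = sum(sum(row) for row in adj)  # equals 2m for a symmetric matrix
    if total == 0:
        return 0.0
    degree = [sum(row) for row in adj]
    q = 0.0
    for c in set(labels):
        members = [i for i in range(n) if labels[i] == c]
        # Sum over ordered pairs, so each internal edge is counted twice.
        internal = sum(adj[i][j] for i in members for j in members)
        deg_sum = sum(degree[i] for i in members)
        # Fraction of edge weight inside c minus its expectation under random wiring.
        q += internal / total - (deg_sum / total) ** 2
    return q


# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # two child clusters
q_merged = modularity(adj, [0] * 6)            # parent cluster kept whole
print(q_split, q_merged)
```

On this toy graph the split partition scores higher than the merged one, which is the kind of comparison a modularity-style criterion could use when deciding locally whether to keep a parent cluster or descend to its children.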

Notes

1. http://personalpages.manchester.ac.uk/staff/Julia.Handl/generators.html
2. HDBSCAN* has an optional parameter \(m_{\text{ClSize}}\) that has not been used (\(m_{\text{ClSize}} = 1\)).
3. An exception is \(m_{pts} = 4\), where Mod-Knn has outperformed Stability in most cases.

Acknowledgements

CNPq and CAPES (Brazil), NSERC (Canada).

Author information

Authors and Affiliations

University of São Paulo, São Carlos, SP, Brazil
Francisco de Assis Rodrigues dos Anjos & Jadson Castro Gertrudes
University of Alberta, Edmonton, AB, Canada
Jörg Sander
University of Newcastle, Callaghan, NSW, Australia
Ricardo J. G. B. Campello

Corresponding author

Correspondence to Ricardo J. G. B. Campello.

Editor information

Editors and Affiliations

School of Computing and Mathematics, Charles Sturt University, Albury, NSW, Australia
Rafiqul Islam
University of Auckland, Auckland, New Zealand
Yun Sing Koh
CSIRO Scientific Computing, Canberra, Australia
Yanchang Zhao
Data Science and Engineering, Australian Taxation Office, Canberra, Australia
Graco Warwick
Department of Information Technology, University of Wollongong, Wollongong, NSW, Australia
David Stirling
School of Computing and Mathematics, Charles Sturt University, Wagga Wagga, Australia
Chang-Tsun Li
School of Computing and Mathematics, Charles Sturt University, Bathurst, Australia
Zahidul Islam

Cite this paper

dos Anjos, F.d.A.R., Gertrudes, J.C., Sander, J., Campello, R.J.G.B. (2019). A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies. In: Islam, R., et al. Data Mining. AusDM 2018. Communications in Computer and Information Science, vol 996. Springer, Singapore. https://doi.org/10.1007/978-981-13-6661-1_20

DOI: https://doi.org/10.1007/978-981-13-6661-1_20
Published: 16 February 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6660-4
Online ISBN: 978-981-13-6661-1
eBook Packages: Computer Science, Computer Science (R0)
