A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 996)

Abstract

Extracting a flat solution from a clustering hierarchy, as opposed to deriving it directly from data using a partitional clustering algorithm, is advantageous because it allows the hierarchical relationships between clusters and sub-clusters, as well as their stability across different hierarchical levels, to be revealed before any decision is made about which clusters are more relevant. Traditionally, flat solutions are obtained by performing a global, horizontal cut through a clustering hierarchy (e.g., a dendrogram). The problem of extracting a flat solution has gained special importance in the context of density-based hierarchical algorithms, because only more sophisticated cutting strategies, in particular non-horizontal local cuts, are able to select clusters at different density levels. In this paper, we propose an adaptation of a variant of the Modularity Q measure, widely used in the realm of community detection in complex networks, so that it can be applied as an optimization criterion to the problem of optimal local cuts through clustering hierarchies. Our results suggest that the proposed measure is a competitive alternative, especially for high-dimensional data.
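
For readers unfamiliar with Modularity Q, the following is a minimal sketch of the standard Newman and Girvan formulation for an undirected weighted graph, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the total edge weight, L_c the weight of edges inside cluster c, and d_c the summed weighted degree of its nodes. It is not the variant adapted in the paper, which the abstract does not specify; the function name `modularity`, the edge-list input format, and the toy graph are assumptions made for illustration only.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Standard Newman-Girvan modularity Q (illustrative, not the paper's variant).

    edges  : iterable of (u, v, weight), each undirected edge listed once
    labels : dict mapping node -> cluster id
    """
    total = 0.0                   # m: sum of all edge weights
    inside = defaultdict(float)   # L_c: weight of edges with both ends in c
    degree = defaultdict(float)   # d_c: summed weighted degree of nodes in c
    for u, v, w in edges:
        total += w
        degree[labels[u]] += w
        degree[labels[v]] += w
        if labels[u] == labels[v]:
            inside[labels[u]] += w
    if total == 0:
        return 0.0
    # Q is a sum of independent per-cluster terms.
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {n: "a" for n in range(6)}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

On this toy graph, splitting along the bridge scores higher than keeping everything in one cluster, whose Q is zero. Because Q is a sum of per-cluster terms, each candidate cluster can be scored on its own, which is the property that makes a measure of this kind a plausible objective for choosing clusters locally rather than through a single horizontal cut.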

Notes

  1. http://personalpages.manchester.ac.uk/staff/Julia.Handl/generators.html

  2. HDBSCAN* has an optional parameter \(m_{\text{ClSize}}\) that has not been used (\(m_{\text{ClSize}} = 1\)).

  3. An exception is \(m_{pts} = 4\), where Mod-Knn has outperformed Stability in most cases.

Acknowledgements

CNPq and CAPES (Brazil), NSERC (Canada).

Author information

Corresponding author

Correspondence to Ricardo J. G. B. Campello.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

dos Anjos, F.d.A.R., Gertrudes, J.C., Sander, J., Campello, R.J.G.B. (2019). A Modularity-Based Measure for Cluster Selection from Clustering Hierarchies. In: Islam, R., et al. Data Mining. AusDM 2018. Communications in Computer and Information Science, vol 996. Springer, Singapore. https://doi.org/10.1007/978-981-13-6661-1_20

  • DOI: https://doi.org/10.1007/978-981-13-6661-1_20

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6660-4

  • Online ISBN: 978-981-13-6661-1

  • eBook Packages: Computer Science, Computer Science (R0)
