Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9880)

Abstract

The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. The first extension replaces the classical aggregation by means of centroids with generic distance-based penalty minimizers. As a result, the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). The second extension is a correction that prevents the formation of clusters of highly unbalanced sizes: just like the recently introduced Genie approach, which extends the single linkage scheme, the new method keeps a chosen inequity measure of cluster sizes (e.g., the Gini, de Vergottini, or Bonferroni index) from rising above a predefined threshold. Numerous benchmarks indicate that introducing such a correction significantly increases the quality of the resulting clusterings.
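
The two extensions can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not fix the details, so several choices here are assumptions: the function names (`gini`, `centroid`, `medoid`, `cluster`) are hypothetical, the Gini index is taken in its normalized form (sum of pairwise absolute differences divided by (n - 1) times the total), the threshold of 0.3 is only an illustrative value, and the correction is read as "when the index exceeds the threshold, only merges involving a cluster of the smallest size are allowed". The `aggregate` parameter stands in for a generic penalty minimizer: the centroid minimizes the sum of squared Euclidean distances, while the medoid is one example of a minimizer that needs only pairwise dissimilarities.

```python
import numpy as np


def gini(sizes):
    """Normalized Gini index of cluster sizes; 0 when all sizes are equal."""
    c = np.asarray(sizes, dtype=float)
    n = len(c)
    if n < 2:
        return 0.0
    # The full matrix counts every pair twice, hence the division by 2.
    pairwise = np.abs(c[:, None] - c[None, :]).sum() / 2.0
    return pairwise / ((n - 1) * c.sum())


def centroid(points):
    """Minimizer of the sum of squared Euclidean distances (classical choice)."""
    return points.mean(axis=0)


def medoid(points):
    """Cluster member minimizing the sum of Euclidean distances to the others."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[d.sum(axis=1).argmin()]


def cluster(X, k, threshold=0.3, aggregate=centroid):
    """Agglomerate the rows of X into k clusters; returns lists of row indices."""
    X = np.asarray(X, dtype=float)
    members = [[i] for i in range(len(X))]
    reps = [X[i] for i in range(len(X))]  # one representative per cluster
    while len(members) > k:
        sizes = [len(m) for m in members]
        # Assumed correction: if cluster sizes are too unequal, consider only
        # pairs in which at least one cluster has the smallest size.
        restrict = gini(sizes) > threshold
        smallest = min(sizes)
        best = None
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if restrict and smallest not in (sizes[i], sizes[j]):
                    continue
                d = np.linalg.norm(reps[i] - reps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        members[i] = members[i] + members[j]
        reps[i] = aggregate(X[members[i]])  # re-aggregate the merged cluster
        del members[j], reps[j]
    return members
```

Calling `cluster(X, k=2)` on a small array of points returns two lists of row indices; passing `aggregate=medoid` swaps the centroid for a representative drawn from the data. The exhaustive pair search keeps the sketch short and is only practical for small inputs.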

Acknowledgments

This study was supported by the National Science Center, Poland, research project 2014/13/D/HS4/01700.

Author information

Corresponding author

Correspondence to Marek Gagolewski.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gagolewski, M., Cena, A., Bartoszuk, M. (2016). Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2016. Lecture Notes in Computer Science, vol 9880. Springer, Cham. https://doi.org/10.1007/978-3-319-45656-0_16

  • DOI: https://doi.org/10.1007/978-3-319-45656-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45655-3

  • Online ISBN: 978-3-319-45656-0

  • eBook Packages: Computer Science (R0)
