Abstract
The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. The first extension replaces the classical aggregation by means of centroids with generic distance-based penalty minimizers. As a result, the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Second, a correction is applied that prevents the formation of clusters of highly unbalanced sizes: just as in the recently introduced Genie approach, which extends the single linkage scheme, the new method keeps a chosen inequity measure (e.g., the Gini, de Vergottini, or Bonferroni index) of cluster sizes from rising above a predefined threshold. Numerous benchmarks indicate that the introduction of such a correction increases the quality of the resulting clusterings significantly.
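The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The penalty minimizer is fixed to the arithmetic mean (the centroid, which minimizes the sum of squared Euclidean distances), the inequity measure is the normalized Gini index, and the correction follows the rule described for Genie: once the index exceeds the threshold, only merges involving a cluster of the smallest size are allowed. The names `gini`, `centroid`, `sq_dist`, and `cluster`, as well as the default threshold, are hypothetical.

```python
from itertools import combinations

def gini(sizes):
    """Normalized Gini index of cluster sizes (assumed form):
    0 when all sizes are equal, 1 when one cluster holds everything."""
    n = len(sizes)
    if n < 2:
        return 0.0
    diff = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return diff / ((n - 1) * sum(sizes))

def centroid(points):
    """Arithmetic mean of a list of equal-length tuples; stands in for
    a generic distance-based penalty minimizer."""
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))

def sq_dist(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster(points, k, threshold=0.3):
    """Agglomerative nearest-centroid clustering into k clusters with a
    Genie-style size correction (a sketch under the stated assumptions)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        sizes = [len(c) for c in clusters]
        centers = [centroid(c) for c in clusters]
        pairs = list(combinations(range(len(clusters)), 2))
        if gini(sizes) > threshold:
            # Too unbalanced: one side of the merge must be a smallest cluster.
            smallest = min(sizes)
            pairs = [(i, j) for i, j in pairs
                     if sizes[i] == smallest or sizes[j] == smallest]
        # Merge the admissible pair whose centroids are closest.
        i, j = min(pairs, key=lambda p: sq_dist(centers[p[0]], centers[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

Setting `threshold=1.0` disables the correction, since the index never exceeds 1, and yields plain nearest-centroid clustering. Lower thresholds force small clusters to be absorbed earlier. Replacing `centroid` and `sq_dist` with another minimizer and dissimilarity is where the generalization described in the abstract would enter.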
Acknowledgments
This study was supported by the National Science Center, Poland, research project 2014/13/D/HS4/01700.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Gagolewski, M., Cena, A., Bartoszuk, M. (2016). Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2016. Lecture Notes in Computer Science, vol 9880. Springer, Cham. https://doi.org/10.1007/978-3-319-45656-0_16
DOI: https://doi.org/10.1007/978-3-319-45656-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45655-3
Online ISBN: 978-3-319-45656-0
eBook Packages: Computer Science (R0)